You basically only interact with the kernel on init/shutdown or outside of the fast path, and do something like isolcpus to delegate the kernel and interrupt handling to some garbage cores and give you the rest to do what you want with.
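To make the "give you the rest" part concrete, here's a rough sketch of pinning the calling process to a single core from user space. It assumes Linux (os.sched_setaffinity is Linux-only), and the core it picks is just the highest-numbered CPU the process is currently allowed on, a hypothetical stand-in for whatever you actually listed in isolcpus.

```python
import os

def pin_to_cpu(cpu: int) -> set:
    """Restrict the caller to a single CPU and return the resulting affinity set."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the caller"
    return os.sched_getaffinity(0)

# Hypothetical choice: the highest-numbered CPU we are currently allowed on.
# On a tuned box this would be one of the cores reserved via isolcpus.
allowed = sorted(os.sched_getaffinity(0))
target = allowed[-1]
print(f"pinned to CPUs: {pin_to_cpu(target)}")
```

Pinning only controls where your thread runs; keeping interrupts and kernel threads off that core is separate boot-time and IRQ-affinity tuning.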
Also, PREEMPT_RT is the worst option for low latency because it's about execution time guarantees and not speed specifically. If you're on PREEMPT_RT and give your critical thread highest prio, be prepared for some serious OS-level lock-ups.
PREEMPT_RT includes priority inheritance, specifically to avoid this scenario. So your app should indeed be favored if you tune accordingly. What you also seem to be saying is that using PREEMPT_RT may lead to lower throughput, but that's not the same thing as latency.
Admittedly it's a bit terse, but at least it gives some steps you can use to enable it on Windows. It also benefits other software, such as 7zip. I need to update the page because these days the performance benefits are larger, due to the ever-widening divide between compute and memory speeds, CPUs having bigger large page TLBs, and additional optimisations...
What I do find fascinating is that HRT is actively blogging about this stuff. Ten years ago, everyone in the biz was super secretive and never made any public announcement about what we did - even stuff that I would take from HPE and RHEL low latency manuals (which were public knowledge). You never said anything publicly because protecting the "secret ingredients" of the trading system was paramount and any disclosure was one step towards breaking that barrier.
Now, I'm seeing HFT companies post articles like this and I'm thinking it has to be for recruiting. Why else would they do it?
Anyway, as a side note, if you liked this article, you'd also probably like this:
http://hackingnasdaq.blogspot.com/
It was one of my favorite reads because it was written by someone going thru the journey of low latency exploration - before everything was taken over by FPGAs.
I think you’re right on the money here. All the secret sauce is in FPGA trading now so there’s nothing secret about sharing this info.
It's not just finance; other industries have this culture as well. I suspect it shows up in any environment that is perceived to be hyper-competitive: any perceived advantage, regardless of where it came from or how differentiated it is, is held closely and overweighted if proper metrics aren't in place to keep pushing for improvement.
It's for recruiting, clout, and also generally expressing the culture of the firm.
Also agreed that this is a pretty surface-level optimization; there's a reason why they are talking about it. If you are doing true HFT with purely software traders, you will probably lose to more serious players using FPGAs, which as OP mentioned isn't exactly new.
https://lmax-exchange.github.io/disruptor/disruptor.html
This technical paper sent me on a multi-year journey regarding one simple question: "If this stuff is fast enough for fintech, why can't we make everything work this way?" Handling millions of requests per second on 1 thread is well beyond the required performance envelope for most public "webscale" products.
A. Capital (human and money)
B. Bespoke internal tools from the server up
C. Insane scale
Google and Meta put in lots of effort to have fast C++ code for their core infra, and they have several teams contributing to the LLVM project. Even places like Figma optimize to some extent because part of their business alpha is being performant and smooth. When you are at the scale of FB or G, optimizing the small things can lead to massive aggregate gains, and they have the eng talent/time to justify it.
At smaller companies though, as others have mentioned, iteration time and efficient dev spend are paramount. Optimizing for microsecond latency on your B2B SaaS product written in Python/React is most likely not part of your business case, and it is a waste of engineering time and effort when you could be putting that time and money into new and better features. Most of the time, these very niche performance considerations are taken care of to a decent degree with off-the-shelf tools, maybe with a bit of tuning.
Most webscale products aren't written in performant languages and are, instead, optimized around fast feature generation and being easy (cheap) to hire for.
There's a reason laggy Electron apps are the norm: $$.
I was oblivious to this a year ago, before I got interested in database internals.
Something I found interesting: there's a recent presentation by Neumann about the Umbra DBMS where he fields a question about hugepages at the end. I recall him saying they don't use them.
I know Oracle and MySQL recommended Transparent Hugepages, IIRC.
OTOH to make use of "normal" huge pages, you have to allocate them up front, so it's not possible to run into THP-type issues.
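For anyone who hasn't looked at the "allocate up front" side: on Linux the static pool is typically reserved through the vm.nr_hugepages sysctl, and its state shows up in /proc/meminfo. Below is a minimal sketch of reading those counters. The sample text is made up for illustration (same format as /proc/meminfo), and hugepage_pool is a hypothetical helper name, not part of any library.

```python
def hugepage_pool(meminfo_text: str) -> dict:
    """Extract the static huge page counters from /proc/meminfo-style text."""
    wanted = {"HugePages_Total", "HugePages_Free", "Hugepagesize"}
    pool = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            pool[key] = int(rest.split()[0])  # drop the trailing "kB" unit if present
    return pool

# Made-up sample in the /proc/meminfo format; on Linux you would pass
# open("/proc/meminfo").read() instead.
sample = """MemTotal:       65536000 kB
HugePages_Total:     512
HugePages_Free:      500
Hugepagesize:       2048 kB
"""
pool = hugepage_pool(sample)
reserved_mib = pool["HugePages_Total"] * pool["Hugepagesize"] // 1024
print(pool, f"reserved: {reserved_mib} MiB")
```

If HugePages_Total comes back as 0, nothing has been reserved and an explicit huge page mapping will fail rather than quietly fall back, which is the up-front behaviour described above.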
That said, I doubt that enabling huge pages for complex database workloads that cannot run solely in memory will show any noticeable performance improvement. There's a lot of IO and memory R/W involved, and I think this is what overshadows the TLB miss cost. What would be interesting, and what I haven't done so far, is to estimate the number of CPU cycles needed for a TLB miss.
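A back-of-the-envelope version of that estimate, as a sketch only: TLB reach is entries times page size, and under x86-64 4-level paging a full walk touches 4 page-table levels for a 4 KiB page and 3 for a 2 MiB page. The entry count and the cycles-per-access figure below are placeholders I picked for the arithmetic, not measurements of any particular CPU.

```python
KIB = 1024
MIB = 1024 * KIB

# Page-table levels touched by a full walk under x86-64 4-level paging.
WALK_LEVELS = {4 * KIB: 4, 2 * MIB: 3}

def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space the TLB can map without a miss."""
    return entries * page_size

def worst_case_walk_cycles(page_size: int, cycles_per_access: int) -> int:
    """Upper bound for one miss if every level of the walk goes to memory."""
    return WALK_LEVELS[page_size] * cycles_per_access

ENTRIES = 1024            # placeholder TLB size, not a specific CPU
CYCLES_PER_ACCESS = 200   # placeholder memory latency in cycles

for page_size in (4 * KIB, 2 * MIB):
    reach = tlb_reach(ENTRIES, page_size) // MIB
    cost = worst_case_walk_cycles(page_size, CYCLES_PER_ACCESS)
    print(f"{page_size // KIB:>5} KiB pages: reach {reach} MiB, worst-case walk {cost} cycles")
```

Real hardware caches the upper levels of the walk, so the typical cost is well below this bound; measuring with the hardware performance counters is the only way to get an actual number.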
If you're interested in consistent low latency, you do need to avoid TLB misses, and also page faults, cache contention, cache-coherency delays from the CC protocol (MOESI/MESI(F)), which means making sure no other cores are accessing your memory, and branch misprediction, and that's after you have put all your core's threads into SCHED_FIFO. Using https://lttng.org/ can be really helpful in checking what's happening.
I don't think you'll find many articles that detail these points. Now, they might be trivial to you and that's totally fair. But the goal is to address a wide audience. Additionally, the article is not addressing how to use HPs but that's for part 2.
Wrt the other points, I certainly agree they are important topics to explore. I would add that using perf is super important for easy access to the perf counters.
Although aren't huge pages pretty basic table stakes for HFT software nowadays? There's not much alpha left to give away by going into detail on them.