“This is why Linux now provides rseq() which is a much more enlightened solution. With restartable sequences, you actually can get rid of both the mutex and atomics, while the OS continues to fully abstract scheduling. The way it works is you advise the kernel whenever your program enters a critical section of code that you don't want interrupted. It's probably going to be maybe 10 assembly instructions tops. The first assembly opcode should be a move instruction that sets the rseq_cs field. The last instruction needs to be the thing that makes the modification to your global data structure. Think of it sort of like a really tiny database transaction. What makes it go fast, is that the bidirectional communication with the kernel happens via shared memory.”
(Based on my reading of the LWN article rwmj posted).
There is a time-slice extension feature in the works that's roughly "please let me finish this critical section before you interrupt me". But a hard guarantee that userspace code won't be interrupted is probably untenable in a preemptive multitasking system.
The way I read it, the sequence either runs to completion in one go or gets restarted from the beginning. That means it isn't executed atomically as a whole: instructions that already ran before the interruption aren't rolled back.
It can be used to build atomic actions, but it is up to the developer to create a sequence of instructions where the very last instruction "commits" the entire operation, with the side-effects of partial execution being harmless.
With rseq, any userland process can allocate one instance of a given synchronisation data structure per CPU. It's important to understand that userland code accessing per-CPU data structures cannot prevent itself from being scheduled away from a CPU and replaced by another thread (kernel code can block the scheduler for short critical sections). That replacement thread may then corrupt the same data while the first thread is still in the middle of its transaction. But we can at least make a subset of transactions safe: if a transaction is committed by a single (final) atomic instruction, and the kernel forces the transaction to restart whenever a reschedule happens midway, then at the time of commit we are guaranteed that the entire transaction ran without being interrupted by the scheduler. I.e. a kind of "mutual exclusion" guarantee.
Did I get that right?
And 2 more ops per rseq.
But rseqs are certainly cool.
Also, it was pre-2025 (before she got her job at Gradient Canopy), so long before asking for donations.
Finally, if you read the actual donation request, you can see she is trying to make a living doing open source, and is being honest about what the money is going to. Why is that an issue?
Your bad faith reading of that article says otherwise. It's bleedingly obvious that it's satire. Do you think people seriously ask for donations to fund a private airplane?
$20K of gear isn't that much if you're an independent developer, and if you're working for others as such in the US, and you're not a financial basket case, it's doable. She even says "It put me in the poor house for a few months" so she made sacrifices to get there. You can too, if you want to. Why the envy?
On the contrary, you really seem to have a problem with her.
[references https://web.archive.org/web/20260529122658/https://justine.l... ]
But agreeing with you: I've done big optimization stuff for multicore servers (not as many cores, but the same kind of work), and my workstation was something small that didn't even run the same OS. I don't need the big machine on my desk to understand the concepts; I just need a big machine to check my work. For me, that's always been a production machine, sometimes one taken out of rotation for pre-validation before running on production load. I guess I should mention that I work on applications specifically, and on libraries and kernels only as they relate to making my application work better. I also don't have a problem with pinning threads to CPUs, but my applications are usually one big program that fills the system. Someone writing a general-purpose library has a harder time.
Of course, if you want to do this kind of work and you don't have your own production load, you're going to have to borrow, rent, or buy a big machine. It doesn't need to be your workstation, though. I hate working with cloud nonsense, but if your tests are short and you do the upfront work to make your images start fast, you can probably save a lot of money by renting spot instances for testing. I don't know if you can get bare-metal spot instances, though, so you're probably stuck with VM overhead.
But then they go on to take a Gemini job[1] so I dunno, more consolidation than consolation perhaps.
[1] TFA says “job offer” but another comment[2] says that they work there.
https://github.com/compudj/librseq
This has helpers for common use cases like counters and linked lists. You shouldn't need to write assembly at all to use rseq in most applications.
These days the syscall is invoked by libc, not the program; libc exposes some symbols that let the program execute rseqs as well.
The key insight is that the preempter can introspect the program counter of the code being preempted (which is stable now that it has been preempted) and act accordingly. The simplest mechanism is to reset its program counter if it is in a critical section. The more general mechanism is to jump it to a supplied address, which allows things like a hard abort.
You can further remove the need for the preempter to understand the preempted code by having the preempted code create a self-introspection code snippet and supplying that with the program counter at preemption. So the preempter just vectors them to their own code which knows how to interpret its own state at any preemption point.
Anyone with an informed opinion on this statement? It seems counterintuitive (npi).
If you remove the 64 byte alignment (which forces each counter variable onto a separate cache line) from hitcounter-shard.c you ought to be able to see the performance difference for yourself.
The question is: is it true that CPU mutexes are actually slower than those implemented in userspace?
... bidirectional communication with the kernel happens via shared memory.
What could possibly go wrong?
I fully agree that rseq should be more easily available to Linux developers, though.
Something like
greenbean
then ftp -vvd4o/dev/stdout http://127.0.0.1:8080
This is labeled as an error: "fragmented message".
Also, why doesn't redbean do TLS 1.3?
And the rusage "wall time" output seems to be wrong.
Too much emphasis on Unicode for me; it's off-putting.
I like consoles that do _not_ support UTF-8. At the very least, it should be optional.