Restartable Sequences

(justine.lol)

254 points · grappler · 22d ago · 73 comments

46 comments · 12 top-level

GlenTheMachine · 22d ago · 12 in thread

If you had no idea what a restartable sequence is, the takeaway is about halfway down the OP:

“This is why Linux now provides rseq() which is a much more enlightened solution. With restartable sequences, you actually can get rid of both the mutex and atomics, while the OS continues to fully abstract scheduling. The way it works is you advise the kernel whenever your program enters a critical section of code that you don't want interrupted. It's probably going to be maybe 10 assembly instructions tops. The first assembly opcode should be a move instruction that sets the rseq_cs field. The last instruction needs to be the thing that makes the modification to your global data structure. Think of it sort of like a really tiny database transaction. What makes it go fast, is that the bidirectional communication with the kernel happens via shared memory.”

bryanlarsen · 22d ago

That doesn't really explain it though, IMO. IIUC, it's a sequence of instructions that either runs to completion atomically or doesn't. If it is interrupted by anything the kernel jumps you to the abort/retry vector you set with a guarantee that the last instruction in the sequence was not executed.

(Based on my reading of the LWN article rwmj posted).

khuey · 22d ago

Yes, the API contract isn't "don't interrupt me during this critical section" it's "if you have to interrupt me during this critical section, go to this recovery/restart code".

There is a time-slice extension feature in the works that's roughly "please let me finish this critical section before you interrupt me". But a hard guarantee that userspace code won't be interrupted is probably untenable in a preemptive multitasking system.

2 more replies

crote · 22d ago

> it's a sequence of instructions that either runs to completion atomically or doesn't

The way I read it, it either runs to completion in one go, or gets restarted from the beginning. This means the sequence as a whole isn't executed atomically, as the already-executed instructions during an interrupt aren't rolled back.

It can be used to build atomic actions, but it is up to the developer to create a sequence of instructions where the very last instruction "commits" the entire operation, with the side-effects of partial execution being harmless.
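To make the "last instruction commits" shape concrete, here is a minimal sketch in portable C11. It is an analogy and not rseq itself: a compare-and-swap stands in for the kernel's abort-and-restart, and the names `node`, `stack_push`, and `stack_peek` are invented for illustration. The point is only that everything before the commit touches private state, so restarting is harmless.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types and names; this is an analogy, not the rseq API. */
struct node {
    int value;
    struct node *next;
};

static _Atomic(struct node *) stack_top = NULL;

/* Push n onto a shared stack.
   Preparation: writing n->next only touches the not-yet-published node,
   so repeating it after a failed attempt has no visible side effect.
   Commit: the compare-exchange is the single step that publishes n.
   If it fails, the loop restarts from the preparation step. */
void stack_push(struct node *n) {
    assert(n != NULL);
    struct node *seen = atomic_load_explicit(&stack_top, memory_order_relaxed);
    do {
        n->next = seen; /* preparation, invisible to other threads */
    } while (!atomic_compare_exchange_weak_explicit(
                 &stack_top, &seen, n,
                 memory_order_release, memory_order_relaxed));
}

/* Read the current top without removing it. */
struct node *stack_peek(void) {
    return atomic_load_explicit(&stack_top, memory_order_acquire);
}
```

With rseq the restart would be triggered by the kernel on preemption or migration rather than by a failed compare-exchange, and the commit could be a plain store to per-CPU data.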

1 more reply

rwmj · 22d ago

LWN has a good article: https://lwn.net/Articles/1033955/

jstimpfle · 22d ago

I think it wasn't explained in a very accessible way. If I got the gist right, this essentially brings "per-CPU" synchronization to userland. It's typical in the kernel to have per-cpu data, while per-thread data is rare and typically impractical. There is a high number of threads managed by the kernel, most of which probably belong to a userland process, most of which do not participate in any given synchronisation scheme. Also threads are often too much of an abstraction for parallel programming needs, given that they are hiding for example cache effects. So it's natural to want to use per-cpu data instead of thread_local data in a userland process, I know I've been wishing for that many times.

With rseq, we can allocate in any userland process one instance of a given synchronisation data structure per each CPU. It's important to understand that userland code accessing per-cpu data structures cannot prevent being scheduled away from a CPU and being replaced by another thread (kernel code can block scheduler for short critical sections). Such a replacement thread may subsequently corrupt that same data that was still in the middle of the transaction. But we can make a subset of transactions safe at least: If a transaction gets committed in a single (final) atomic instruction, and we get kernel support for this transaction to be restarted in case there has been a schedule mid-way, this is a guarantee that at the time of commit, the entire transaction hasn't been interrupted by the scheduler. I.e. a kind of "mutual exclusion" guarantee.

Did I get that right?
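A rough sketch of the per-CPU layout described above, in C11. It assumes the caller already knows its CPU number (on Linux that could come from `sched_getcpu()`), and `NUM_SLOTS`, `count_hit`, and `total_hits` are hypothetical names. Because a thread can migrate between reading the CPU number and doing the update, this fallback still uses an atomic add; the promise of rseq is that the add could become a plain, non-atomic increment.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_SLOTS 8 /* hypothetical; a real program would size this from the CPU count */

/* One counter per CPU, each on its own 64-byte line (assumed line size). */
struct slot {
    _Alignas(64) atomic_ulong hits;
};

static struct slot slots[NUM_SLOTS];

/* Record one hit on the slot belonging to the given CPU.
   The atomic add keeps this correct even if the thread was migrated
   after the caller looked up its CPU number. */
void count_hit(int cpu) {
    assert(cpu >= 0);
    atomic_fetch_add_explicit(&slots[cpu % NUM_SLOTS].hits, 1,
                              memory_order_relaxed);
}

/* Sum all slots. The result is a snapshot, not a linearizable total. */
unsigned long total_hits(void) {
    unsigned long sum = 0;
    for (int i = 0; i < NUM_SLOTS; i++)
        sum += atomic_load_explicit(&slots[i].hits, memory_order_relaxed);
    return sum;
}
```

Threads on different CPUs mostly touch different slots, so the atomic add is rarely contended even without rseq.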

saagarjha · 14d ago

You don't have to use it for this. For example, you can use it for your own transactional memory or hazard pointer scheme.

manoDev · 22d ago

That’s clever — am I right to think it’s the intermediate solution between locks and full STM, implemented at the kernel level, and with zero abstraction cost?

khuey · 22d ago

It's in some sense a light form of STM. The key insight behind rseq(2) is that if the data is local to a given CPU the only way to get a race is if the kernel deschedules your program from that CPU at an inopportune time. If your operation can be aborted and restarted and the kernel has a mechanism to notify you when that needs to happen you can dispense with the overhead of "real" synchronization and just use a couple mov instructions to enter and exit the critical section.

squirrellous · 21d ago

I am not very well versed here, but I think that due to the requirement for assembly and a single-instruction commit, practical uses of rseq are generally very simple. It is nowhere near the usefulness of locks.

rurban · 21d ago

Certainly not zero cost. Syscalls are heavy, that's why everyone prefers green threads.

And 2 more ops per rseq.

But rseq's are certainly cool.

kajaktum · 21d ago

I don't get it, how does this work with multiple processes running at 100% CPU time? Atomics are necessary for the CPU level.

znpy · 21d ago

I wonder what the speedup would be if runtimes like golang or Java were to adopt rseq on Linux.

khuey · 22d ago · 12 in thread

Maybe I'm just getting old but the "if you don't spend $20,000 on a workstation you're going to be left behind like a dinosaur" at the top of this article is a huge turn off to reading any further. And I say that as someone who owns a workstation with more cores than the author's.

Avicebron · 22d ago

The author was also asking for money to buy a house in SF and travel on private planes like a few days ago... the donations must have really shown up if they are using 20k machines at home.

wasabi991011 · 22d ago

She bought the workstation at a discount (see bottom of TFA).

Also, it was pre-2025 (before she got her job at Gradient Canopy), so long before asking for donations.

Finally, if you read the actual donation request, you can see she is trying to make a living doing open source, and is being honest about what the money is going to. Why is that an issue?

1 more reply

blauditore · 22d ago

Do you have a link?

2 more replies

nutjob2 · 22d ago

> I have nothing against this person.

Your bad faith reading of that article says otherwise. It's bleedingly obvious that it's satire. Do you think people seriously ask for donations to fund a private airplane?

$20K of gear isn't that much if you're an independent developer, and if you're working for others as such in the US, and you're not a financial basket case, it's doable. She even says "It put me in the poor house for a few months" so she made sacrifices to get there. You can too, if you want to. Why the envy?

On the contrary, you really do seem to have a problem with her.

[references https://web.archive.org/web/20260529122658/https://justine.l... ]

1 more reply

loeg · 22d ago

I wouldn't read it too literally. I read it as jokingly rationalizing an obviously overkill self-indulgent purchase.

camgunz · 21d ago

I gotta think people are working pretty hard to not see it's a joke.

toast0 · 22d ago

If we're being overly generous, they're saying you need at least a raspberry pi? You can see a 3x improvement there, which shows the pattern works, and that's good enough for a dinosaur (this interpretation is easier to justify if you just skim the article... Which I did the first time)

But agreeing with you, I've done big optimization stuff for multicore servers (not as many cores, but same kind of work) and my workstation was something small with not even the same os. I don't need the big machine on my desk to understand the concepts. I just need the big machine to check my work. For me, that's always been a production machine, sometimes a production machine taken out of rotation for pre-validation before running on production load. I guess I should mention, I work on applications specifically, and libraries and kernels as it relates to making whatever my application is work better. I also don't have a problem with pinning threads to cpus... but my applications are usually one big program that fills the system. Someone writing a general purpose library has a harder time.

Of course, if you want to do this kind of work and you don't have your own production load, you're going to have to borrow, rent, or buy a big machine. It doesn't need to be your workstation though. I hate working with cloud nonsense, but if your tests are short, and you do the upfront work to make your images start fast, you can probably save a lot of money by renting spot instances when testing ... I don't know if you can do spot instances of bare metal though, so you're probably stuck with vm overhead.

khuey · 22d ago

Yeah, you can rent an equivalent workstation from AWS for under $10/hour (and that's the on demand price) so I don't think cost is a huge barrier to doing this sort of work. The language and listing the prices of the workstations down to the penny just strikes me as a rather unprofessional way to communicate.

1 more reply

kitd · 22d ago

Go to the end. Those machines were provided by the companies at a discount so she could continue her LLM research. She's not saying "anyone who's anyone" has a 1024-core workstation these days.

keybored · 21d ago

Yep. The consolation is I guess that it might be better if developers with such modest means as “splurged on” two ~20K USD workstations (and were left in the “poor house” for a few months, oh my) are competitive instead of absolutely all of us becoming compute renters.

But then they go on to take a Gemini job[1] so I dunno, more consolidation than consolation perhaps.

[1] TFA says “job offer” but another comment[2] says that they work there.

[2] https://news.ycombinator.com/item?id=48348919

nutjob2 · 22d ago

She's clearly making a point about taking advantage of the optimization/algorithm she's pitching, and doesn't seem very serious. Alternatively, someone reading that as a serious claim is rather naive.

lpapez · 22d ago

Except that is not what the article says and you clearly missed the sarcasm.

senderista · 22d ago · 4 in thread

I'm surprised there was no reference to the librseq library, maintained by the rseq implementer:

https://github.com/compudj/librseq

This has helpers for common use cases like counters and linked lists. You shouldn't need to write assembly at all to use rseq in most applications.

wmf · 22d ago

Justine is writing her own libc and her own malloc so I'm not surprised she wants to use rseq from scratch.

senderista · 22d ago

That's fine, but I think an article claiming to give an introduction to a technology should at least mention that an essential library exists, and that writing assembly is no longer usually required.

1 more reply

dividuum · 22d ago

I took a brief look and left confused. The list implementation seems completely bog standard with no special code for synchronization whatsoever. I don't see any counter and the rseq syscall seems unused except for feature detection. I don't think that's a viable replacement for any low level code.

bonzini · 21d ago

The low level interface is documented at https://github.com/compudj/librseq/blob/master/include/rseq/..., the list is just an internal implementation detail.

The syscall these days is invoked by libc not the program; libc provides access to some symbols that let the program execute rseqs as well.

Veserv · 22d ago · 2 in thread

Restartable windows, or more generically introspection windows, are a really useful technique you can apply in any situation where you understand or control the sources of preemption. The earliest uses of this technique in operating systems that I am aware of are ~25 years old.

The key insight is that the preempter can introspect the program counter of the code being preempted (which is now stable since it was preempted) and act accordingly. The simplest mechanism is to reset their program counter if in a critical section. The more generic mechanism is to jump them to a supplied address. This allows you to do things like hard abort and more.

You can further remove the need for the preempter to understand the preempted code by having the preempted code create a self-introspection code snippet and supplying that with the program counter at preemption. So the preempter just vectors them to their own code which knows how to interpret its own state at any preemption point.
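The simpler mechanism above (vector the preempted code to a supplied address if its program counter is inside the critical section) can be sketched as a small decision function. This is a user-space simulation, not kernel code: the field names mirror those of Linux's `struct rseq_cs` (`start_ip`, `post_commit_offset`, `abort_ip`), while the `critical_section` type and `resume_address` function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one critical section. The field names follow
   Linux's struct rseq_cs, but this is only a simulation of the idea. */
struct critical_section {
    uint64_t start_ip;           /* address of the first instruction */
    uint64_t post_commit_offset; /* length in bytes, up to and including the commit */
    uint64_t abort_ip;           /* where to resume if interrupted inside */
};

/* Decide where a preempted thread should resume.
   cs is the section the thread registered, or NULL if none.
   Unsigned subtraction wraps around, so one comparison covers both
   "before start_ip" and "at or after the end of the section". */
uint64_t resume_address(const struct critical_section *cs, uint64_t preempted_ip) {
    if (cs == NULL)
        return preempted_ip;
    assert(cs->post_commit_offset > 0);
    if (preempted_ip - cs->start_ip < cs->post_commit_offset)
        return cs->abort_ip; /* interrupted before the commit: restart */
    return preempted_ip;     /* outside the section: resume normally */
}
```

A program counter at or past `start_ip + post_commit_offset` means the commit instruction has already executed, so the thread simply continues where it left off.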

senderista · 22d ago

There is a paper from Sun that anticipated tcmalloc's development of rseq by over a decade:

https://dl.acm.org/doi/abs/10.1145/512429.512451

Veserv · 22d ago

Yep, it is a fairly old technique with a lot of general applicability beyond just allowing mutex elision for usage of per-core data structures amidst potential core migration. But apparently using your own expert knowledge and actually explaining things and describing generalizations is worthy of flagging these days.

1 more reply

yubblegum · 22d ago · 2 in thread

> chances are the CPU's internal mutexes aren't as good as the ones you've implemented in userspace

Anyone with an informed opinion on this statement? It seems counterintuitive (npi).

khuey · 22d ago

The author is referring to false sharing (https://en.wikipedia.org/wiki/False_sharing). CPU caches operate at cache line granularity (typically 64 bytes) so writes to one part of the cache line can require synchronization with writes to non-overlapping parts of the same cache line. This can dramatically reduce performance when there are a large number of cores operating on the same cache line.

If you remove the 64 byte alignment (which forces each counter variable onto a separate cache line) from hitcounter-shard.c you ought to be able to see the performance difference for yourself.
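For readers who want to see what that alignment changes, here is a small layout sketch in C11. It is not the article's hitcounter-shard.c; the type and function names are hypothetical, and the 64-byte line size is an assumption that holds on typical x86-64 parts but not everywhere.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed line size; typical on x86-64, not universal */

/* Without alignment, several counters share one cache line. */
struct packed_counter {
    unsigned long hits;
};

/* With alignment, each counter is padded out to a full line. */
struct padded_counter {
    _Alignas(CACHE_LINE) unsigned long hits;
};

_Static_assert(sizeof(struct padded_counter) == CACHE_LINE,
               "one padded counter per cache line");

/* Both arrays start on a line boundary so the comparison is fair. */
static _Alignas(CACHE_LINE) struct packed_counter packed[4];
static struct padded_counter padded[4];

/* Index of the cache line holding p, relative to a line-aligned base. */
size_t line_of(const void *base, const void *p) {
    assert((const char *)p >= (const char *)base);
    return (size_t)((const char *)p - (const char *)base) / CACHE_LINE;
}
```

In the packed array all four counters land on line 0, so cores updating different counters still contend for the same line; in the padded array counter `i` sits on line `i`.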

yubblegum · 22d ago

Thanks for the effort but the q wasn't "what is false sharing",

The Q is: is it true the CPU mutexes are actually slower than those implemented in userspace?

2 more replies

keyle · 22d ago · 2 in thread

> ... bidirectional communication with the kernel happens via shared memory.

What could possibly go wrong?

loeg · 22d ago

Elaborate? Kernel shared memory interfaces are often reasonable (vdso, io_uring).

HackerThemAll · 21d ago

It's 32 bytes. Educate yourself before commenting.

squirrellous · 22d ago

IIUC rseq is similar to thread-local data with the additional benefit that it scales with the number of CPU cores, not threads. However, if you are an application developer and are able to control all the threads in an application, then rseq isn’t that superior.

I fully agree that rseq should be more easily available to Linux developers, though.

smasher164 · 22d ago

I was having a conversation with someone recently about whether rseq would be a good primitive for building a load-link/store-conditional implementation in user space. It gives you a critical window, though you still have to deal with spurious restarts and provide a way for one core to abort another.

HackerThemAll · 21d ago

The name is so misleading... The first thing I see when hearing "sequence" is the "arithmetic sequence", like 1,2,3,4. Therefore "restartable sequence" is like 1,2,3,4, 1,2,3,4, 1,2,3,4... Closer to SQL's "CREATE SEQUENCE" than "restartable sequence of assembly instructions". I could not comprehend how this can help with lock free data exchange. I've done my homework now.

1vuio0pswjnm7 · 21d ago

Was greenbean ever tested with NetBSD ftp

Something like

   greenbean

then

   ftp -vvd4o/dev/stdout http://127.0.0.1:8080

This is labeled as an error: "fragmented message"

Also why doesn't redbean do TLS1.3

And rusage "wall time" output seems to be wrong

Too much emphasis on Unicode for me, it's off-putting

I like consoles that do _not_ support UTF-8. At least, it should be optional

matheusmoreira · 22d ago

This is amazing. I'll definitely use this in my projects.

NuclearPM · 21d ago

People who say 10x programmers don’t exist have never heard of Justine.
