wahern 5y ago

> Cross-thread communication is expensive. Single-threaded async task interaction is very cheap, comparatively

This all depends on how threads are implemented. If they're scheduled preemptively then communication can be expensive, relatively speaking, because of the need for locking and atomic operations. But you can also schedule cooperatively in user space, just as Tokio does when serially resuming async tasks; or as Java's Project Loom does for its new "lightweight" threads.
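To make the cooperative case concrete, here is a minimal sketch in plain Rust (std only, not Tokio's actual scheduler). The `Task` type and the `run_cooperatively` and `producer_consumer` functions are names I made up for illustration. Each task runs until its next yield point on one OS thread, so the two tasks can share a queue through `Rc<RefCell<...>>` with no locks or atomic operations.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

// Hypothetical task type: a closure that runs until its next yield point
// and returns true once it has finished.
type Task = Box<dyn FnMut() -> bool>;

// Resume tasks one at a time, round-robin, all on the calling thread.
fn run_cooperatively(mut ready: VecDeque<Task>) {
    while let Some(mut task) = ready.pop_front() {
        let done = task();
        if !done {
            ready.push_back(task);
        }
    }
}

// Two tasks communicate through a plain queue: no Mutex, no atomics.
fn producer_consumer(items: u32) -> Vec<u32> {
    let queue = Rc::new(RefCell::new(VecDeque::new()));
    let received = Rc::new(RefCell::new(Vec::new()));

    let producer: Task = {
        let queue = Rc::clone(&queue);
        let mut next = 0;
        Box::new(move || {
            if next < items {
                queue.borrow_mut().push_back(next);
                next += 1;
            }
            next == items // done once everything has been sent
        })
    };

    let consumer: Task = {
        let queue = Rc::clone(&queue);
        let received = Rc::clone(&received);
        Box::new(move || {
            if let Some(item) = queue.borrow_mut().pop_front() {
                received.borrow_mut().push(item);
            }
            received.borrow().len() as u32 == items // done once all arrived
        })
    };

    run_cooperatively(VecDeque::from(vec![producer, consumer]));
    let out = received.borrow().clone();
    out
}
```

Calling `producer_consumer(3)` interleaves the two tasks and returns the items in the order they were sent. Because only one task runs at a time, `RefCell`'s cheap runtime borrow check is all the synchronization needed.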

Note that unlike JavaScript, Tokio and Project Loom can also run different tasks on different, preemptively scheduled threads. And while I don't know that much Rust, I imagine you're going to need to use either unsafe or Rc or maybe even Arc if you intend to share data between different Tokio tasks--i.e. data that doesn't fit the normal caller/callee borrow semantics.
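For what it's worth, here is a rough sketch of the sharing case using plain OS threads from std as a stand-in for tasks on a multithreaded runtime (the function name `count_across_threads` is mine). As far as I understand it, `Rc` is not `Send`, so anything that may move to another thread, which includes futures passed to `tokio::spawn`, has to use `Arc` instead, usually together with a `Mutex` or atomics. No `unsafe` is required.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Share one counter between several preemptively scheduled threads.
// Arc provides shared ownership across threads; Mutex serializes access.
fn count_across_threads(workers: usize, increments: u64) -> u64 {
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..increments {
                    // Every increment pays for a lock and unlock.
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}
```

Swapping `Arc` for `Rc` here is rejected at compile time, which is the borrow checker enforcing the locking cost mentioned above whenever threads can run in parallel.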

The other part of the problem is space requirements. Usually, where you have preemptively scheduled threads, the stack space for a thread is allocated lazily: as functions are called, the stack faults in pages via the OS's virtual memory system, much like the single stack of a single-threaded process in a preemptive multiprocess OS. This means the minimum space allocation for a thread is at least 2x the page size (e.g. 4096 * 2). But many times a thread of execution only goes a couple of function calls deep, with minimal amounts of function-local (i.e. stack-allocated) data. If you have 1 thread per network connection, with hundreds of thousands or millions of connections, that overhead could be significant.
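A back-of-the-envelope version of that overhead, taking the 4096-byte page and the two-page minimum from above as assumptions (real page sizes and per-thread minimums vary by OS and architecture); the constant and function names are just for illustration:

```rust
// Assumed values from the estimate above, not measured on any system.
const PAGE_SIZE: u64 = 4096;
const MIN_PAGES_PER_THREAD: u64 = 2;

// Lower bound on stack memory with one thread per connection.
fn min_stack_bytes(connections: u64) -> u64 {
    connections * MIN_PAGES_PER_THREAD * PAGE_SIZE
}

// Convert bytes to GiB for readability.
fn as_gib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0 * 1024.0)
}
```

Under those assumptions, `min_stack_bytes(1_000_000)` is 8,192,000,000 bytes, or roughly 7.6 GiB, before any connection has stored a byte of useful state.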

But this, too, is a function of the implementation. Goroutines in Go use normal heap memory for stacks, and the compiler emits code to grow and move those stacks automatically. Rust proponents will tell you that async functions don't require any runtime cost because the stack requirements can be calculated statically. But to calculate this statically you can't support recursive functions. And if you can statically calculate your space requirements for the hidden async state object, you could also statically calculate the stack size for a thread just the same.
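A small sketch of what "calculated statically" looks like in practice, using only std (all function names here are made up). The size of the hidden state object is known at compile time and includes any local that stays live across an `.await`. A recursive async function has no finite size of that kind, so the usual workaround is to box each level, which moves that state to the heap at runtime.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;

async fn leaf(x: u8) -> u8 {
    x
}

// Holds almost nothing across its await point.
async fn small() -> u8 {
    leaf(1).await
}

// `buf` is used after the await, so it must be stored in the future itself.
async fn large(fill: u8) -> u8 {
    let buf = [fill; 4096];
    let i = leaf(0).await;
    buf[i as usize]
}

// Recursion needs indirection: each level is a separate heap allocation.
fn depth(n: u32) -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move {
        if n == 0 {
            0
        } else {
            1 + depth(n - 1).await
        }
    })
}

// Sizes of the state objects, available without polling either future.
fn future_sizes() -> (usize, usize) {
    (size_of_val(&small()), size_of_val(&large(0)))
}
```

On the toolchains I'm aware of, the second size comes out at 4096 bytes plus a little bookkeeping, while the value returned by `depth` is just a fat pointer no matter how deep the recursion eventually goes.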

So really what it all comes down to isn't whether "async" is better or worse than "threads" along any of these dimensions. Abstractly, all threading implementations are async, and all async implementations effectively implement threads (i.e. a data structure that encapsulates a program counter, local automatic storage, etc). The real reason you choose one over the other is external factors. For Rust that dominant factor is interoperability with native C ABIs, particularly native stack disciplines. Because Rust can't implement much magic in the lower layers of the runtime environment while maintaining the degree of interoperability with C, C++, and other language libraries (via the C ABI) that they're committed to, they have no choice but to put most of the instrumentation into the language itself. And this necessitates the async contortions, independent of any other preferences. Contrast that with Go, where calling into C is slightly more costly because they preferred to push more of the async/thread abstraction beneath the language syntax.

But perhaps what this tells us is that we should think about revisiting native stack disciplines and thread scheduling semantics. IIRC, Linux will soon get scheduler activations (i.e. the ability for userland to efficiently switch execution to another specified kernel-visible thread). That's a small step in the right direction, and if it catches on more operating systems will adopt scheduler activations--ironically, after having ditched them 20 years ago, before async network I/O became popular and when 1:1 thread scheduling became the preferred kernel model.

2 comments · 2 top-level

wbl 5y ago

Bryan Cantrill's undergrad thesis would be cool to replicate today. 1:1 won because of performance pathologies in M:N.

tijsvd 5y ago

I agree, it would be great if we could just write threads and not worry about performance. In the end, at least for me, async is a poor compromise between ergonomics and performance.

Unfortunately we're not there yet. Golang with GOMAXPROCS set to 1 comes close, but now I lose the ability to spawn real threads for expensive computation.