That's an assumption that is repeated very often recently, and measured very rarely. The truth is that the number of applications for which threads don't work is surprisingly low. I work at a well-known cloud provider, and lots of people would be really surprised which applications at the largest scale work fine with a thread-per-request model. 50k OS threads are not really an issue on modern server hardware. While it might not be the most efficient option [1], it will not perform so badly that it causes an availability impact either.
There are obviously some exceptions to that [2] - but I encourage people to measure instead of making assumptions. Unless you find yourself in a weekly meeting about server efficiency or scaling cliffs, both models probably work.
[1] it really depends on the workload, but people might find an efficiency degradation (e.g. measured as BYTES_TRANSFERRED/CPU_CORES_USED) of 20% at a concurrency level of 1000, or maybe only at a concurrency level of 10k. Coarse-grained work items (e.g. send a large file to a socket) will show a lower degradation.
[2] Load balancers, CDN services, and e.g. chat applications which maintain a massive number of mostly idle client connections can be such environments. They have a high amount of concurrency that needs to be managed, but much less "active concurrency". If all clients were active at the same time, those environments would run out of disk I/O or network bandwidth far before CPU or memory became an issue.
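As a hedged mini-demonstration of the thread-per-request claim (scaled down from 50k so it runs quickly): spawning a few thousand OS threads is unremarkable on modern hardware, and each thread here stands in for one "request".

```rust
use std::thread;

// Sketch: one OS thread per "request", scaled down to 1000 threads.
// Each thread just returns its id; a real server would do request work.
fn sum_via_threads(n: u64) -> u64 {
    let handles: Vec<_> = (0..n)
        .map(|i| thread::spawn(move || i)) // one thread per "request"
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    // Sum of 0..1000 is 499500; all 1000 threads spawned, ran, and joined.
    assert_eq!(sum_via_threads(1_000), 499_500);
}
```

The default stack size per thread is mostly a virtual-memory reservation, so the real memory cost is far lower than the headline number suggests.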
Performance is important, but the biggest performance gain happens when a program goes from not working to working correctly.
Debugging is another corner case: async makes it intolerably hard to get a backtrace and make sense of what is going on.
It's not like debugging threads is easy, but in a low-contention environment that is entirely "1 thread holds the state of one request", with few interlocking threads, threading is a fair bit better than async execution. Plus, logs that include thread names make it possible to draw out something like a post-processed Catapult timing diagram (open chrome://tracing and look at an example - it is a great UI for dropping in your own multi-threaded event log as JSON).
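To make the Catapult idea concrete: chrome://tracing ingests the Trace Event JSON format, so a log post-processor only needs to emit one small object per log line. A minimal sketch (the event names and ids are illustrative; `ph` is "B"/"E" for begin/end and `ts` is in microseconds, per the published format):

```rust
// Emit one Trace Event object per log line; chrome://tracing accepts a
// plain JSON array of these.
fn trace_event(name: &str, ph: char, ts_us: u64, tid: u64) -> String {
    format!(r#"{{"name":"{name}","ph":"{ph}","ts":{ts_us},"pid":1,"tid":{tid}}}"#)
}

fn main() {
    // A request handled on thread 1 between t=10us and t=250us.
    let events = [
        trace_event("handle_request", 'B', 10, 1),
        trace_event("handle_request", 'E', 250, 1),
    ];
    let json = format!("[{}]", events.join(","));
    assert!(json.contains(r#""ph":"B""#));
    println!("{json}");
}
```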
I'm a big fan of executor thread-groups and work queues, but damn does it make it hard to mentally walk through a bug when the stack traces are scattered across multiple places.
> That's an assumption that is repeated very often recently, and measured very rarely.
I would go further--there is a whole infrastructure that needs to appear when massive concurrency is involved, and that is very rarely taken into account.
For those interested in genuine massive concurrency, I encourage you to investigate Erlang. In my opinion, the language itself is just "meh", but OTP, the infrastructure around managing, upgrading, restarting, etc. processes/threads, is extremely on point.
Side note: Erlang still has the absolute best handling of binary parsing of any language ever. https://www.erlang.org/doc/programming_examples/bit_syntax.h...
I really wish the Rust people would pick something like the Erlang Bit Syntax up and integrate it with their pattern matching (probably necessitating some pattern matching language fixes) rather than the amount of effort they continue to piddle away on async/await.
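For contrast, here's what parsing a small binary header looks like in today's Rust with plain slicing - the 4-byte layout is invented for illustration. In Erlang's bit syntax the whole thing is a single pattern: `<<Version:8, Flags:8, Len:16/big, Rest/binary>> = Packet`.

```rust
// Hypothetical header: 1-byte version, 1-byte flags, 2-byte big-endian
// length, then the payload. In Rust today this is manual slicing and
// byte-order conversion rather than one declarative pattern.
fn parse_header(buf: &[u8]) -> Option<(u8, u8, u16, &[u8])> {
    if buf.len() < 4 {
        return None; // too short to contain the fixed header
    }
    let version = buf[0];
    let flags = buf[1];
    let len = u16::from_be_bytes([buf[2], buf[3]]);
    Some((version, flags, len, &buf[4..]))
}

fn main() {
    let pkt = [1u8, 0, 0, 5, 0xde, 0xad];
    let (version, flags, len, rest) = parse_header(&pkt).unwrap();
    assert_eq!((version, flags, len), (1, 0, 5));
    assert_eq!(rest, &[0xde, 0xad]);
}
```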
Re concurrency. I learned Erlang before Akka. It took me a bit but I find Akka more ergonomic. Akka will easily handle millions of actors on a single machine, too. But I always miss matching on binaries.
Another good one is protoactor for golang. That will also do a million actors no problem. Comes really close to Erlang in terms of how concise the syntax is. But again, no binary matching.
If you're writing a CRUD app, sure, do it in PHP and spin up a thread per request.
In the real world here are the kinds of problems that people at Google etc. care about when it comes to performance or scalability issues with hugely concurrent programs:
- Noisy neighbor problems from other threads messing with your TLB and L1 cache
- High cost of context switches
- Unpredictable scheduling/priority inversion in the scheduler
The first problem isn't actually made any better by using async coroutines or green threads/fibers: if you switch to another coroutine or fiber and it does something naughty (e.g. munmaps memory, which will cause a TLB shootdown), it's going to degrade performance for your unrelated coroutine/fiber.

The second and third problems can be solved in some cases by things like fibers and userspace scheduling, but this is a fairly advanced topic and "just use async" is definitely not the solution. If you're interested in learning more about how these problems are actually solved at Google, for example, I recommend [2] and [3].
[1] https://abseil.io/docs/cpp/guides/synchronization#thread-ann... [2] https://www.youtube.com/watch?v=KXuZi9aeGTw [3] https://storage.googleapis.com/pub-tools-public-publication-...
Switching between threads within the same process doesn't require a TLB or L1 cache flush. Not sure if you were implying this, just wanted to point that out.
> - High cost of context switches
Userspace schedulers (like rust's tokio) do make context switching cheaper, however, most of the context switching in the case of a web server is due to blocking I/O and the most expensive part of the switch, entering the kernel, is already accounted for by the I/O request. Kernel context switching is unlikely to be your bottleneck.
> Unpredictable scheduling/priority inversion in the scheduler
This can definitely be an issue at scale, but a general purpose async scheduler like most use is unlikely to be any better.
$ ps -eLf | grep firefox | wc -l
569
$

The perception that async Rust is where you should start for concurrent Rust, because it's built in and everyone uses it, perhaps should be revisited. I would argue that the other options are worth considering first, and that dropping down to low-level async code might be warranted when you need the performance it gives and that performance justifies the increase in development costs.
Rust isn't meant to be a language for CRUD apps (despite making inroads in this space). It's meant to be a C/C++ alternative that can work in every difficult niche where these two can, including processes that already have their own runtimes, kernel space, microcontrollers, and other situations where any overhead, or bringing custom threads with magic I/O and special stack handling, is unacceptable.
Rust's async is designed to be separate from the core language, and work on top of arbitrary runtimes. Most people use tokio, but it can also work with your custom loop on microcontrollers, or on top of another runtime, e.g. WASM + browser's event loop, or gtk-rs that can work on top of GTK's event loop.
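To make the "arbitrary runtimes" point concrete: Rust's `Future` trait only requires something that calls `poll()`, so even a busy-polling loop with a no-op waker counts as an executor. A minimal sketch (a real runtime like tokio parks the thread and wakes on I/O readiness instead of spinning):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The simplest possible "runtime": poll the future until it's Ready,
// using a waker that does nothing when invoked.
fn block_on<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable = RawWakerVTable::new(|_| RAW, |_| {}, |_| {}, |_| {});
    const RAW: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);
    let waker = unsafe { Waker::from_raw(RAW) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now(); // a real executor would sleep/park here
    }
}

fn main() {
    // An async block compiles to a state machine; any poll loop can drive it.
    assert_eq!(block_on(async { 40 + 2 }), 42);
}
```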
I just think that the cultural decision in the wider ecosystem to make practically everything I/O-related async is possibly a mistake.
It's hard to wind down that existing momentum.
Node.js devs seem to be doing fine? And I would say their development is faster than most devs working on other stacks. Node.js is also a top-3 server stack and growing.
This puts limits on what can be accomplished. Starting with a more restricted set of allowed code, and then expanding it over time, can be more successful in many cases - without locking you into a perhaps more ergonomic-looking interface that needs to be coddled, with no tooling support, to avoid the "slow path". For example, in Rust: `impl Trait` used not to exist, which meant you had to use `Box<dyn Trait>` instead, which can be slower and certainly adds some verbosity. Then `impl Trait` was added and a bunch of code became representable, and soon `type Alias = impl Trait;` will be stabilized, which will allow even more code to be representable in a way that is both performant and easier to use. A language that instead says "just use `-> Trait` and the compiler will figure out what to do" would have increased users' perf without intervention, but anyone who really cares about FFI stability or wants to keep on top of heap allocations would be left out in the cold.
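A small sketch of the `Box<dyn Trait>` vs `impl Trait` progression described above (the iterator is just an arbitrary example):

```rust
// Pre-impl-Trait style: a heap allocation plus dynamic dispatch,
// but the return type is nameable and FFI/ABI-friendly.
fn evens_boxed() -> Box<dyn Iterator<Item = u32>> {
    Box::new((0u32..).filter(|n| n % 2 == 0))
}

// impl-Trait style: zero-overhead static dispatch, no allocation,
// but the concrete type stays anonymous.
fn evens_impl() -> impl Iterator<Item = u32> {
    (0u32..).filter(|n| n % 2 == 0)
}

fn main() {
    let a: Vec<u32> = evens_boxed().take(3).collect();
    let b: Vec<u32> = evens_impl().take(3).collect();
    assert_eq!(a, vec![0, 2, 4]);
    assert_eq!(a, b); // same behavior, different dispatch and allocation cost
}
```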
It is the same reason that you can complain about the complexity of the String/&str distinction in Rust[1], but avoiding lingering references to big strings in JS (effectively a memory leak) becomes much harder.
[1]: https://fasterthanli.me/articles/working-with-strings-in-rus...
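A sketch of that String/&str point: in Rust a `&str` slice borrows its owner, so the borrow checker forces you to either keep the big `String` alive or explicitly copy the small piece out - the JS-style silent retention of a large parent buffer can't happen by accident. (ASCII input assumed here, since slicing panics off a char boundary.)

```rust
// Copy the interesting prefix out so the caller is free to drop the
// big buffer; holding a &str into it instead would keep it alive.
fn keep_prefix(big: &str, n: usize) -> String {
    big[..n].to_owned()
}

fn main() {
    let big = "x".repeat(1_000_000);
    let small = keep_prefix(&big, 10);
    drop(big); // the megabyte buffer is freed; `small` owns its own 10 bytes
    assert_eq!(small, "xxxxxxxxxx");
}
```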
Requesting urls n-at-a-time took me a while (https://play.rust-lang.org/?version=stable&mode=debug&editio...). In particular rust-analyzer itself cannot figure out `buffer`'s type here.
You can consider me very intrigued by Lunatic.
At first, I noticed that the Go version was actually faster than the Rust one, and then I saw that in `reqwest` they recommend, if you're doing multiple GET requests, creating a `Client` and then using that to get better performance[1]. After changing my code, the Rust version was effectively a bit faster (not by much, to be honest, which was a bit disappointing considering Go's version was way easier to write, and I say this as a general Rust shill).
Hopefully this comment is somewhat helpful :)
[1] https://docs.rs/reqwest/latest/reqwest/#making-a-get-request
Always create a client explicitly. And also always add a timeout.
The Go http.Get() function uses a shared global client, so making a request doesn't have high initialization costs, and requests can make use of a shared connection pool.
Right, then it doesn't have to reopen the connection for each request. That's not an async thing, it's a caching thing.
My problem is more that even if I don't need massive concurrency (say in a client that only talks to a single server, in a serial manner), I'm still more or less forced into async code because that's what the ecosystem switched to. No matter whether you benefit from async or not, not using it means going against the grain and generally makes your life harder, despite threads being much better from a language-ergonomics point of view.
Async rust lets you implement different combinators on async tasks and cancel them effortlessly.
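A hedged sketch of the "cancel effortlessly" point: in Rust, cancelling a task amounts to dropping its future, and cleanup runs through `Drop`. The names below are illustrative, not from any library.

```rust
use std::cell::Cell;
use std::rc::Rc;

// A Drop guard standing in for cleanup that must run when a task is
// cancelled (closing a connection, releasing a permit, etc.).
struct Cleanup(Rc<Cell<bool>>);
impl Drop for Cleanup {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

fn cancel_runs_cleanup() -> bool {
    let cleaned = Rc::new(Cell::new(false));
    let guard = Cleanup(cleaned.clone());
    let fut = async move {
        let _guard = guard; // moved into the future's captured state
        std::future::pending::<()>().await; // would wait forever if driven
    };
    assert!(!cleaned.get()); // nothing dropped yet
    drop(fut); // cancellation: captured state, including the guard, is dropped
    cleaned.get()
}

fn main() {
    assert!(cancel_runs_cleanup());
}
```

The flip side, as noted elsewhere in this thread, is that cancellation can land between any two `.await` points, so mid-operation state (like a half-written socket) is your problem to handle.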
As for performance, tokio is not exactly a zero-cost abstraction. Just run perf on a tokio program to see how big an overhead it introduces. It was claimed to be zero-cost from the start, and it has since gone through at least two major performance overhauls to back that point up. That being said, I love tokio and its ecosystem - but it's the ergonomics, not the speed, that I love. async-std was much slower for the networking use case that I had, so overall tokio is as good as it gets.
Is this really true? All the problems that are solvable in Go should be solvable in Rust too right (but not vice versa because Go is GCed)? They might not compete on every front but there definitely should be overlap in the use cases.
Rust is difficult to learn, unless you already have a lot of experience with existing low-level languages. Getting complex programs up and running with Rust is cumbersome. But the performance is excellent, you can have a high degree of confidence that your program is rock solid, and there are entire classes of security issues that don't happen in Rust. For the types of applications where Rust does well, it does very well indeed. The time investment to become a decent Rust programmer is high, but this higher barrier to entry can make your programming skills even more valuable, since there's less supply to meet the demand.
Async is hard again, taking more months to feel proficient. I have a suspicion that much of the resistance to async comes from people who have made the initial effort to feel comfortable in Rust and expect async to fit right in - but it doesn't, because it's hard too.
Threads are also hard, but in Rust they map better to existing thread models, so pre-existing skills carry over: someone skilled in threads and in Rust will be skilled in threads with Rust.
For sure, there are missing pieces of the async world like async traits, but they will come.
I know it sounds crazy. I recently dove into the area, and was pretty surprised at how many interesting building blocks there are out there. It feels like if we just combine them in the right way, we'll discover something that works a lot better.
Off the top of my head:
Google discovered a way to switch between OS threads without the syscall overhead. All it needs is to solve the memory overhead. [0]
Zig discovered a way to use monomorphization to enable colorless async/await. If someone could figure out how to make it work through polymorphism / virtual dispatch, that would be amazing. [1]
Vale discovered a possible way to make structured concurrency in a memory safe way that's easier than existing methods. [2]
Go [3] and Loom [4] show us that we can move stacks around. Loom is particularly interesting as it shows we can move the stack to its original location, a unique mechanism that could solve some other approaches' problems with pointer invalidation.
Cone is designing a unique blend of actors and async await, to enable simpler architectures. [5]
We're close to solving the problem, I can feel it.
[0] No public docs on it, but TL;DR: we tell the OS the thread is blocked, and manually switch over to it by saving/manipulating registers.
[1] https://kristoff.it/blog/zig-colorblind-async-await/
[2] https://verdagon.dev/blog/seamless-fearless-structured-concu...
[3] https://blog.cloudflare.com/how-stacks-are-handled-in-go/
[4] https://youtu.be/NV46KFV1m-4
[5] Can't find the link, but was a discussion on their server.
A lot of this stuff is intriguing from the implementation side, but where we're really lacking is in the syntax and semantic side to make concurrency "make sense" to programmers. I don't think we're close to solving that problem (for example, call/cc isn't the answer, it's the problem).
imho the issue isn't function coloring, threads, whatever. It's a compiler that defaults to async code in the calling convention and then optimization passes to de-async-ify (remove unnecessary yield points) the code at compile time. The result would be code that looks synchronous but is async where it matters (i/o).
A lot of the symptoms of the sync/async problem are caused by the explicit decoupling of sync/async APIs in source code. If you remove that and force it to be implicit internal to the language implementation, the issue goes away. It would take a lot of work to determine if that was worth it.
Basically as we've now accepted garbage collection to be an acceptable part of language implementation, one day I think we'll accept async executors to be a part of that too. We're halfway there on the impl side (Go, Java through Loom, NodeJS, etc). The other half is removing the explicit syntax for it.
Safepoints for garbage collection are somewhat similar, but for preemption one wants to interrupt threads on a timer, rather than before the collector takes over. Despite occurring very frequently (at around 100 _million_ checks per second), the time overhead is only about 2.5% or so, according to a study by Blackburn et al [0]. It appears, I think, that as long as the fast not-interrupting path is fast enough, eliminating safepoints isn't too important.
[0] Stop and Go: Understanding Yieldpoint Behaviour <https://users.cecs.anu.edu.au/~steveb/pubs/papers/yieldpoint...>
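A rough userspace analogue of such a check, sketched under the assumption that the fast not-interrupting path is a single relaxed atomic load per iteration:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// The hot loop pays one relaxed load per iteration (the cheap fast path)
// and only reacts when some external party sets the flag - the same shape
// as a compiler-inserted yieldpoint.
fn run_until_preempted(stop: Arc<AtomicBool>) -> u64 {
    let mut iters: u64 = 0;
    while !stop.load(Ordering::Relaxed) {
        iters += 1; // stand-in for real work between checks
    }
    iters
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let worker = {
        let stop = stop.clone();
        thread::spawn(move || run_until_preempted(stop))
    };
    thread::sleep(Duration::from_millis(20)); // the "timer interrupt"
    stop.store(true, Ordering::Relaxed);
    assert!(worker.join().unwrap() > 0);
}
```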
Sounds like Erlang and single assignment languages.
Jokes aside, part of the problem seems to be the computer model and cpu architectures themselves.
We need something that is designed from scratch to run things concurrently.
Yes. There is current research into Algebraic Effects (see for instance https://www.microsoft.com/en-us/research/wp-content/uploads/...).
Algebraic Effects promise a return to non-colored functions, as AE can abstract over exceptions, continuations, async and other control-flow mechanisms.
Now, we want to remove threading from the concurrency story, in the hopes of getting another performance boost. This itself is the problem, because threads were giving us automatic preemption, akin to how GCs were giving us automatic memory safety. Now we have to statically determine a "good time" for the program to yield. I/O yielding is the easy part, and the reason why people are flocking to async; but we also need to support yielding for fairness reasons. Kernels can do this because they have interrupt timers; but there's no lower-overhead equivalent for userspace code that I'm aware of.
The other problems mentioned with async Rust are particular to Rust itself. The language has a policy that heap allocations only ever happen in `std`, because they want to support embedding Rust into applications where heaps don't exist. This means that futures need to be structs. Rust does support structs of indeterminate size, but barely; and there's no support for structs that can grow. Such a thing is likely unsound without a way for the compiler to check growth limits, and the memory is pinned, so we can't grow beyond a preset limit set at the start of the future[0].
Async infects everything it touches because it's a total pain to write networking library code that's preemption-agnostic. Monad<T> would fix that, but higher-kinded traits aren't a thing in Rust yet and we would need lots of language tooling (akin to `?`) to make this ergonomic to use.
There's also just the possibility that we've been engineering the wrong fix, and we should be trying to get OS threads to be as lightweight as possible rather than trying to move the entire threading system into userspace. There's no particular reason why we need 8MB stacks, other than the fact that compilers don't check stack growth themselves. (Which, BTW, is also a soundness hole in Rust as far as I know.)
[0] Go gets around this with a linked list of stacks, which adds its own overhead.
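On the stack-size point above: the multi-MB default is a reservation, not a requirement, and std already lets each thread request a much smaller stack. A sketch (64 KiB is an arbitrary choice that only works if the thread's actual usage stays below it):

```rust
use std::thread;

// Run a closure on a thread with a deliberately small stack.
// If the closure recurses deeply, it will overflow - the size must be
// chosen against the thread's real stack usage.
fn on_small_stack<T: Send + 'static>(f: impl FnOnce() -> T + Send + 'static) -> T {
    thread::Builder::new()
        .stack_size(64 * 1024) // 64 KiB instead of the multi-MB default
        .spawn(f)
        .unwrap()
        .join()
        .unwrap()
}

fn main() {
    assert_eq!(on_small_stack(|| 6 * 7), 42);
}
```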
There may still be some fracturing here, ie in the first example (but not the others, inexplicably?) `lunatic::net` vice `std::net`.
The reason why we provide `lunatic::net` and you can't just use `std::net` is that WASI (system interface for WebAssembly) still doesn't have support for sockets[0]. `lunatic::net::TcpStream` is for now just a drop in replacement for `std::net::TcpStream` and once sockets get standardised you will be able to use the standard library types instead.
I believe all of these are handled. I just cannot find sufficient documentation to understand the details of how this works.
let mut offset = 0;
while offset != number_as_bytes.len() {
let written = stream.write(&number_as_bytes[offset..]).await.unwrap();
offset += written;
}
The synchronous version would be the same without the .await, and offers stronger guarantees: either all bytes are written to the socket, or the socket errored and is dead. The async version could be cancelled in the middle of the invocation, after some segments have already been written.

Because it sounds interesting - but the hard part is that you need a combo of request/webserver to have a chance.
and then the DB side....
I'm unable to get debugger breakpoints in Async functions in Rust to actually break.
Is this a known bug with Async Rust? Or is this simply unsupported (yet)? Seems like a really broken experience currently.
Stop trying to stir shit.
No, you will benefit from parallelism/multithreading. Why only use 1 core? Multitasking as it was once called, or "async" as it is now, is fundamentally _synchronous_ because everything still happens on one core. Just that the order of execution may be a bit wonky, which technically all code already suffers from at the microscopic level with instruction reordering and out of order execution. You almost certainly don't need multitasking unless you are writing an OS for embedded.