The biggest productivity boost to my rust embedded firmware development was when I could stop manually implementing state machines and marshalling all local variables into custom state after custom state between each I/O operation snd let rust do that for me by using async/await syntax!
That’s, after all, what async desugars to in rust: an automatic state machine that saves values across I/O (await) points for you.
That isn't always the case though. In more packet-oriented usecases (QUIC, WebRTC & IP), doing the actual IO bit is easy: send & receive individual packets / datagrams.
There isn't really much the compiler can generate for you because you don't end up with many `.await` points. At the same time, the state management across all these futures becomes spaghetti code because many of these aspects should run concurrently and thus need to be in their own future / task.
Theoretically, you could do the same thing with async/await constructing the state machines for you, although in practice it's pretty painful and most async/await code is impure.
There are lots of more experimental languages which exceptional support for this style of programming (Eff, Koka, Frank). Underlying all of Haskell's IO discourse is a very deep investment into several breeds of this kind of technology (free monads and their variants).
Lately, Unison has been a really interesting language which explores lots of new concepts but also has at its core an extensible effects system that provides excellent language-level support for this kind of coding.
Here is a simple counter example. Suppose you have to process a packet that contains many sequences (strings/binary blobs) prefixed by 4 bytes of length.
You are not always guaranteed to get the length bytes or the string all in one go. In a sequential system you'd accumulate the string as follows
handle_input(...)
while not received 4 bytes
accumulate in buf
len = toInt(buf[0..4])
while not received len bytes
accumulate in buf
If implemented as a state machine ,these would require two await points to assemble the string. Flattening this out into a state machine manually is a pain.For example, if you’re holding a mutex lock, you probably want to avoid holding it “across” an await point so that it’s not locked for longer than necessary if the function is paused.
> This pattern isn't something that we invented! The Python world even has a dedicated website about it.
And yet it is too common to find protocol libraries doing I/O in the wild :-(
What got me thinking about this was the whole fn coloring discussion, and a happy accident on my part. I had been writing a VT100 library and was doing my head in trying to unit test it. The problem was that I was essentially `parser::new(stdin())`. During the 3rd or 4th rewrite I changed the parser to `parser::push(data)` without really thinking about what I was doing. I then realized that Rust was punishing me for using an enterprise OOPism anti-pattern I have since been calling "encapsulation infatuation." I now see it everywhere (not just in I/O) and the havoc it wreaks.
The irony is that this solution is taught pre-tertiary education (and again early tertiary). The simplest description of a computer is a machine that takes input, processes/transforms data, and produces output. This is relevant to the fn coloring discussion because only input and output need to be concerned with it, and the meat-and-potatoes is usually data transformation.
Again, this is patently obvious - but if you consider the size of the fn coloring "controversy;" we've clearly all been missing/forgetting it because many of us have become hard-wired to start solving problems by encapsulation first (the functional folks probably feel mighty smug at this point).
Rust has seriously been a journey of more unlearning than learning for me. Great pattern, I am going to adopt it.
Edit: code in question: https://codeberg.org/jcdickinson/termkit/src/branch/main/src...
Could you please elaborate more on this? I feel you're talking about an obvious problem with that pattern but I don't see how Rust punishes you for using it (as a very novice Rust learner)
My first few parser designs also took a handler trait (the Alacritty VT100 stuff does this if you want a living example). Because, you know, encapsulate and DI all the things! Async traits weren't a thing at the time (at least without async-trait/boxing/allocation in a hot loop), so fn coloring was a very real problem for me.
The new approach (partial, I haven't started the parser) is:
input.read(&mut data);
tokens = scanner.push(&data);
ops = parser.push(&tokens);
Maybe you can see from that how much simpler it would be to unit test the parser, I can pass it mock token streams instead of bytes. I can also assert the incremental results of it without having to have some mock handler trait impl that remembers what fns were invoked.I'm not sure if that really answers your question, but as I mentioned: it's been a while. And good luck with the learning!
Type parameters and traits everywhere. Structs being (ab-)used as class-like structures that provide functionality.
Rust works better if you avoid type parameters and defining your own traits for as much as possible.
Encapsulation is good if we talk about ensuring invariants are maintained. This blog post about parse, don't validate comes to my mind: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
But besides that it's pretty convenient. Let's say you have a ws_handler channel, you just send your data through that and there is a dedicated handler somewhere that may or may not send that message if it's able to.
My feeling is that sans-IO is particularly useful for libraries, although it can be used for applications too. In a library it means you don't force decisions about how I/O happens on your consumer, making it strictly more useful. This is important for Rust because there's already a bunch of ecosystem fragmentation between sync and async IO(not to mention different async runtimes)
I would go as far as saying that whatever functionality your application provides, there is a core that can be modelled without depending on IO primitives.
But as you say, it comes with problems: Actors / channels can be disconnected for example. You also want to make sure they are bounded otherwise you don't have backpressure. Plus, they require copying so achieving high-throughput may be tricky.
The idea of separating logic from execution is a whole thing, well trodden by the Haskell ecosystem.
[EDIT] Also, they didn't mention how they encapsulated the `tokio::select!` call that shows up when they need to do time-related things -- are they just carrying around a `tokio::Runtime` that they use to make the loop code async without requiring the outside code to be async?
[EDIT2] Maybe they weren't trying to show an encapsulated library doing that, but rather to show that the outside application can use the binding in an async context...
I would have been more interested in seeing how they could implement an encapsulated function in the sans-IO style that had to do something like wait on an action or a timer -- or maybe the answer they're expecting there is just busy-waiting, or carrying your own async runtime instance (that can essentially do the busy waiting for you, with something like block_in_place.
The "encapsulated function" is the `StunBinding` struct. It represents the functionality of a STUN binding. It isn't a single function you can just call, instead it requires an eventloop.
The point though is, that `StunBinding` could live in a library and you would be able to use it in your application by composing it into your program's state machine (assuming you are also structuring it in a sans-IO style).
The linked `snownet` library does exactly this. Its domain is to combine ICE + WireGuard (without doing IO) which is then used by the `connlib` library that composes ACLs on top of it.
Does that make sense?
EDIT: There is no busy-waiting. Instead, `StunBinding` has a function that exposes, what it is waiting for using `poll_timeout`. How the caller (i.e. eventloop) makes that happen is up to them. The appropriate action will happen once `handle_timeout` gets called with the corresponding `Instant`.
What I was thinking was that the functionality being executed in main could just as easily be in a library function -- that's what I meant by encapsulated function, maybe I should have said "encapsulated functionality".
If the thing I want to do is the incredibly common read or timeout pattern, how do I do that in a sans-IO way? This is why I was quite surprised to see the inclusion of tokio::select -- that's not very sans IO, but is absolutely the domain of a random library function that you might want to expose.
It's a bit jarring to introduce the concept as not requiring choices like async vs not, then immediately require the use of async in the event loop (required to drive the state machine to completion).
Or is the point that the event loop should be async? That's a reasonable expectation for event loops that are I/O bound -- it's the whole point of a event loop/reactor pattern. Maybe I'm missing some example where you show an event loop that is not async, to show that you can drive this no matter whether you want or don't want async?
So if I to try to condense & rephrase:
If I want to write a function that listens or times out in sans-IO style, should I use tokio::select? If so, where is the async runtime coming from, and how will the caller of the function be able to avoid caring?
I got half way through this article feeling like this pattern was extremely familiar after spending time down inside rust-libp2p. Seems like that wasn't a coincidence!
Firezone looks amazing, connect all the things!
Yes there are indeed similarities to rust-libp2p! Over there, things are more interleaved though because the actual streams and connections are still within `Future`-like constructs and not strictly split like in the sans-IO case here.
Has anyone tried to combine async and sans-io? At least morally, I ought to be able to write an async function that awaits sans-io-aware helpers, and the whole thing should be able to be compiled down to a state machine inside a struct with a nice sans-io interface that is easily callable by non-async code.
I’ve never tried this, but the main issues I would forsee would be getting decent ergonomics and dealing with Pin.
It's not an issue for a single coroutine whose lifetime is contained within the function that defines it, since the compiler can figure that out and stack-allocate the state machine. But arguably the most useful application of coroutines is as elements in a queue for event loop machinery. But implementing that is made impossible unless you box the coroutines. Vec<Box<dyn Coroutine>> is not a cache friendly data structure, and you'll feel the pain if you're doing extremely high concurrency I/O and need a million elements in your Vec.
To read data, the generator would suspend (.await) and wait to be resumed with incoming data. I am not sure if there is nightly syntax for this but it would have to look something like:
// Made up `gen` syntax: gen(yield_type, resume_type)
gen(Transmit, &[u8]) fn stun_binding(server: SocketAddr) -> SocketAddr {
let req = make_stun_request();
yield Transmit {
server,
payload: req
};
let res = .await; // Made up "suspend and resume with argument"-syntax.
let addr = parse_stun_response(res);
addr
}https://doc.rust-lang.org/stable/std/ops/trait.Coroutine.htm... and the syntax you're looking for for resuming with a value is `let res = yield ...`
Alternatively there is a proc macro crate that transforms generator blocks into async blocks so that they work on stable, which is of course a round-about way of doing it, but it certainly works.
But I've also been mulling over the idea how they could be combined! One thing I've arrived at is the issue that async functions compile into opaque types. That makes it hard / impossible to use the compiler's facility of code-generating the state machine because you can't interact with it once it has been created. This also breaks the borrow-checker in some way.
For example, if I have an async operation with multiple steps (i.e. `await` points) but only one section of those needs a mutable reference to some shared data structure. As soon as I express this using an `async` function, the mutable reference is captured in the generated `Future` type which spans across all steps. As a result, Rust doesn't allow me to run more than one of those concurrently.
Normally, the advice for these situations is "only capture the mutable reference for as short as possible" but in the case of async, I can't do that. And splitting the async function into multiple also gets messy and kind of defeats the point of wanting to express everything in a single function again.
In the context of HTTP/1.1 the async code became a kind of "blueprint" for how the user wants the call to behave. At the time I was dead set on making it work for no_std (non allocator) environment, and I gave up because I couldn't find a way around how to need dynamic dispatch via Box<dyn X> (needing an allocator).
async fn stun(
server: SocketAddr,
mut socket: impl Sink<(BindingRequest, SocketAddr), Error = color_eyre::Report>
+ Stream<Item = Result<(BindingResponse, SocketAddr), color_eyre::Report>>
+ Unpin
+ Send
+ 'static,
) -> Result<SocketAddr> {
socket.send((BindingRequest, server)).await?;
let (message, _server) = socket.next().await.ok_or_eyre("No response")??;
Ok(message.address)
}
Fully working code at https://gist.github.com/joshka/af299be87dbd1f64060e47227b577...Sans I/O mostly means, write pure functions and move I/O out of your main functionality as much as possible. Then you can deal with each part independently and this makes the compiler happy.
80-96% of your work on a sans io rust project is still going to be I/O, but it’s not complected with your main logic, so you can unit test the main logic more easily
Is the similarity you are seeing that the work itself that gets scheduled via a job is agnostic over how it is executed?
From this example [0] it looks more like that async API is very similar to Rust's futures:
- Within a job you can access a "wait context"
- You can suspend on some condition
- You can trigger a wake-up to continue executing
[0]: https://www.openssl.org/docs/man1.1.1/man3/ASYNC_is_capable....
The most interesting thing I learned from the article is that cloudflare runs a public stun server. But even that isn't helpful because the 'good' and 'useful' version of the STUN protocol is the first version of the protocol which supports 'change requests' -- a feature that allows for NAT enumeration. Later versions of the STUN protocol removed that feature thanks to the 'helpful suggestions' of Cisco engineers who contributed to the spec.
The current situation in Rust is that if you implement a library, say one that does WebRTC, that uses the Tokio async runtime. Then it's very cumbersome for folks to use it if they are doing sync IO, using a different runtime(smol, async-std etc), are using iouring directly etc. With this approach you don't force the IO choice on consumers and make the library useful to more people.
- Client and gateway perform ICE to agree on a socket pair (this is where hole-punching happens or if that fails, a relay is used)
- The socket pair determined by ICE is used to set up a WireGuard tunnel (i.e. a noise handshake using ephemeral keys).
- IP traffic is read from the TUN device and sent via the WireGuard tunnel to the gateway.
- Gateway decrypts it and emits it as a packet from its TUN device, thereby forwarding it to the actual destination.
It is worth noting that a WireGuard tunnel in this case is "just" the Noise Protocol [0] layered on top of UDP. This ensures the traffic is end-to-end encrypted.
Am I missing something?
Why would you want this in a client? It's not like a client needs to manage tens of thousands of connections. Unless it's doing a DDOS job.
The main benefit is being able to use `&mut` everywhere: At the time when we read an IP packet from the TUN device, we don't yet know, which gateway (exit node), it needs to go to. We first have to look at the user's policies and then encrypt and send it via a WireGuard tunnel.
Similarly, we need to concurrently receive on all of these tunnels. The tunnels are just a user-space concept though. All we do is receive on the UDP socket and index into the corresponding data structure based on the sending socket.
If all of these "connections" would use their own task and UDP socket, we'd would have to use channels (and thus copying) to dispatch them. Additionally, the policy state would have to be in an `Arc<Mutex>` because it is shared among all connections.
I mean, this is basically what the IO monad and monadic programming in Haskell end up pushing Haskell programmers to do.
- the event loop
- the state machine of data states that occur
But async rust is already a state machine, so the stun binding could be expressed as a 3 line async function that is fairly close to sans-io (if you don't consider relying on abstractions like Stream and Sink to be IO).
async fn stun(
server: SocketAddr,
mut socket: impl Sink<(BindingRequest, SocketAddr), Error = color_eyre::Report>
+ Stream<Item = Result<(BindingResponse, SocketAddr), color_eyre::Report>>
+ Unpin
+ Send
+ 'static,
) -> Result<SocketAddr> {
socket.send((BindingRequest, server)).await?;
let (message, _server) = socket.next().await.ok_or_eyre("No response")??;
Ok(message.address)
}
If you look at how the underlying async primitives are implemented, they look pretty similar to what you;ve implemented. sink.send is just a future for Option<SomeMessage>, a future is just something that can be polled at some later point, which is exactly equivalent to your event loop constructing the StunBinding and then calling poll_transmit to get the next message. And the same goes with the stream.next call, it's the same as setting up a state machine that only proceeds when there is a next item that is being fed to it. The Tokio runtime is your event loop, but just generalized.Restated simply: stun function above returns a future that that combines the same methods you have with a contract about how that interacts with a standard async event loop.
The above is testable without hitting the network. Just construct the test Stream / Sink yourself. It also easily composes to add timeouts etc. To make it work with the network instead pass in a UdpFramed (and implement codecs to convert the messages to / from bytes).
Adding timeout can be either composed from the outside caller if it's a timeout imposed by the application, or inside the function if it's a timeout you want to configure on the call. This can be tested using tokio test-utils and pausing / advancing the time in your tests.
---
The problem with the approach suggested in the article is that it splits the flow (event loop) and logic (statemachine) from places where the flow is the logic (send a stun binding request, get an answer).
Yes, there's arguments to be made about not wanting to use async await, but when you effectively create your own custom copy of async await, just without the syntactic sugar, and without the various benefits (threading, composability, ...), it's worth considering whether you could use async instead.
Dealing with invalid messages is crucial in network protocols, meaning the API should be `&[u8]` instead of parsed messages.
In addition, you want to be able to parse messages without copying, which will introduce lifetimes into this that will make it difficult to use an exexutor like tokio to run this future. You can use a structured-concurrency approach instead. I linked to an article from withoutboats about this at the end.
Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.
EDIT: Just to be clear, I'd love to use the Rust compiler to generate these state machines but it simply can't do what I want (yet). I wrote some pseudo-code somewhere else in this thread of how this could be done using co-routines. That gets us closer to this but I've not seen any movement on the front of stabilising coroutines.
(for context in case you didn't see the link in the other comments - https://gist.github.com/joshka/af299be87dbd1f64060e47227b577... is the full working implementation of this). It uses the tokio framed codec approach where message parsing effectively only succeeds once the full message is parsed. For stun that doesn't really seem like a big deal, but I can imagine that for other protos it might be.
> Dealing with invalid messages is crucial in network protocols, meaning the API should be `&[u8]` instead of parsed messages.
The stream is a stream of Result<ParsedMessage, ErrorMessage>, that error message is able to represent the invalid message in whatever fashion makes sense for the protocol. The framed approach to this suggests that message framing is more important to handle as a more simple effectively linear state machine of bytes. Nothing is stopping you from treating the smallest unit of the protocol as being a byte, and handling that in the protocol handler though, still just comes down to being a stream transformation from the bytes to a stream of messages or errors..
Additionally, when working with the stream approach, there's usually a way to mutate the codec (e.g. UdpFramed::codec_mut()) which allows protocol level information to inform the codec how to treat the network byte stream if necessary.
> In addition, you want to be able to parse messages without copying, which will introduce lifetimes into this that will make it difficult to use an exexutor like tokio to run this future. You can use a structured-concurrency approach instead. I linked to an article from withoutboats about this at the end.
The Bytes crate handles zero-copy part of this orthogonally. Lifetimes aren't necessary. PoC on this is to just add a bytes field and push the ref to the bytes representing the stun response like:
let bytes = data.split_to(12).freeze();
Ok(Some(BindingResponse { address, bytes }))
> Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.I'm not sure that I see your point here. I think it's just you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?
> EDIT: Just to be clear, I'd love to use the Rust compiler to generate these state machines but it simply can't do what I want (yet). I wrote some pseudo-code somewhere else in this thread of how this could be done using co-routines. That gets us closer to this but I've not seen any movement on the front of stabilising coroutines.
I wonder if you'd be able to expand on an example where these constraints you mentioned above come up. The Stun example is simple enough that it is achievable just with an async state machine. I'd be curious to work through a more detailed example (mostly for interests sake).
In the article you mention the runtime overhead of mutexes / channels. I'm curious what sort of percentage of the full system time this occupies.