Fair point! Yeah that would allow error handling the state machine.
> > Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.
> I'm not sure that I see your point here. I think it's just you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?
Yeah sure! If you are up for it, you can dig into the actual code here: https://github.com/firezone/firezone/tree/main/rust/connlib/...
The requirements are:
1. Run N clients of the TURN protocol.
2. Run M connections (ICE agent + WireGuard tunnel).
3. Allow each connection to use each TURN client.
4. Everything must run over a single UDP socket (otherwise hole-punching doesn't work).
In theory, (3) could be relaxed a bit to not be a NxM matrix. It doesn't really change much though because one TURN allocation (i.e. remote address) needs to be usable by multiple connections.All of the above is managed as a single `Node` that is composed into a larger state machine that manages ACLs and uses this library to route IP packets to and from a TUN device. That is where the concurrent access comes in: We receive ACL updates via a WebSocket and change the routing logic as a result. Concurrently, we need to read from the TUN device and work out which WireGuard tunnel the packets needs to go through. A WireGuard tunnel may use one of the TURN clients to relay the packet in case a direct connection isn't possible. Lastly, we also need to read from the UDP socket, index into the correct connection based on the IP and decrypt incoming IP packets.
In summary, there are lots of state updates happening from multiple events and the state cannot be isolated into a single task where `&mut` would be an option. Instead, we need `&mut` access to pretty much everything upon each UDP packet from the network or IP packet from the TUN device.
> The Stun example is simple enough that it is achievable just with an async state machine.
I am fully aware of that. It needed to be simple to fit into the scope of a blog post that can be understood by people without a deep networking background. To explain sans-IO, I deliberately wanted a simple example so I could focus on sans-IO itself and not explaining the problem domain. With the basics in place, people can work out by themselves whether or not it is a good fit for their problem. Chances are, if they are struggling with lots of shared state between tasks, it could be!