undefined | Better HN

0 pointsjoshka1y ago0 comments

> This isn't an adequat comparison in my eyes. Your example moves a very critical component out if the "protocol logic": message parsing.

(for context in case you didn't see the link in the other comments - https://gist.github.com/joshka/af299be87dbd1f64060e47227b577... is the full working implementation of this). It uses the tokio framed codec approach where message parsing effectively only succeeds once the full message is parsed. For stun that doesn't really seem like a big deal, but I can imagine that for other protos it might be.

> Dealing with invalid messages is crucial in network protocols, meaning the API should be `&[u8]` instead of parsed messages.

The stream is a stream of Result<ParsedMessage, ErrorMessage>, that error message is able to represent the invalid message in whatever fashion makes sense for the protocol. The framed approach to this suggests that message framing is more important to handle as a more simple effectively linear state machine of bytes. Nothing is stopping you from treating the smallest unit of the protocol as being a byte, and handling that in the protocol handler though, still just comes down to being a stream transformation from the bytes to a stream of messages or errors..

Additionally, when working with the stream approach, there's usually a way to mutate the codec (e.g. UdpFramed::codec_mut()) which allows protocol level information to inform the codec how to treat the network byte stream if necessary.

> In addition, you want to be able to parse messages without copying, which will introduce lifetimes into this that will make it difficult to use an exexutor like tokio to run this future. You can use a structured-concurrency approach instead. I linked to an article from withoutboats about this at the end.

The Bytes crate handles zero-copy part of this orthogonally. Lifetimes aren't necessary. PoC on this is to just add a bytes field and push the ref to the bytes representing the stun response like:

    let bytes = data.split_to(12).freeze();
    Ok(Some(BindingResponse { address, bytes }))

> Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.

I'm not sure that I see your point here. I think it's just you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?

> EDIT: Just to be clear, I'd love to use the Rust compiler to generate these state machines but it simply can't do what I want (yet). I wrote some pseudo-code somewhere else in this thread of how this could be done using co-routines. That gets us closer to this but I've not seen any movement on the front of stabilising coroutines.

I wonder if you'd be able to expand on an example where these constraints you mentioned above come up. The Stun example is simple enough that it is achievable just with an async state machine. I'd be curious to work through a more detailed example (mostly for interests sake).

In the article you mention the runtime overhead of mutexes / channels. I'm curious what sort of percentage of the full system time this occupies.

0 comments

wh33zle1y ago

> The stream is a stream of Result<ParsedMessage, ErrorMessage>

Fair point! Yeah that would allow error handling the state machine.

> > Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.

> I'm not sure that I see your point here. I think it's just you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?

Yeah sure! If you are up for it, you can dig into the actual code here: https://github.com/firezone/firezone/tree/main/rust/connlib/...

The requirements are:

    1. Run N clients of the TURN protocol.
    2. Run M connections (ICE agent + WireGuard tunnel).
    3. Allow each connection to use each TURN client.
    4. Everything must run over a single UDP socket (otherwise hole-punching doesn't work).

In theory, (3) could be relaxed a bit to not be a NxM matrix. It doesn't really change much though because one TURN allocation (i.e. remote address) needs to be usable by multiple connections.

All of the above is managed as a single `Node` that is composed into a larger state machine that manages ACLs and uses this library to route IP packets to and from a TUN device. That is where the concurrent access comes in: We receive ACL updates via a WebSocket and change the routing logic as a result. Concurrently, we need to read from the TUN device and work out which WireGuard tunnel the packets needs to go through. A WireGuard tunnel may use one of the TURN clients to relay the packet in case a direct connection isn't possible. Lastly, we also need to read from the UDP socket, index into the correct connection based on the IP and decrypt incoming IP packets.

In summary, there are lots of state updates happening from multiple events and the state cannot be isolated into a single task where `&mut` would be an option. Instead, we need `&mut` access to pretty much everything upon each UDP packet from the network or IP packet from the TUN device.

> The Stun example is simple enough that it is achievable just with an async state machine.

I am fully aware of that. It needed to be simple to fit into the scope of a blog post that can be understood by people without a deep networking background. To explain sans-IO, I deliberately wanted a simple example so I could focus on sans-IO itself and not explaining the problem domain. With the basics in place, people can work out by themselves whether or not it is a good fit for their problem. Chances are, if they are struggling with lots of shared state between tasks, it could be!

joshkaOP1y ago

Got it - thanks for the update. I'll take a run at it sometime - if only to understand where the pain points in doing this are more thoroughly.

wh33zle1y ago

Thinking about your example again made me think of another requirement: Multiplexing.

STUN ties requests and responses together with a transaction ID that is encoded as an attribute in the message.

Modelling this using `Sink`s and `Stream`s is quite difficult. Sending out the request via a `Sink` is easy but how do you correctly dispatch the incoming response again?

For example, let's say you are running 3 TURN clients concurrently in 3 async tasks that use the referenced `Sink` + `Stream` abstractions. How do you go from reading from a single UDP socket to 3 streams that each contain the correct responses based on the transaction ID that was sent out? To read the transaction ID, you'd have to parse the message. We've established that we can do that partially outside and only send the `Result` in. But in addition to the 3 tasks, we'd need an orchestrating task that keeps a temporary state of sent transaction IDs (received via one of the `Sink`s) and then picks the correct `Stream` for the response.

It is doable but creates the kind of spaghetti code of channels where concerns are implemented in many different parts of the code. It also makes these things harder to test because I now need to wire up this orchestrating task with the TURN clients themselves to ensure this response mapping works correctly.

1 more reply

j / k navigate · click thread line to collapse

0 pointsjoshka1y ago0 comments

> This isn't an adequat comparison in my eyes. Your example moves a very critical component out if the "protocol logic": message parsing.

> Dealing with invalid messages is crucial in network protocols, meaning the API should be `&[u8]` instead of parsed messages.

The Bytes crate handles zero-copy part of this orthogonally. Lifetimes aren't necessary. PoC on this is to just add a bytes field and push the ref to the bytes representing the stun response like:

    let bytes = data.split_to(12).freeze();
    Ok(Some(BindingResponse { address, bytes }))

In the article you mention the runtime overhead of mutexes / channels. I'm curious what sort of percentage of the full system time this occupies.

0 comments

wh33zle1y ago

> The stream is a stream of Result<ParsedMessage, ErrorMessage>

Fair point! Yeah that would allow error handling the state machine.

Yeah sure! If you are up for it, you can dig into the actual code here: https://github.com/firezone/firezone/tree/main/rust/connlib/...

The requirements are:

    1. Run N clients of the TURN protocol.
    2. Run M connections (ICE agent + WireGuard tunnel).
    3. Allow each connection to use each TURN client.
    4. Everything must run over a single UDP socket (otherwise hole-punching doesn't work).

In theory, (3) could be relaxed a bit to not be a NxM matrix. It doesn't really change much though because one TURN allocation (i.e. remote address) needs to be usable by multiple connections.

> The Stun example is simple enough that it is achievable just with an async state machine.

joshkaOP1y ago

Got it - thanks for the update. I'll take a run at it sometime - if only to understand where the pain points in doing this are more thoroughly.

wh33zle1y ago

Thinking about your example again made me think of another requirement: Multiplexing.

STUN ties requests and responses together with a transaction ID that is encoded as an attribute in the message.

Modelling this using `Sink`s and `Stream`s is quite difficult. Sending out the request via a `Sink` is easy but how do you correctly dispatch the incoming response again?

1 more reply

j / k navigate · click thread line to collapse