If you want to play in that world you need to be prepared to handle MTU discovery on your own, or else design your app around deliberately small packet sizes.
That's not to say that this isn't a bug. But let's not start editorializing our titles: the apps for which GCE is "unusable" are buggy apps to start with.
That's not entirely fair. UDP is actually a very convenient transport for packetized data of 65,000 bytes or less.
> You picked UDP because you have latency requirements that cannot be met by TCP.
That is not really a good reason to use UDP, as you won't necessarily get better latency (as has oft been demonstrated).
You should pick UDP because you have a connectionless protocol and/or one with data loss requirements that are different from those built into TCP.
> that means you need to know what your packets are actually doing, and that includes fragmentation.
UDP is supposed to abstract IP fragmentation issues for you.
> If you want to play in that world you need to be prepared to handle MTU discovery on your own, or else design your app around deliberately small packet sizes.
I use UDP-based protocols all the time. I'd agree that I've had to be a bit more aware of path MTU when it comes to diagnosing or debugging problems, but in terms of the application code reading and writing UDP packets, I've never done MTU discovery.
> But let's not start editorializing our titles: the apps for which GCE is "unusable" are buggy apps to start with.
Considering there are bytes in the UDP header specifically designed to allow you to send packets larger than the MTU size, this is actually a huge problem. For many people, this requires reimplementing UDP's fragmentation & defragmentation logic up in layer 7.
Isn't it bad enough that people are reimplementing TCP on top of UDP? Do we really need to tell them to reimplement UDP on top of UDP?
We didn't "pick" UDP. We operate a VoIP-related service that interoperates with many different carriers via SIP. Almost all of those carriers ONLY use UDP. SIP UDP packets can often be fairly large. This is especially problematic because GCE uses a nonstandard MTU size (1460 bytes). This does not make our app, or every other SIP-related app that is forced to use UDP, "buggy".
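To put rough numbers on the MTU point, here is a back-of-the-envelope sketch in Python. It assumes IPv4 with a 20-byte header and no options, plus the 8-byte UDP header. The function names are made up for illustration, and the 1450-byte payload is a hypothetical message size, not a measured SIP packet.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes

def max_unfragmented_payload(mtu):
    """Largest UDP payload that fits in a single IPv4 packet at this MTU."""
    return mtu - IPV4_HEADER - UDP_HEADER

def fragment_count(payload_len, mtu):
    """Number of IPv4 packets needed to carry one UDP datagram."""
    total = payload_len + UDP_HEADER      # bytes of IP payload to carry
    capacity = mtu - IPV4_HEADER          # IP payload room in one packet
    if total <= capacity:
        return 1
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = capacity // 8 * 8
    remaining = total - capacity          # the last fragment may be full-size
    return 1 + (remaining + per_fragment - 1) // per_fragment

for mtu in (1500, 1460):
    print(mtu, max_unfragmented_payload(mtu), fragment_count(1450, mtu))
```

Under those assumptions, a payload that travels as one packet at an MTU of 1500 is split into two at 1460, which is the case where fragment handling starts to matter.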
Rough summary of what I had here: I'm an engineer on GCE (in particular I built our current virtio-net device and a small fraction of the other fiddly bits that sit behind that) -- some details in the bug jumped out at me and I thought there might be a quick fix, but I hadn't processed all of the details and posted a bit prematurely. After further review my original post was essentially content free other than 'IP fragmentation works correctly between internal IPs', which is not germane to the actual customer-reported issue.
Interestingly Linux will actually, by default, throw EMSGSIZE any time you try to send() a UDP datagram that is larger than the detected network MTU to the destination. As I understand it, you have to explicitly turn this behavior off to get fragmentation.
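For anyone who wants to poke at this, the socket option involved is, as far as I know, IP_MTU_DISCOVER, described in the Linux ip(7) man page. Below is a minimal Linux-only sketch. The helper name `set_udp_fragmentation` is made up, and the numeric fallbacks are the values from `<linux/in.h>`, used in case a given Python build does not expose the constants. Which mode a fresh socket starts in depends on the `ip_no_pmtu_disc` sysctl, so treat this as a way to check your own system rather than a statement about defaults.

```python
import socket
import sys

# Linux values from <linux/in.h>, used only as a fallback (assumption:
# the socket module may not export these names on every Python build).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)

def set_udp_fragmentation(sock, allow):
    """Hypothetical helper: choose the socket's path MTU discovery mode.

    allow=True  -> IP_PMTUDISC_DONT: the DF bit is not set, so oversized
                   datagrams can be fragmented.
    allow=False -> IP_PMTUDISC_DO: the DF bit is always set, so sending a
                   datagram larger than the known path MTU fails with
                   EMSGSIZE.
    Returns the mode the kernel reports after the change.
    """
    mode = IP_PMTUDISC_DONT if allow else IP_PMTUDISC_DO
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, mode)
    return sock.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER)

if sys.platform.startswith("linux"):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        print("strict mode:", set_udp_fragmentation(s, allow=False))
        print("fragmenting mode:", set_udp_fragmentation(s, allow=True))
```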
Doesn't fragment reassembly essentially imply that all the packets (fragments) will arrive, and will be rearranged in order? i.e. exactly what TCP does?
For example imagine sending a single 1MB packet via UDP, letting it fragment and be reassembled. What distinguishes that from TCP?
A large UDP packet gets through with a probability equal to the single-packet success rate raised to the power of the number of fragments, so don't get attached to them in situations with significant packet loss. But this behavior is just an unaddressed bug.
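For a sense of scale on the fragment-loss arithmetic, here is a small Python sketch. It assumes each fragment is lost independently with the same probability, which real networks only approximate. The function name and the loss rates are illustrative, and 45 is roughly the number of fragments a maximum-size UDP datagram needs at an MTU of 1500.

```python
def datagram_success_rate(packet_success_rate, fragments):
    """Chance the whole datagram arrives: every fragment must get through."""
    return packet_success_rate ** fragments

for loss in (0.001, 0.01, 0.05):
    p = 1.0 - loss
    rates = [round(datagram_success_rate(p, n), 3) for n in (1, 2, 4, 45)]
    print(f"loss={loss:.1%}", rates)
```

With two to four fragments the extra loss stays close to the per-packet loss times the fragment count; it is the many-fragment case where delivery falls off quickly.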
Nobody asked for that, just to send slightly larger packets that will probably get through.
> If you wanted simple transport of large data chunks, you would have chosen likewise.
Why? A 4KB data chunk doesn't need TCP more than a 1KB data chunk.
You know the 1500 byte MTU is completely arbitrary and many networks set it higher, right?
> You picked UDP because you have latency requirements that cannot be met by TCP, and that means you need to know what your packets are actually doing, and that includes fragmentation.
Fragmentation itself is a cost of microseconds. The main problem with TCP is head of line blocking, and moderately large UDP packets don't have to deal with that.
You don't need to trace every packet all the way through the network to get low latency and manually handle loss.
> the apps for which GCE is "unusable" are buggy apps to start with
Nothing you listed is a bug.
If I were Linus, now would be my cue to unleash a tirade about how you don't fing break userspace. Ever. Protocol layering works because higher-level protocols depend on long-standing contracts with the lower layers. Google decided to willy-nilly go and break the basic contract of UDP across their entire cloud, and the top comment here faults protocols built on those guarantees. Disappointing.
A key principle of UDP is quite the contrary of what you are saying. UDP applications can have guarantees that they don't have to engage in MTU discovery or fragmentation issues. It provides a way to have an abstract, static contract about packets that is agnostic to layer-2.
Because UDP is comparatively simple, it has been abused as a proxy for implementing your own protocol on top of IP, and in that context you of course really do have to deal with all those concerns. However, it is a terrible mistake to think that is what UDP is about or how one should use UDP.
(Off topic: has anyone tried grafting a TLS handshake onto ESP?)
The info I have came from a conversation with an AWS Solutions Architect.
It's a couple of years old, but based on my experience it's still true today.
(Of course this assumes your protocol has some alternate method of verifying transferred data, which many do.)
We were using a non-AWS DNS resolver (aka Google) and we would often get DNS resolution errors despite our NAT not being remotely taxed by the traffic.
SCTP is ideal for this use case but it is not well supported by OSs or networking APIs. TCP works but adds overhead. TFTP works, is UDP-only and has less overhead than TCP, but it does not respond well to packet loss. UDT is like TFTP done right, and is a good solution if you can take a dependency on its large C++ library.
You must never make assumptions about what fits in one packet. The MTU could be 100, or less, or 8000, or more.
As soon as you start doing math based on MTU values that you don't permanently have end-to-end control of yourself, you're setting yourself up for trouble.
Last I checked UDT was far more complex than UDP, and since it is layered on top of UDP, I'd think it'd be vulnerable to this problem (although it had all kinds of logic for correctly sizing packets and windows, so maybe it correctly avoids this problem). Either way though, from an application perspective UDT looks much more like TCP than UDP, so I wouldn't think it'd be an obvious choice to replace UDP.
> Apparently the update from Google is to find another support channel to escalate, or only use TCP.
I've never had a Google product issue ever resolved using an official channel I was directed to. It's only by back channels, friends of friends, posting to HN, etc.
I do not have experience with GCE, but saying that you don't get support for any Google product is disingenuous.
https://forums.aws.amazon.com/thread.jspa?messageID=536049
https://www.reddit.com/r/aws/comments/3ccn5o/real_ipv6_suppo...
The submitted title was "Warning: Google Compute Engine Unusable for Applications Which Rely on UDP", which several commenters have objected to as exaggerated.