Issue 87 – google-compute-engine – UDP Packet Fragments cannot be reassembled

(code.google.com)

61 points · sunsu · 10y ago · 65 comments


38 comments · 10 top-level

ajross · 10y ago · 12 in thread

Broadly speaking, UDP applications which rely on IP packet fragmentation are broken as designed. If you wanted reliable transport you would have used TCP or a higher level abstraction. If you wanted simple transport of large data chunks, you would have chosen likewise. You picked UDP because you have latency requirements that cannot be met by TCP, and that means you need to know what your packets are actually doing, and that includes fragmentation.

If you want to play in that world you need to be prepared to handle MTU discovery on your own, or else design your app around deliberately small packet sizes.

That's not to say that this isn't a bug. But let's not start editorializing our titles: the apps for which GCE is "unusable" are buggy apps to start with.
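
For a UDP application that does want to handle MTU discovery on its own, one option is to probe at the application layer. Below is a minimal sketch, assuming a hypothetical `probe(size)` callback that sends a test datagram with that payload size and reports whether the peer acknowledged it. The function binary-searches for the largest size that gets through. Real packetization-layer schemes (e.g. RFC 8899) also retry probes, because a single loss may be random rather than size-related.

```python
from typing import Callable


def discover_max_payload(
    probe: Callable[[int], bool],
    low: int = 1232,    # assumed safe floor: IPv6 minimum MTU 1280 - 40 - 8
    high: int = 65507,  # largest UDP payload over IPv4: 65535 - 20 - 8
) -> int:
    """Return the largest payload size in [low, high] for which probe() succeeds.

    Assumes probe(low) succeeds and that success is monotonic in size
    (everything up to the path limit gets through, everything above fails).
    """
    if probe(high):
        return high
    # Invariant: probe(low) is True and probe(high) is False.
    while high - low > 1:
        mid = (low + high) // 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return low


# Usage with a fake path that silently drops anything above 1432 bytes
# (a 1460-byte MTU minus 20 bytes of IPv4 header and 8 bytes of UDP header).
print(discover_max_payload(lambda size: size <= 1432))  # 1432
```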

cbsmith · 10y ago

> If you wanted simple transport of large data chunks, you would have chosen likewise.

That's not entirely fair. UDP is actually a very convenient transport for packetized data of 65,000 bytes or less.

> You picked UDP because you have latency requirements that cannot be met by TCP.

That is not really a good reason to use UDP, as you won't necessarily get better latency (as has oft been demonstrated).

You should pick UDP because you have a connectionless protocol and/or one with data loss requirements that are different from those built in to TCP.

> that means you need to know what your packets are actually doing, and that includes fragmentation.

UDP is supposed to abstract IP fragmentation issues for you.

> If you want to play in that world you need to be prepared to handle MTU discovery on your own, or else design your app around deliberately small packet sizes.

I've used UDP based protocols all the time. I'd agree that I've had to be a bit more aware of path MTU when it comes to diagnosing or debugging problems, but in terms of the application code reading and writing UDP packets, I've never done MTU discovery.

> But let's not start editorializing our titles: the apps for which GCE is "unusable" are buggy apps to start with.

Considering there are bytes in the UDP header specifically designed to allow you to send packets larger than the MTU, this is actually a huge problem. For many people, this requires reimplementing UDP's fragmentation & defragmentation logic up in layer 7.

Isn't it bad enough that people are reimplementing TCP on top of UDP? Do we really need to tell them to reimplement UDP on top of UDP?
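
What "reimplementing fragmentation and defragmentation in layer 7" amounts to can be sketched as follows. The 8-byte chunk header (message ID, chunk index, chunk count) and the 1200-byte default chunk size are illustrative assumptions, not any particular protocol. As with IP reassembly, a message is delivered only once every chunk has arrived; a real implementation would also need timeouts to evict incomplete messages and stricter validation of headers.

```python
import struct
from typing import Dict, List, Optional, Tuple

# Hypothetical 8-byte chunk header: 32-bit message ID, 16-bit chunk index, 16-bit chunk count.
HEADER = struct.Struct("!IHH")


def split_message(msg_id: int, data: bytes, chunk_size: int = 1200) -> List[bytes]:
    """Split one application message into datagrams of at most chunk_size + 8 bytes."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    return [HEADER.pack(msg_id, idx, len(chunks)) + chunk for idx, chunk in enumerate(chunks)]


class Reassembler:
    """Buffers chunks, arriving in any order, until a whole message is present."""

    def __init__(self) -> None:
        self.pending: Dict[int, Dict[int, bytes]] = {}

    def feed(self, datagram: bytes) -> Optional[Tuple[int, bytes]]:
        msg_id, idx, count = HEADER.unpack_from(datagram)
        if idx >= count:
            return None  # malformed chunk; ignore it
        parts = self.pending.setdefault(msg_id, {})
        parts[idx] = datagram[HEADER.size:]  # a duplicate just overwrites its slot
        if len(parts) < count:
            return None  # still waiting for other chunks
        del self.pending[msg_id]
        return msg_id, b"".join(parts[i] for i in range(count))
```

Losing any single chunk loses the whole message, exactly as with IP fragments; the only gain is that the chunks themselves are small enough not to be fragmented.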

sunsu (OP) · 10y ago

While I (sort of) agree with the basic premise of what you're saying, in the real world, it's not very helpful.

We didn't "pick" UDP. We operate a VoIP-related service that interoperates with many different carriers via SIP. Almost all of those carriers ONLY use UDP. SIP UDP packets can often be fairly large. This is especially problematic because GCE uses a non-standard MTU size (1460 bytes). This does not make our app, or every other SIP-related app that is forced to use UDP, "buggy".
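
For what it's worth, the SIP specification itself anticipates large messages: RFC 3261 (section 18.1.1) says a request within 200 bytes of the path MTU, or larger than 1300 bytes when the path MTU is unknown, must be sent over a congestion-controlled transport such as TCP. Below is a minimal sketch of that size rule. The function name is hypothetical, sizes are the SIP message length in bytes, and "within 200 bytes" is read here as strictly greater than the MTU minus 200. As noted above, the rule is moot when the carrier on the other end only speaks UDP.

```python
from typing import Optional


def sip_request_transport(message_len: int, path_mtu: Optional[int] = None) -> str:
    """Pick UDP or TCP for a SIP request following the RFC 3261 size rule (sketch)."""
    if path_mtu is None:
        # Unknown path MTU: requests larger than 1300 bytes should not use UDP.
        return "TCP" if message_len > 1300 else "UDP"
    # Known path MTU: requests within 200 bytes of it should not use UDP.
    return "TCP" if message_len > path_mtu - 200 else "UDP"
```

With GCE's 1460-byte MTU, requests above 1260 bytes would have to move to TCP under this reading.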

jsolson · 10y ago

Upon further consideration, I've deleted the bulk of this comment.

Rough summary of what I had here: I'm an engineer on GCE (in particular I built our current virtio-net device and a small fraction of the other fiddly bits that sit behind that) -- some details in the bug jumped out at me and I thought there might be a quick fix, but I hadn't processed all of the details and posted a bit prematurely. After further review my original post was essentially content-free other than 'IP fragmentation works correctly between internal IPs', which is not germane to the actual customer-reported issue.


kentonv · 10y ago

I'm not sure if I'd go so far as to say that apps which rely on UDP fragment reassembly are "buggy", but I definitely agree that the article title is exaggerating by calling UDP on GCE "unusable". Many UDP services (including one operated by my company, on GCE) will work just fine.

Interestingly, Linux will actually, by default, throw EMSGSIZE any time you try to send() a UDP datagram that is larger than the detected network MTU to the destination. As I understand it, you have to explicitly turn this behavior off to get fragmentation.

http://man7.org/linux/man-pages/man7/udp.7.html
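
On Linux the per-socket switch is the IP_MTU_DISCOVER option described in ip(7): IP_PMTUDISC_DO always sets the Don't Fragment bit, so a send larger than the known path MTU fails with EMSGSIZE, while IP_PMTUDISC_DONT never sets it and lets the kernel fragment the datagram itself. Below is a minimal Python sketch, assuming Linux; the numeric fallbacks are the values from the Linux headers, used in case a given Python build does not export these names. Turning discovery off only helps if the network actually delivers the resulting fragments, which is precisely what is at issue in this bug.

```python
import socket
import sys

# Linux values from <linux/in.h>; getattr() falls back to them in case this
# Python build does not export the names (an assumption, not a guarantee).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def make_udp_socket(allow_fragmentation: bool) -> socket.socket:
    """Create an IPv4 UDP socket with an explicit path-MTU-discovery mode (Linux only)."""
    if not sys.platform.startswith("linux"):
        raise OSError("IP_MTU_DISCOVER is Linux-specific")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # DONT: never set DF, so the kernel fragments oversized datagrams itself.
    # DO:   always set DF, so sends larger than the known path MTU fail with EMSGSIZE.
    mode = IP_PMTUDISC_DONT if allow_fragmentation else IP_PMTUDISC_DO
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, mode)
    return sock


def pmtu_mode(sock: socket.socket) -> int:
    """Read back the socket's current IP_MTU_DISCOVER mode."""
    return sock.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER)
```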

zurn · 10y ago

Unfortunately, this relies on PMTU discovery working, which is frequently broken by misconfigured firewalls, e.g. ones that block all ICMP traffic. Now that GCE has a more exotic broken firewall that also breaks traffic with PMTU discovery disabled, you have a lose-lose situation.

ars · 10y ago

> I'm not sure if I'd go so far as to say that apps which rely on UDP fragment reassembly are "buggy"

Isn't fragment reassembly essentially implying all the packets (fragments) will arrive, and will be rearranged in order? i.e. exactly what TCP does?

For example imagine sending a single 1MB packet via UDP, letting it fragment and be reassembled. What distinguishes that from TCP?


jws · 10y ago

No. It's broken. The title should be longer and include "which fragmentation".

Large UDP packets have a success rate equal to the single-packet success rate raised to the power of the number of fragments, so don't get attached to them in situations with significant packet loss. But this behavior is just an unaddressed bug.
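
To put numbers on that: if each fragment independently arrives with probability p, a datagram split into n fragments arrives intact with probability p^n. Below is a minimal sketch, assuming IPv4 with a 20-byte header (no options) and independent losses; every fragment except the last carries a multiple of 8 payload bytes because fragment offsets are counted in 8-byte units.

```python
import math


def fragment_count(udp_payload: int, mtu: int = 1500, ip_header: int = 20) -> int:
    """Number of IPv4 fragments for a UDP datagram (assumes no IP options)."""
    # The 8-byte UDP header is part of the IP payload, carried in the first fragment.
    ip_payload = udp_payload + 8
    # Non-final fragments must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


def delivery_probability(p_fragment: float, n_fragments: int) -> float:
    """Chance that every fragment, and hence the datagram, arrives (independent loss)."""
    return p_fragment ** n_fragments


for size in (1000, 4000, 16000):
    n = fragment_count(size)
    print(size, n, round(delivery_probability(0.99, n), 4))
```

At 99% per-fragment delivery, a 4000-byte payload (3 fragments on a 1500-byte MTU) arrives about 97% of the time.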

cbsmith · 10y ago

Yeah, exactly. When "higher chance of being dropped" is replaced with "guarantee they'll be dropped", you kind of have a huge problem.

Dylan16807 · 10y ago

> If you wanted reliable transport you would have used TCP or a higher level abstraction.

Nobody asked for that, just to send slightly larger packets that will probably get through.

> If you wanted simple transport of large data chunks, you would have chosen likewise.

Why? A 4KB data chunk doesn't need TCP more than a 1KB data chunk.

You know the 1500 byte MTU is completely arbitrary and many networks set it higher, right?

> You picked UDP because you have latency requirements that cannot be met by TCP, and that means you need to know what your packets are actually doing, and that includes fragmentation.

Fragmentation itself is a cost of microseconds. The main problem with TCP is head of line blocking, and moderately large UDP packets don't have to deal with that.

You don't need to trace every packet all the way through the network to get low latency and manually handle loss.

> the apps for which GCE is "unusable" are buggy apps to start with

Nothing you listed is a bug.

chetanahuja · 10y ago

That this comment is on top of this thread is saddening. You just called about a dozen broadly used protocols that depend on nothing more than the basic guarantee provided by UDP (unreliable datagram delivery/non-delivery) "broken by design".

If I was Linus, now would be my cue to unleash a tirade about how you don't fing break userspace. Ever. Protocol layering works because higher level protocols depend on long standing contracts with the lower layers. Google decided to willy-nilly go and break the basic contract of UDP across their entire cloud and the top comment here faults protocols built on those guarantees. Disappointing.

cbsmith · 10y ago

I kind of wrote a lot but didn't summarize very well.

A key principle of UDP is quite the contrary of what you are saying. UDP applications can have guarantees that they don't have to engage in MTU discovery or fragmentation issues. It provides a way to have an abstract, static contract about packets that is agnostic to layer-2.

Because UDP is comparatively simple, it has been abused as a proxy to implement your own protocol on top of IP, and in that context you of course really have to deal with all those concerns. However, it is a terrible mistake to think that is what UDP is about or how one should use UDP.

wmf · 10y ago

I agree with this, but unfortunately one such broken protocol is IKE, widely used for VPNs.

(Off topic: has anyone tried grafting a TLS handshake onto ESP?)

kerr23 · 10y ago · 8 in thread

AWS de-prioritizes UDP packets, making it not a great choice for UDP-based applications as well.

akent · 10y ago

Do you have a reference or a link for this?

kerr23 · 10y ago

AWS is always super closed-mouthed about their infrastructure.

The info I have came from a conversation with an AWS Solutions Architect.

It's a couple of years old, but based on my experience it's still true today.

cbsmith · 10y ago

Yeah, I've run into this. The point about NAT makes me wonder, though, if it is really de-prioritization or just the network straining to handle all that recalculation of checksums.

api · 10y ago

It's legal to send UDP packets with a zero checksum, indicating "no checksum." This can be set at a UDP socket level in Linux. I wonder if that would make any difference?

(Of course this assumes your protocol has some alternate method of verifying transferred data, which many do.)
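
Some background on the checksum points above: the UDP checksum also covers a pseudo-header holding the source and destination IP addresses, so a NAT that rewrites addresses has to update it, and a computed value of zero is transmitted as 0xFFFF because a zero field means "no checksum". That zero-checksum escape is an IPv4 feature; IPv6 generally requires the checksum. Below is a minimal sketch of the IPv4 computation (the ones'-complement sum described in RFC 768 and RFC 1071); the addresses and ports in the usage line are illustrative.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones'-complement sum of data, padding odd lengths with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int, payload: bytes) -> int:
    """UDP checksum over the IPv4 pseudo-header, UDP header and payload."""
    length = 8 + len(payload)
    # Pseudo-header: source IP, destination IP, zero byte, protocol 17 (UDP), UDP length.
    pseudo = socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) + struct.pack("!BBH", 0, 17, length)
    # UDP header with the checksum field zeroed while computing.
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    # On the wire, 0 means "no checksum" in IPv4, so a computed 0 is sent as 0xFFFF.
    return checksum or 0xFFFF


print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 5060, 5060, b"hello, udp")))
```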


majke · 10y ago

That doesn't sound right. Consider DNS.

dfc · 10y ago

Whenever people talk about behavior/treatment of UDP traffic I consider DNS as a special case. I have no idea how AWS handles UDP but I will never use DNS as a generalizable example of UDP traffic.

cbsmith · 10y ago

At least in the past, it really was true. I've been burned by this before using UDP in AWS. In AWS, I've learned to be skeptical of using protocols other than TCP. That said, it consequently has been a long time since I've tested UDP over AWS.

kerr23 · 10y ago

DNS was precisely the area where we got bit by it.

We were using a Non-AWS DNS resolver (aka Google) and we would often get dns resolution errors despite our NAT not being remotely taxed by the traffic.

oofabz · 10y ago · 3 in thread

It sounds like UDP packets that fit within the MTU work fine. If you need to transmit more than fits in one packet (1452 bytes), UDP is a bad choice.

SCTP is ideal for this use case but it is not well supported by OSs or networking APIs. TCP works but adds overhead. TFTP works, is UDP-only and has less overhead than TCP, but it does not respond well to packet loss. UDT is like TFTP done right, and is a good solution if you can set up a dependency on its large C++ library.

addingnumbers · 10y ago

> If you need to transmit more than fits in one packet (1452 bytes)

You must never make assumptions about what fits in one packet. The MTU could be 100, or less, or 8000, or more.

As soon as you start doing math based on MTU values that you don't permanently have end-to-end control of yourself, you're setting yourself up for trouble.

oofabz · 10y ago

That's true, it's a bad idea to assume an MTU of 1500. Although there is no minimum MTU in IPv4, IPv6 specifies a minimum of 1280 bytes. So if you send your UDP packets over IPv6, you are guaranteed room for 1232 bytes of payload.
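
The arithmetic behind that figure as a small helper: subtract the fixed IP header (40 bytes for IPv6, 20 for IPv4 without options) and the 8-byte UDP header from the MTU. The function name is illustrative, and the IPv6 figure assumes no extension headers.

```python
def max_udp_payload(mtu: int, ipv6: bool = False) -> int:
    """Largest UDP payload that fits in a single packet without fragmentation."""
    ip_header = 40 if ipv6 else 20  # IPv6 without extension headers; IPv4 without options
    udp_header = 8
    return mtu - ip_header - udp_header


print(max_udp_payload(1280, ipv6=True))  # 1232: fits on any IPv6 path
print(max_udp_payload(1500))             # 1472: common Ethernet MTU, IPv4
print(max_udp_payload(1460))             # 1432: a 1460-byte MTU, IPv4
```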

cbsmith · 10y ago

SCTP is a pretty good choice. It has been a while since I used it, but it is a more complex protocol, and I recall often running into challenges getting it to perform as efficiently over various bits of network equipment.

Last I checked UDT was far more complex than UDP, and since it is layered on top of UDP, I'd think it'd be vulnerable to this problem (although it had all kinds of logic for correctly sizing packets and windows, so maybe it correctly avoids this problem). Either way though, from an application perspective UDT looks much more like TCP than UDP, so I wouldn't think it'd be an obvious choice to replace UDP.

dsl · 10y ago · 2 in thread

The last update really hits the nail on the head for most Google products:

> Apparently the update from Google is to find another support channel to escalate, or only use TCP.

I've never had a Google product issue ever resolved using an official channel I was directed to. It's only by back channels, friends of friends, posting to HN, etc.

bad_user · 10y ago

I've had a couple of issues with Google Apps and their support has been very helpful, getting a phone call from them and the issue solved the same day. For example, I got them to revert me from annual to the flexible plan by simply asking nicely, and I even got them to adjust settings that aren't normally available, like the primary domain. Nowadays I've been moving off Google Apps, but let me tell you, it's the support that I'll miss ;-) Also back in the day when I was working on integrating with AdX, their reply was very slow, but they did reply and they did help us with our integration.

I do not have experience with GCE, but saying that you don't get support for any Google product is disingenuous.


frakkingcylons · 10y ago

My experience has been quite different. I signed up for the Cloud Platform free trial a couple of months ago and they have responded to me the same day at the latest when I have issues. Hell, they even got back to me on the weekend.

eloff · 10y ago · 2 in thread

Wow, reported May 2014, and still no resolution for such a serious issue. GCE looks nice in theory, but I've heard no end of problems like this with shitty communication and support when something doesn't work. I'll stick with AWS for the time being. I really wish Google would get their act together and provide serious competition though. That's good for everybody who uses the cloud.

toomuchtodo · 10y ago

AWS still doesn't support IPv6 except on internet facing ELBs. Pick your poison.

https://forums.aws.amazon.com/thread.jspa?messageID=536049

https://www.reddit.com/r/aws/comments/3ccn5o/real_ipv6_suppo...

api · 10y ago

There are loads of good cloud providers with IPv6.


halayli · 10y ago · 1 in thread

I feel the title should be updated to: "for Applications Which Rely on UDP packets larger than 1500 bytes".

fabulist · 10y ago

Any application might very reasonably choose to do this. Maybe version X never does, and you can satisfy yourself of this by peeking at the code, but there is no reason to believe that X+1 won't, or that another vendor's product/FOSS project you need to interoperate with won't, etc.

dang · 10y ago

Please don't editorialize the titles of articles you submit to HN.

The submitted title was "Warning: Google Compute Engine Unusable for Applications Which Rely on UDP", which several commenters have objected to as exaggerated.

dboreham · 10y ago

This is certainly not the only case of "network subtly broken on cloud VMs". For example, every provider I have tested (including AWS, Rackspace, DigitalOcean, Linode) enables TCP segment reassembly offload, and provides no way to disable it (presumably because it is being done on the host, not the VM). This will typically break TCP tunneling (e.g. using GRE) because PMTUD doesn't work under these conditions. FWIW, a shout-out to SoftLayer, which is the only VM hosting provider I'm aware of that does not suffer from this blight (provided you pay for additional IP addresses routed to your box).

kaa2102 · 10y ago

I've run into several misfires while using Google Cloud/Compute Engine: MySQL database access, email and encryption. These features didn't work without either a Google or third-party service. I set up postfix and use SendGrid for email and Google's Cloud SQL. You can get tech support at Silver or Gold level. I think everyone starts at Bronze.

api · 10y ago

Wait... you mean there are protocols other than http? Somebody should tell Google, Amazon, and Microsoft.
