My only question is how they got to haproxy as the root cause so quickly. In my experience, comparing what's different between production and staging is a long shot because there's so much that's different. Obviously workload can matter a lot, and so can uptime and time since upgrade. So I'm curious if haproxy was the first thing they saw that was different or if they just didn't write about the dead ends.
All too often, we think of staging as nothing more than a clone of the development environment on a remote server where QA can get at it. If you run your load balancer (haproxy), web server (apache, nginx), database (mysql, postgres, mongo, elasticsearch), and caching (memcached, redis) all on a single server for staging, you're eventually going to stumble on this kind of hard-to-diagnose problem.
One of the main disservices you are doing to yourself with a single-server staging is that all of your traffic is going to be travelling over localhost or, worse, over unix sockets. You're not even testing basic network latency or performance.
Staging should really be considered a first-level production. Its configuration and maintenance should be handled with the same attention to detail as production.
The main thing that narrowed it down was thinking about the request path from the app server to Elasticsearch. That pointed us at the load balancers (we tried a request directly from app -> Elasticsearch in production to verify they were the problem).
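For anyone who wants to run the same kind of check, here is a minimal sketch of comparing the two request paths by timing them. It is not what was actually run in the incident; `median_latency` and its parameters are hypothetical names, and the request itself is passed in as a callable so the helper makes no assumptions about the client library.

```python
import statistics
import time
from typing import Callable


def median_latency(send_request: Callable[[], object], samples: int = 20) -> float:
    """Return the median wall-clock time, in seconds, of repeated calls.

    send_request is any zero-argument callable that performs one request,
    e.g. through the load balancer or directly against a backend node.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        send_request()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to a single slow outlier than the mean.
    return statistics.median(timings)
```

You would call it twice, for example with `lambda: urllib.request.urlopen(url).read()`, once with a URL that goes through the load balancer and once with a URL that hits an Elasticsearch node directly (both URLs are placeholders you supply). A large gap between the two medians points at the hop in between.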
You're right that HAProxy wasn't the first place we looked once we were there, though. If I remember right, we started by diffing `sysctl` output to see if Ubuntu had tweaked something between versions.
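A rough sketch of that kind of `sysctl` diff, assuming you have saved the output of `sysctl -a` from each host as text. The function names are made up for illustration, and the values in the usage note are invented, not taken from the incident.

```python
from typing import Dict, Optional, Tuple


def parse_sysctl(text: str) -> Dict[str, str]:
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no "=" at all
            settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(old_text: str, new_text: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (old_value, new_value)} for every setting that differs.

    A value of None means the key is missing on that side.
    """
    old, new = parse_sysctl(old_text), parse_sysctl(new_text)
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes
```

Feeding it `"net.core.somaxconn = 128"` and `"net.core.somaxconn = 4096"` (made-up values) reports that one key with both values, which is easier to scan than a raw line diff when the two outputs are ordered differently.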
If you don't have processes and documentation to quickly point out those kinds of differences between environments, you're doing something wrong.
[1] For an example, see our recent outage related to having seen 200M PostgreSQL transactions: https://www.joyent.com/blog/manta-postmortem-7-27-2015
Where do you think you are, on reddit?
I think that, rather than being a consequence of Nagle's algorithm, this is the very situation the algorithm is intended to optimize: an app generating many small packets.
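For context, Nagle's algorithm holds back small writes while earlier data is still unacknowledged, so that they can be coalesced into fewer segments. An application that cares more about latency than packet count can opt out per socket with `TCP_NODELAY`. A minimal sketch (the helper name is hypothetical, and this is not necessarily the fix applied in the post):

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 tells the kernel to send small writes immediately
    # instead of waiting to coalesce them with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The trade-off is more, smaller packets on the wire, so it only makes sense when the app already batches its own writes sensibly.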