The bottleneck was connection establishment in the kernel. As traffic decreased, it became harder to run meaningful tests without great risk (sending all the traffic to one server to see if it can handle it is fun, but unsafe). We also had to keep a fair number of servers to hold on to the IPs from the hosting provider, so we had way more capacity than we needed, and optimizing further wasn't a good use of time :(