Honestly, if I had to pin it on just one thing, I'd blame networking everything. Cloud would fit as a subset of that. Networking slows things down at the best of times, and the latency distribution can be a nightmare at the worst. Few developers think about the cost of using the network, and even fewer can think about it holistically (e.g., to avoid making 50 network transactions spread throughout the system when you could do it all in one transaction if you rearranged things).
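To put rough numbers on the "50 transactions versus one" point, here is a minimal sketch of a cost model. The function name, the 0.5 ms round-trip time, and the 5 ms of server work are all made-up assumptions for illustration, not measurements from any real system.

```python
def total_latency_ms(round_trips: int, rtt_ms: float, server_work_ms: float) -> float:
    """Wall-clock time when calls are made one after another.

    Every sequential call pays one full round trip; the server work is the
    same total amount whether it is split across calls or batched.
    """
    return round_trips * rtt_ms + server_work_ms


# Hypothetical numbers: 0.5 ms RTT inside a datacenter, 5 ms of real work.
chatty = total_latency_ms(round_trips=50, rtt_ms=0.5, server_work_ms=5.0)
batched = total_latency_ms(round_trips=1, rtt_ms=0.5, server_work_ms=5.0)

print(f"50 separate calls: {chatty} ms")
print(f"1 batched call:    {batched} ms")
```

Under these assumptions the chatty version spends most of its time waiting on the wire rather than doing work, and the gap only widens as the round-trip time grows.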
Developers do not seem to realise how slow the network is compared to everything else.
Sure, 100 Gbit network interfaces do exist, but most servers are attached with 10 Gbit interfaces, and most actual implementations will not manage to hit anything like 10 Gbit/s because of latency and window scaling.
You cannot escape latency (without inventing another universe in which physics does not apply). And latency is detrimental to performance.
Getting anything across a large enough network in under 1 millisecond is hard, and compared to an I/O operation on a local NVMe disk, it is painfully slow.
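The latency and window point can be sketched with the usual rule of thumb that a single TCP connection cannot move more than one window of data per round trip. The function names and the 1 ms RTT below are assumptions for illustration. The 65,535-byte figure is the largest window the 16-bit TCP header field can express when window scaling is not in use.

```python
def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound for one connection: one window per round trip."""
    return window_bytes * 8 / rtt_s


def window_needed_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


rtt = 0.001  # assumed 1 ms round trip

# Without window scaling the window tops out at 65,535 bytes,
# which caps a single connection at roughly 524 Mbit/s on a 1 ms path.
print(max_throughput_bps(65_535, rtt) / 1e6, "Mbit/s")

# Filling a 10 Gbit/s link at 1 ms needs about 1.25 MB in flight.
print(window_needed_bytes(10e9, rtt) / 1e6, "MB")
```

The same arithmetic shows why latency matters more than the interface speed: doubling the RTT halves the ceiling for a fixed window.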
If you look at a network throughput graph from a packet capture, it looks like a sawtooth pattern. The ramp-up is called slow start, and it's a key feature of TCP and similar protocols.
So if a server A wants to talk to a server B, it sends 8 packets, waits for a response, then sends 16 packets, waits, 24 packets, waits, and so on until a response is dropped. It then resets to 8 packets. There are lots of variations on this algorithm, such as using a "cubic" curve instead of a linear curve, but the end result is pretty much the same.
Even on an infinite bandwidth link, sending a small blob of JSON -- say 200 kilobytes -- will take pretty much the same time as it would on a 1 Gbps link!
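A rough way to see why bandwidth barely matters for a payload this size is to count round trips under a simple textbook model of slow start. This is only a sketch: it assumes the window doubles every round trip, no packet loss, a 1,460-byte segment size, and an initial window of 10 segments (pass `initial_window=8` to match the numbers used above). Real stacks differ in the details.

```python
import math


def slow_start_round_trips(payload_bytes: int, mss: int = 1460,
                           initial_window: int = 10) -> int:
    """Count the round trips needed to send a payload during slow start.

    Simplified model: the sender transmits a full window, waits one RTT
    for the acknowledgements, then doubles the window. No loss.
    """
    segments_left = math.ceil(payload_bytes / mss)
    window = initial_window
    rounds = 0
    while segments_left > 0:
        segments_left -= window
        window *= 2
        rounds += 1
    return rounds


payload = 200_000       # the 200 KB blob of JSON
rtt_ms = 50.0           # assumed round-trip time

rounds = slow_start_round_trips(payload)
wire_time_ms = payload * 8 / 1e9 * 1000  # serialization time at 1 Gbps

print(f"round trips: {rounds}, waiting on latency: {rounds * rtt_ms} ms")
print(f"time on the wire at 1 Gbps: {wire_time_ms} ms")
```

With these assumptions the transfer takes 4 round trips, so about 200 ms is spent waiting on latency while the bits themselves need under 2 ms at 1 Gbps. Making the link infinitely fast removes only that last part.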
As a side effect of this, anything that reduces latency can have a dramatic effect on effective bandwidth. I've seen some applications triple in speed simply because I enabled "Accelerated Networking" in Azure and used a Proximity Placement Group.
> Even on an infinite bandwidth link, sending a small blob of JSON -- say 200 kilobytes -- will take pretty much the same time as it would on a 1 Gbps link!
Technically that's incorrect: it will take RTT + 200 KB / rate, assuming your window is over 200 KB. So depending on how large the RTT is, the bandwidth component may or may not be significant.
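Plugging numbers into that formula shows when each term dominates. This is a sketch that reads "200kb" as 200 kilobytes and assumes the window is already large enough; the 50 ms RTT and the two link speeds are arbitrary examples.

```python
def transfer_time_s(size_bytes: int, rate_bps: float, rtt_s: float) -> float:
    """One round trip plus serialization time, assuming the window covers the payload."""
    return rtt_s + size_bytes * 8 / rate_bps


size = 200_000  # 200 KB
rtt = 0.050     # assumed 50 ms round trip

for label, rate in [("10 Mbps", 10e6), ("1 Gbps", 1e9)]:
    total = transfer_time_s(size, rate, rtt)
    bandwidth_share = (total - rtt) / total
    print(f"{label}: {total * 1000:.1f} ms total, "
          f"{bandwidth_share:.0%} of it spent on bandwidth")
```

On the slow link the bandwidth term is most of the total; on the fast link it is a rounding error next to the RTT.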
This. So few people distinguish between bandwidth and latency. One can be increased arbitrarily and fairly easily with new encoding techniques (which generally only improves edge cases), and the other has a floor that is hard-coded into our universe. I've gotten into debates with folks who think a 10 Gbps connection from the EU to Texas should be as fast as a connection from Texas to the Midwest, or that to speed up the EU-TX connection they just need to spend more on bandwidth.
And that is even before you take network topology into consideration.
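The floor can be estimated from the speed of light alone. In the sketch below, the distances are rough assumptions (about 8,000 km for EU to Texas and about 1,500 km for Texas to the Midwest), and light in fiber is taken to travel at about two thirds of its vacuum speed. Real routes are longer than a straight line and add switching delay, so actual round-trip times will be higher.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_VELOCITY_FACTOR = 2 / 3  # assumption: refractive index of roughly 1.5


def min_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip time over fiber, ignoring routing and queuing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_VELOCITY_FACTOR)
    return 2 * one_way_s * 1000


# Hypothetical straight-line distances, for illustration only.
for route, km in [("EU to Texas", 8_000), ("Texas to Midwest", 1_500)]:
    print(f"{route}: at least {min_rtt_ms(km):.0f} ms round trip")
```

Under these assumptions the transatlantic path cannot get below roughly 80 ms per round trip, against roughly 15 ms for the domestic one, and no amount of extra bandwidth changes either number.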
The dizzying number of NAT layers has to be killing performance. I haven't ever had the chance to sit down and unravel a system running a good load. The lack of TCP tuning combined with the required connection tracking is interesting to think about.
Calico seems to be doing it semi-okay-ish, and even there the control plane is kind of unfinished?
The only software-based solution which seems to have this properly figured out is VMware NSX-T. (I am not counting all the traditional overlay networks in use by ISPs based on MPLS/BGP.)
Azure Load Balancers and their software defined network use packet header rewriting at the host level to bypass the need for the traffic to physically traverse a load balancer appliance or a NAT appliance. Packets are generally rewritten when they arrive at the host hypervisor. This is done in hardware via an FPGA inline with the NICs. (This requires "Accelerated Networking" to be enabled, but that's the default in v4 VMs and required for v5 VMs.)
I'm not certain, but I believe AWS does something similar for their VMs. (Their marketing material mentions that they use a custom ASIC instead of an FPGA like Azure.)
With Azure Kubernetes Service (AKS), you can use the Azure CNI, which gives each Pod a unique IP address on the Azure Virtual Network. I can't confirm, but I'm reasonably certain that this means that Pod-to-Pod traffic is direct, with no NAT appliance or software in the way. In effect, the host NICs do the address translation inline at line rate, with essentially zero added latency.
However, PaaS platforms like Azure App Service or Azure SQL Database are very bad in comparison. They proxy and tunnel and NAT, all in software. I've seen latencies north of 7 milliseconds within a region!
So we have 3 layers from container to pod. The virtual host kernel is tracking those layers. One connection to one container is 3 tracked connections. Then you have whatever else you put on top to go in and out of the internet.
The funny thing to me is that HAProxy recommended getting rid of connection tracking for performance, while everyone else is doubling down on that alone and calling it performant.