jerf3y ago

I can't blame it on "cloud", though it's not helping that there are an awful lot of cloud services that claim to be "high performance" and are often mediumish at best. But in general I see a lot of ignorance in the developer community as to how fast things should be able to run, even in terms of reading local files and doing local manipulations with no "cloud" in sight.

Honestly, if I had to pin it on just one thing, I'd blame networking everything. Cloud would fit as a subset of that. Networking slows things down at the best of times, and the latency distribution can be a nightmare at the worst. Few developers think about the cost of using the network, and even fewer can think about it holistically (e.g., to avoid making 50 network transactions spread throughout the system when you could do it all in one transaction if you rearranged things).
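To put rough numbers on the "50 transactions versus one" point, here is a minimal sketch. The round-trip time and the server-side work figure are assumed values chosen only for illustration, and `total_latency_ms` is a hypothetical helper, not anything from a real library.

```python
def total_latency_ms(round_trips: int, rtt_ms: float, work_ms: float = 0.0) -> float:
    """Wall-clock time for sequential round trips plus the total server-side work."""
    return round_trips * rtt_ms + work_ms


# Assumed numbers: 0.5 ms RTT inside a datacenter, 2 ms of total server work.
chatty = total_latency_ms(round_trips=50, rtt_ms=0.5, work_ms=2.0)
batched = total_latency_ms(round_trips=1, rtt_ms=0.5, work_ms=2.0)

print(f"50 transactions: {chatty:.1f} ms")
print(f" 1 transaction:  {batched:.1f} ms")
```

The work done is the same in both cases; only the number of times the caller waits on the network changes.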

15 comments · 2 top-level

kazen443y ago· 10 in thread

> Few developers think about the cost of using the network.

Developers do not seem to realise how slow the network is compared to everything else.

Sure, 100 Gbit network interfaces do exist, but most servers are attached with 10 Gbit interfaces, and most actual implementations will not manage to hit anything like 10 Gbit/s because of latency and window scaling.
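The interaction between latency and window size can be sketched with the usual window/RTT bound: a sender can have at most one window of unacknowledged data in flight per round trip. The RTT values below are assumptions for illustration; 65,535 bytes is the largest window TCP can advertise without the window scaling option.

```python
def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


def window_needed_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: the window required to keep the link full."""
    return link_bps * rtt_s / 8


# Assumed RTT of 1 ms, window of 65,535 bytes (no window scaling).
unscaled = max_throughput_bps(65_535, 0.001)
# Window needed to fill a 10 Gbit/s link at the same assumed RTT.
needed = window_needed_bytes(10e9, 0.001)

print(f"unscaled window at 1 ms RTT: {unscaled / 1e6:.0f} Mbit/s")
print(f"window to fill 10 Gbit/s at 1 ms RTT: {needed / 1e6:.2f} MB")
```

Under these assumptions the unscaled window caps out at roughly half a gigabit per second, no matter how fast the interface is.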

You cannot escape latency (without inventing another universe in which physics do not apply). And latency is detrimental to performance.

Getting anything across a large enough network in under 1 millisecond is hard, and compared to an I/O operation on a local NVMe disk, it is painfully slow.

jiggawatts3y ago

It wouldn't matter if the links were 10,000 terabits! Because of the way TCP works, it has a bounded speed for small chatty transactions that is determined primarily by the latency, not the throughput.

If you look at a network throughput graph from a packet capture, it looks like a sawtooth pattern. This is called slow start, and it's a key feature of TCP and all similar protocols.

So if a server A wants to talk to a server B, it sends 8 packets, waits for a response, then sends 16 packets, waits, 24 packets, waits, and so on until a response is dropped. It then resets to 8 packets. There are lots of variations on this algorithm, such as using a "cubic" curve instead of a linear curve, but the end result is pretty much the same.

Even on an infinite bandwidth link, sending a small blob of JSON -- say 200 kilobytes -- will take pretty much the same time as it would on a 1 Gbps link!

As a side effect of this, anything that reduces latency can have a dramatic effect on effective bandwidth. I've seen some applications triple in speed simply because I enabled "Accelerated Networking" in Azure and used a Proximity Placement Group.
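A rough way to see why a 200 kilobyte payload is latency-bound is to count round trips during window growth. This sketch assumes the textbook slow start rule (the congestion window doubles every round trip) rather than the 8, 16, 24 progression described above, an initial window of 10 segments, and a 1460-byte MSS. It ignores the handshake, packet loss, and the receiver's window. `slow_start_round_trips` is a hypothetical helper.

```python
import math


def slow_start_round_trips(payload_bytes: int, mss: int = 1460, initcwnd: int = 10) -> int:
    """Round trips needed to send a payload if cwnd doubles each RTT (no loss)."""
    segments = math.ceil(payload_bytes / mss)
    cwnd, sent, rounds = initcwnd, 0, 0
    while sent < segments:
        sent += cwnd   # send one full congestion window this round trip
        cwnd *= 2      # assumed textbook slow start: double per RTT
        rounds += 1
    return rounds


default_rounds = slow_start_round_trips(200_000)             # 4 round trips
tuned_rounds = slow_start_round_trips(200_000, initcwnd=150)  # 1 round trip

print(f"initcwnd=10:  {default_rounds} round trips")
print(f"initcwnd=150: {tuned_rounds} round trip")
```

Link bandwidth does not appear anywhere in this model, so under these assumptions the transfer time is a multiple of the RTT. That is also why cutting latency, or raising the initial window on endpoints you control, shows up as higher effective bandwidth.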

10000truths3y ago

The slow start behavior you describe is not inherent to TCP proper, but rather, a detail of the congestion control algorithm in use by the endpoints' TCP stacks. Most such algorithms will have some kind of AIMD feedback loop to achieve some balance of fairness and efficiency. But for applications where you have control over the endpoints and the network in between them, you can minimize slow start by setting a high initcwnd/initrwnd and using a less aggressive window shrinking mechanism.

dilyevsky3y ago

No one is forcing anyone to use slow start. You can also disable things like Nagle's algorithm to improve latency on small packets.
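For reference, disabling Nagle's algorithm is a per-socket option. A minimal sketch using Python's standard `socket` module follows; the socket is never connected here, it only shows the option being set and read back.

```python
import socket

# Create a TCP socket and turn off Nagle's algorithm, so small writes are
# sent immediately instead of being coalesced while waiting for ACKs.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back; a non-zero value means Nagle is disabled.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))

sock.close()
```

This trades fewer, larger packets for lower latency on small messages, so it tends to suit chatty request/response traffic more than bulk transfers.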

> Even on an infinite bandwidth link, sending a small blob of JSON -- say 200 kilobytes -- will take pretty much the same time as it would on a 1 Gbps link!

Technically that's incorrect: it will take RTT + 200 KB / rate, assuming your window is over 200 KB. So depending on how large the RTT is, the bandwidth component may or may not be significant.
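Plugging numbers into that formula shows when each term dominates. The RTT values are assumptions for illustration, and the model assumes the window already covers the whole payload, as stated above.

```python
def transfer_time_s(size_bytes: int, rate_bps: float, rtt_s: float) -> float:
    """One round trip plus serialization time, with the window >= payload size."""
    return rtt_s + size_bytes * 8 / rate_bps


SIZE = 200_000  # 200 kilobytes

for label, rtt in [("same datacenter (assumed 0.5 ms)", 0.0005),
                   ("cross-continent (assumed 100 ms)", 0.100)]:
    at_1g = transfer_time_s(SIZE, 1e9, rtt)
    at_inf = transfer_time_s(SIZE, float("inf"), rtt)
    print(f"{label}: 1 Gbps {at_1g * 1000:.1f} ms, infinite bandwidth {at_inf * 1000:.1f} ms")
```

At the short RTT the 1.6 ms of serialization time on a 1 Gbps link is most of the total; at the long RTT it is lost in the noise.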

whoisthemachine3y ago

> You cannot escape latency (without inventing another universe in which physics do not apply). And latency is detrimental to performance.

This. So few people distinguish between bandwidth and latency. One can be increased arbitrarily and fairly easily with new encoding techniques (which generally only improves edge cases), and the other has a floor that is hard-coded into our universe. I've gotten into debates with folks who think a 10GB connection from the EU to Texas should be as fast as a connection from Texas to the Midwest, or to speed up the EU-TX connection they just need to spend more on bandwidth.

aledalgrande3y ago

> 10GB connection from the EU to Texas should be as fast as a connection from Texas to the Midwest

and that is even before you take into consideration network topology

dilyevsky3y ago

Directly attached NVMe drives will have throughput of up to 30-50 Gbps (with something like M.2), which should be attainable with NVMe-oF over QSFP28, and it's not that rare or expensive anymore. Others have commented on the latency. Fibre Channel can be considered a network too (and it is), and it's quite fast.

osigurdson3y ago

Light travels 300km in 1 millisecond. Intra datacenter latency is not bounded by physics. It is bounded by current technology.

hotpotamus3y ago

And in 1 millisecond, a 1 GHz CPU will have gone through 1 million cycles. It's a bit like sending a letter and then waiting a month or two for a response.

ahachete3y ago

Light travels much slower (~1.5x slower) in optical fiber, due to the refractive index (~1.5) of the fiber.
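The figures in this sub-thread can be checked with a few lines of arithmetic. The 300 m cable run is an assumed intra-datacenter distance, and the helper names are hypothetical.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, in km per millisecond


def one_way_delay_ms(distance_km: float, refractive_index: float = 1.0) -> float:
    """Propagation delay only; ignores switching, queuing, and serialization."""
    return distance_km * refractive_index / C_KM_PER_MS


def cycles_waited(delay_ms: float, clock_hz: float = 1e9) -> float:
    """CPU cycles that elapse during a delay at the given clock rate."""
    return delay_ms / 1000 * clock_hz


vacuum_300km = one_way_delay_ms(300)                        # about 1 ms
fiber_300km = one_way_delay_ms(300, refractive_index=1.5)   # about 1.5 ms
fiber_300m = one_way_delay_ms(0.3, refractive_index=1.5)    # assumed 300 m run

print(f"300 km in vacuum: {vacuum_300km:.3f} ms")
print(f"300 km in fiber:  {fiber_300km:.3f} ms")
print(f"300 m in fiber:   {fiber_300m * 1000:.2f} us, "
      f"{cycles_waited(fiber_300m):.0f} cycles at 1 GHz")
```

Over a few hundred metres, propagation accounts for only a microsecond or two, which is consistent with the point that intra-datacenter latency is set by the equipment rather than by the speed of light.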

briffle3y ago

It seems most of the tools for running PostgreSQL in K8s just default to creating a new copy of the DB at the drop of a hat. When your DB is in the multi-TB range, that can come with a noticeable cost in network fees, plus a very long delay, even on modern fast networks.
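As a back-of-the-envelope check on "very long delay", here is the best-case copy time for a database of an assumed size over an assumed link. Both the 5 TB size and the link speeds are illustrative assumptions, and the model assumes the link is fully utilized for the whole copy, which real replication tools rarely achieve.

```python
def copy_time_s(size_bytes: float, rate_bps: float) -> float:
    """Best-case time to move a database over a fully utilized link."""
    return size_bytes * 8 / rate_bps


db_size = 5e12  # assumed 5 TB database

for label, rate in [("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    minutes = copy_time_s(db_size, rate) / 60
    print(f"{label}: {minutes:.0f} minutes")
```

Even in this idealized case a fresh replica takes over an hour at 10 Gbit/s, before any per-gigabyte transfer fees.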

geggam3y ago· 3 in thread

Are you talking about cloud host to cloud host networking, or the Pod networking inside a single host?

The dizzying number of NAT layers has to be killing performance. I haven't had the chance to ever sit down and unravel a system running a good load. The lack of TCP tuning combined with the required connection tracking is interesting to think about.

kazen443y ago

I still don't understand why nearly all CNIs are so hell-bent on implementing a dozen layers of NAT to tunnel their overlay networks, instead of implementing a proper control plane to automate it all away between routes.

Calico seems to be doing it semi-okay-ish, and even there the control plane is kind of unfinished?

The only software-based solution that seems to have this properly figured out is VMware NSX-T. (I am not counting all the traditional overlay networks in use by ISPs based on MPLS/BGP.)

jiggawatts3y ago

I believe Azure CNI is pretty much point-to-point.

Azure Load Balancers and their software-defined network use packet header rewriting at the host level to bypass the need for the traffic to physically traverse a load balancer appliance or a NAT appliance. Packets are generally rewritten when they arrive at the host hypervisor. This is done in hardware via an FPGA inline with the NICs. (This requires "Accelerated Networking" to be enabled, but that's the default in v4 VMs and required for v5 VMs.)

I'm not certain, but I believe AWS does something similar for their VMs. (Their marketing material mentions that they use a custom ASIC instead of an FPGA like Azure.)

With Azure Kubernetes Service (AKS), you can use the Azure CNI, which gives each Pod a unique IP address on the Azure Virtual Network. I can't confirm, but I'm reasonably certain that this means that Pod-to-Pod traffic is direct, with no NAT appliance or software in the way. Essentially the host NICs do the address translation inline at line rate and essentially zero latency.

However, PaaS platforms like Azure App Service or Azure SQL Database are very bad in comparison. They proxy and tunnel and NAT, all in software. I've seen latencies north of 7 milliseconds within a region!

geggam3y ago

Before you even get to the CNI, I think AWS VM to internet is at least 3 NAT layers.

So we have 3 layers from container to pod. The virtual host kernel is tracking those layers. One connection to one container is 3 tracked connections. Then you have whatever else you put on top to go in and out of the internet.

The funny thing to me is that HAProxy recommended getting rid of connection tracking for performance, while everyone is doubling down on that alone and calling it performant.
