rfoo · 1y ago

Huh? What kind of RDMA has a completion latency of 20 nanoseconds? It's more like 5 microseconds.

I agree that a lot of the "modern" storage stack is way too slow, though. Last year I tried to find a replication-first object store for crazy-fast random reads over a small number of objects and found none.

3 comments · 2 top-level

startupsfail · 1y ago

I was talking about thinking in terms of 20-nanosecond intervals, rather than completing a request in 20 nanoseconds. To get 1 microsecond wire-to-wire latency, you do need to count your nanoseconds.

Why this number? Because it’s roughly the time it takes to read 64 bytes from L3 cache, and NICs tend to be able to push data into L3 (or an equivalent).

For the current state of the art, look up nanoPU, from Stanford. Wire-to-wire under 100 ns is not impossible, but it would normally assume a pre-cooked packet, selected from a number of such packets (which is not an unusual scenario in HFT).
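
As a rough illustration of what "counting your nanoseconds" means with the figures quoted in this thread, here is a back-of-the-envelope sketch. It assumes the ~20 ns per 64-byte L3 read mentioned above and treats the reads as strictly serialized, which real hardware does not do; the function names are made up for this example, and nothing here is a measurement.

```python
# Back-of-the-envelope latency budget using only figures quoted in the thread.
# Assumption: one 64-byte L3 read costs roughly 20 ns and reads are serialized.

SLOT_NS = 20              # rough cost of one 64-byte read from L3 (per the comment)
CACHE_LINE_BYTES = 64     # bytes moved per slot
WIRE_TO_WIRE_NS = 1_000   # 1 microsecond wire-to-wire target
RDMA_COMPLETION_NS = 5_000  # ~5 microsecond RDMA completion latency (rfoo's figure)


def slots_in_budget(budget_ns: int, slot_ns: int = SLOT_NS) -> int:
    """How many slot-sized steps fit in a latency budget (hypothetical helper)."""
    return budget_ns // slot_ns


def max_bytes_touched(budget_ns: int) -> int:
    """Upper bound on bytes read from L3 if every slot is one serialized read."""
    return slots_in_budget(budget_ns) * CACHE_LINE_BYTES


print(slots_in_budget(WIRE_TO_WIRE_NS))     # 50 slots in a 1 us budget
print(slots_in_budget(RDMA_COMPLETION_NS))  # 250 slots in a 5 us completion
print(max_bytes_touched(WIRE_TO_WIRE_NS))   # 3200 bytes under these assumptions
```

Under these assumptions a 1 microsecond budget is only about 50 such steps, which is why each one has to be accounted for.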

rfoo (OP) · 1y ago

Ah, makes sense. Sadly RDMA isn't that fast for now, or at least commercial RNICs/switches aren't :( Once you leave your host and go out onto the data center network, everything is counted in microseconds.

tucnak · 1y ago

Completion latency is one thing, bandwidth is another. There's apparently a whole world of Alveo SmartNICs and related FPGA platforms, and they can totally get into the nanosecond range for whatever nails fit the compute-in-network hammer, even if bound by the latency of the consuming system / RDMA interface. Also: https://github.com/corundum/corundum is really popular with the Chinese!