I was talking about thinking in terms of 20-nanosecond intervals, not about completing a request in 20 nanoseconds. To get 1 microsecond wire-to-wire latency, you do need to count your nanoseconds.
Why this number? Because it's roughly the time it takes to read 64 bytes (one cache line) from L3 cache, and NICs tend to be able to push data into L3 (or an equivalent).
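To put the two figures side by side, here is a rough back-of-the-envelope sketch. The 1 microsecond budget and the ~20 ns per 64-byte L3 read are the numbers above; the function name and the assumption that the reads happen strictly one after another are mine, purely for illustration.

```python
# Back-of-the-envelope latency budget using the figures from this comment.
WIRE_TO_WIRE_BUDGET_NS = 1_000  # 1 microsecond wire-to-wire target
L3_LINE_READ_NS = 20            # rough cost of reading one 64-byte line from L3
CACHE_LINE_BYTES = 64


def slots_in_budget(budget_ns: int, slot_ns: int) -> int:
    """Number of whole slot-sized intervals that fit into the budget."""
    return budget_ns // slot_ns


slots = slots_in_budget(WIRE_TO_WIRE_BUDGET_NS, L3_LINE_READ_NS)

# Assumption: strictly serial, dependent reads with no overlap or prefetching.
max_bytes_touched = slots * CACHE_LINE_BYTES

print(f"{slots} intervals of {L3_LINE_READ_NS} ns in the budget")
print(f"at most {max_bytes_touched} bytes read from L3 if done serially")
```

So the whole request has to fit into about fifty such intervals, and that includes everything else on the path, not just cache reads.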
Current state of the art: look up nanoPU, from Stanford. Wire-to-wire under 100 ns is not impossible, but that would normally assume a pre-cooked packet, selected from a number of prepared packets (which is not an unusual scenario in HFT).
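A minimal sketch of the "pre-cooked packet" idea, just to show the shape of it: all serialization happens ahead of time, and the hot path only picks an index. The wire format, field sizes, price range and function names here are made up for illustration, and a real system would do this in hardware or in C/C++ rather than Python.

```python
import struct


def build_order_packet(side: int, price_ticks: int, qty: int) -> bytes:
    """Slow path: serialize an order into a hypothetical fixed-width format.

    Made-up layout: 1-byte side, 8-byte price in ticks, 4-byte quantity,
    network byte order, no padding (13 bytes total).
    """
    return struct.pack("!BQI", side, price_ticks, qty)


# Hypothetical scenario set: 2 sides x 4 price levels, fixed quantity.
PRICE_LEVELS = list(range(10_000, 10_004))
PRECOOKED = [
    build_order_packet(side, price, 100)
    for side in (0, 1)
    for price in PRICE_LEVELS
]


def select_packet(side: int, price_index: int) -> bytes:
    """Hot path: no serialization, just an index computation and a lookup."""
    return PRECOOKED[side * len(PRICE_LEVELS) + price_index]


packet = select_packet(side=1, price_index=2)
print(len(packet), "bytes ready to send")
```

The point is that by the time the triggering packet arrives, the only decision left is which of the prepared responses to send.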