That's a very fair point. I suppose I'm too used to the idea that the code which runs with sub-N-µs latency is going to process a "lot" of data, and that there's no way it will all be in the cache (any cache).
But if your code only touches a small amount of data each time it executes, then this becomes vastly more reasonable.