With sufficient caching with and a lot of parallelism makes this possible. That costs money though. Caching means storing data twice. Parallelism means more servers (since you'll probably be aiming to saturate the network bandwidth for each host).
Pre-aggregating data is another part of the strategy, as that avoids using CPU cycles in the fast-path, but it means storing even more copies of the data!
My personal anecdotal experience with this is with SQL on object storage. Query engines that use object storage can still perform well with the above techniques, even though querying large amounts of data from object is slow. You can bypass the slowness of object storage if you pre-cache the data somewhere else that's closer/faster for recent data. You can have materialized views/tables for rollups of data over longer periods of time, which reduces the data needed to be fetched and cached. It also requires less CPU due to working with a smaller amount of pre-calculated data.
Apply this to every layer, every system, etc, and you can get good performance even with tons of data. It's why doing machine-learning in real- is way harder than pre-computing models. Streaming platforms make this all much easier as you can constantly be pre-computing as much as you can, and pre-filling caches, etc.
Of course, having engineers work on 1% performance improvements in the OS kernel, or memory allocators, etc will add up and help a lot too.