undefined | Better HN

0 pointsecnahc5155y ago0 comments

A lot of this is just spending more money and resources to make it possible to optimize for speed.

With sufficient caching with and a lot of parallelism makes this possible. That costs money though. Caching means storing data twice. Parallelism means more servers (since you'll probably be aiming to saturate the network bandwidth for each host).

Pre-aggregating data is another part of the strategy, as that avoids using CPU cycles in the fast-path, but it means storing even more copies of the data!

My personal anecdotal experience with this is with SQL on object storage. Query engines that use object storage can still perform well with the above techniques, even though querying large amounts of data from object is slow. You can bypass the slowness of object storage if you pre-cache the data somewhere else that's closer/faster for recent data. You can have materialized views/tables for rollups of data over longer periods of time, which reduces the data needed to be fetched and cached. It also requires less CPU due to working with a smaller amount of pre-calculated data.

Apply this to every layer, every system, etc, and you can get good performance even with tons of data. It's why doing machine-learning in real- is way harder than pre-computing models. Streaming platforms make this all much easier as you can constantly be pre-computing as much as you can, and pre-filling caches, etc.

Of course, having engineers work on 1% performance improvements in the OS kernel, or memory allocators, etc will add up and help a lot too.

0 comments

1 comments · 1 top-level

QuercusMax5y ago

One interesting thing to note is that there are lots of internal tools (CLI, web UI, etc.) that are REALLY slow. Things that are heavily used in the fast-path for development (e.g. code search, code review, test results) are generally pretty quick, but if there's a random system that has a UI, it's probably going to be very slow - because there's no budget for speeding them up, and the only people it annoys are engineers from other teams.

j / k navigate · click thread line to collapse

0 comments

1 comments · 1 top-level

QuercusMax5y ago

j / k navigate · click thread line to collapse