Better HN

0 points · yunwal · 3y ago · 0 comments

> When the database is far from saturated on CPU

The issue here is that if you scale enough to saturate the database, you'll have to rewrite essentially all of your code if you're a typical CRUD webapp, because basically all of your business logic is about data retrieval. There are probably some companies that can get away with this, but it would be far too expensive for most.



dventimi · 3y ago · 2 in thread

If I scale up to saturate the database CPU... by doing data retrieval? Setting aside my skepticism about saturating the CPU with mere data retrieval, how is that solved by moving the data to another host's CPU, when moving the data involves the very data retrieval that's saturating the database's CPU?

yunwal (OP) · 3y ago

If you mix compute-heavy calculations (like stochastic gradient descent) in with your pure data retrieval, then yes, you'll saturate the database's CPU, and you won't have a reasonable way to scale it.

If you do it on a host that's not a database, then you can horizontally scale it. There's a reason stateless apps are the default.
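A minimal sketch of what "do it on a host that's not a database" can look like, assuming a one-variable linear model fit by data-parallel, full-batch gradient descent (a simplified stand-in for the SGD in the article, not the article's code). `gradient_on_shard` and `train` are hypothetical names; the thread pool stands in for separate stateless app hosts, and the list of rows stands in for a query result. Because each worker depends only on its arguments, adding workers needs no coordination beyond combining their gradients.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

Row = tuple[float, float]  # (x, y) pairs, as they might come back from a SELECT


def gradient_on_shard(w: float, b: float, shard: list[Row]) -> tuple[float, float, int]:
    """Stateless worker: summed squared-error gradient for y ~ w*x + b over one shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb, len(shard)


def train(rows: list[Row], workers: int = 4, lr: float = 0.1, epochs: int = 2000) -> tuple[float, float]:
    # Split the retrieved rows into one shard per (hypothetical) worker host.
    shards = [rows[i::workers] for i in range(workers)]
    w = b = 0.0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(epochs):
            # Every worker sees the same current weights and only its own shard.
            results = list(pool.map(partial(gradient_on_shard, w, b), shards))
            n = sum(r[2] for r in results)
            # Combine the partial sums into the mean gradient, then take one step.
            w -= lr * sum(r[0] for r in results) / n
            b -= lr * sum(r[1] for r in results) / n
    return w, b


if __name__ == "__main__":
    data = [(i / 10, 3 * (i / 10) + 1) for i in range(20)]
    print(train(data))
```

The database's only job in this picture is returning `rows`; the arithmetic lives in functions any number of app hosts could run.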

dventimi · 3y ago

OK, so if I'm not mixing in compute-heavy workloads like the stochastic gradient descent described in the article, then I'm less likely to saturate the CPU. Perhaps that won't happen at all, and then I won't have to scale horizontally.

On the other hand, if I'm doing stochastic gradient descent that's saturating the CPU, then there's a good chance I'm doing offline training of an ML model. In that case my latency tolerances are probably much, much higher. In other words, I can also avoid scaling horizontally, provided I can live with longer training times. That might be a worthwhile trade-off for me, given the added complexity of horizontal scaling.
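For the single-host, offline alternative, a minimal sketch assuming a hypothetical SQLite table `points(x, y)` and a one-variable linear model: the database runs one retrieval query, and a plain stochastic gradient descent loop runs in a single process. There is nothing to coordinate; the price of staying on one host is a longer wall-clock training time as the data grows.

```python
import random
import sqlite3


def load_points(conn: sqlite3.Connection) -> list[tuple[float, float]]:
    # The only work the database does here: a plain retrieval query.
    return conn.execute("SELECT x, y FROM points").fetchall()


def sgd(rows, lr: float = 0.05, epochs: int = 200, seed: int = 0) -> tuple[float, float]:
    """Single-process SGD for y ~ w*x + b, updating after every row."""
    rng = random.Random(seed)
    rows = list(rows)
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(rows)  # visit rows in a fresh random order each epoch
        for x, y in rows:
            err = (w * x + b) - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
    return w, b


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO points VALUES (?, ?)",
        [(i / 10, 3 * (i / 10) + 1) for i in range(20)],
    )
    print(sgd(load_points(conn)))
```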

Good to know!
