OK, so if I'm not mixing in compute-heavy workloads like the stochastic gradient descent described in the article, then I'm less likely to saturate the CPU. Perhaps that won't happen at all, in which case I won't need to scale horizontally.
On the other hand, if I'm doing stochastic gradient descent that saturates the CPU, then there's a good chance I'm doing offline training of an ML model. In that case my latency tolerances are probably much, much higher. In other words, I can also avoid scaling horizontally, provided I can live with longer training times. Given the added complexity of horizontal scaling, that might be a worthwhile trade-off for me.
Good to know!