Better HN

0 points · _vvhw · 5y ago

You are not factoring in the cost of context switches, or the fact that many user applications today are memory-bound rather than CPU-bound.

This is one of the secrets exploited by the M1 chip: its CPU's line fill buffer (LFB) can fill many more cache lines concurrently than Intel chips can, and those cache lines are now 128 bytes instead of 64 bytes.


_ph_ · 5y ago

Which context switches? With the Go model, I have exactly one thread per CPU, no context switches. And if you are memory-bound, why have more CPUs?

But sure, there is a reason the M1 performs so well: it has one of the fastest single-thread performances around, and many applications cannot keep more than 4 cores busy for common tasks. That is partly because doing so is difficult in many programming languages, though easy in some that are only slowly gaining traction.
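A minimal sketch of the "one thread per CPU" model in Go, assuming the job splits into independent chunks. `runtime.GOMAXPROCS(0)` reports how many OS threads may execute Go code at once (by default, roughly the number of CPUs available), and the runtime multiplexes goroutines onto those threads itself. `parallelSum` is a hypothetical helper for illustration only.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum (hypothetical) splits xs into one chunk per usable CPU and
// sums each chunk in its own goroutine.
func parallelSum(xs []int) int {
	n := len(xs)
	workers := runtime.GOMAXPROCS(0) // query only; 0 leaves the setting unchanged
	if workers > n {
		workers = n
	}
	if workers == 0 {
		return 0
	}

	partial := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		// Even split of [0, n) into `workers` contiguous ranges.
		lo, hi := w*n/workers, (w+1)*n/workers
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			sum := 0
			for _, x := range xs[lo:hi] {
				sum += x
			}
			partial[w] = sum // each worker writes only its own slot, once
		}(w, lo, hi)
	}
	wg.Wait()

	total := 0
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	xs := make([]int, 1000)
	for i := range xs {
		xs[i] = i + 1
	}
	fmt.Println(parallelSum(xs)) // 500500
}
```

The workers share nothing but the `partial` slice, and each writes its slot exactly once, so the only synchronization is the final `wg.Wait()`.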

_vvhw (OP) · 5y ago

> Which context switches? With the Go model, I have exactly one thread per CPU, no context switches.

Not in the user application model you were describing. Those threads would need to coordinate and communicate (for example, back to the user interface), and that implies context switches.

Unless you're thinking of a strictly isolated thread-per-core design (https://mechanical-sympathy.blogspot.com/2011/09/single-writ...), in which case we are in agreement.
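A rough sketch in the spirit of the single-writer principle behind a thread-per-core design, assuming the work can be partitioned by key up front. Each goroutine owns one shard's map and is the only goroutine that ever writes to it, so there are no locks and no messages until the final merge. `shardOf` and `countWords` are hypothetical names, and FNV-1a hashing is just one convenient way to assign an owner.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"runtime"
	"sync"
)

// shardOf (hypothetical) maps a key to the index of the worker that owns it.
func shardOf(key string, shards int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(shards))
}

// countWords (hypothetical) partitions words by owner, then lets each worker
// count only the keys it owns. Every map has exactly one writer.
func countWords(words []string) map[string]int {
	shards := runtime.GOMAXPROCS(0)
	parts := make([][]string, shards)
	for _, w := range words {
		s := shardOf(w, shards)
		parts[s] = append(parts[s], w)
	}

	owned := make([]map[string]int, shards)
	var wg sync.WaitGroup
	for s := 0; s < shards; s++ {
		wg.Add(1)
		go func(s int) {
			defer wg.Done()
			m := make(map[string]int) // written only by this goroutine
			for _, w := range parts[s] {
				m[w]++
			}
			owned[s] = m
		}(s)
	}
	wg.Wait()

	// Keys are disjoint across shards, so merging is a plain copy.
	total := make(map[string]int)
	for _, m := range owned {
		for k, v := range m {
			total[k] = v
		}
	}
	return total
}

func main() {
	counts := countWords([]string{"a", "b", "a", "c", "b", "a"})
	fmt.Println(counts["a"], counts["b"], counts["c"]) // 3 2 1
}
```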

> And if you are memory-bound, why have more CPUs?

Yes, exactly, and that's why parallelism can often make things worse: https://brooker.co.za/blog/2014/12/06/random.html

However, for independent processes, each additional CPU adds memory bandwidth (according to the NUMA model), because each CPU's LFB can track only a limited number of outstanding cache misses. That concurrency limit puts an upper bound of 6 GB/s on how fast a single CPU can fill cache lines on misses, even if the bandwidth of your memory system is actually much higher: https://www.eidos.ic.i.u-tokyo.ac.jp/~tau/lecture/parallel_d...
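A per-core bound like this can be estimated with Little's law: throughput equals requests in flight divided by the time each request takes. The back-of-the-envelope sketch below assumes 10 LFB entries, 64-byte lines and 100 ns miss latency (illustrative figures, not measurements of any particular chip), plus a hypothetical 40 GB/s memory system to show where adding cores stops helping.

```go
package main

import "fmt"

// perCoreBandwidth applies Little's law: with at most lfbEntries misses in
// flight, each fetching lineBytes and taking latencyNs to complete, one core
// can fill at most lfbEntries*lineBytes/latencyNs bytes per ns (= GB/s).
func perCoreBandwidth(lfbEntries, lineBytes int, latencyNs float64) float64 {
	return float64(lfbEntries*lineBytes) / latencyNs
}

// aggregateBandwidth models independent processes on separate cores: each
// core adds its own LFB-limited bandwidth until the memory system's total
// bandwidth (systemGBs) becomes the bottleneck.
func aggregateBandwidth(cores int, perCore, systemGBs float64) float64 {
	total := float64(cores) * perCore
	if total > systemGBs {
		return systemGBs
	}
	return total
}

func main() {
	// Assumed figures for illustration only.
	per := perCoreBandwidth(10, 64, 100)
	fmt.Printf("per core: %.1f GB/s\n", per) // per core: 6.4 GB/s
	for _, n := range []int{1, 2, 4, 8} {
		fmt.Printf("%d cores: %.1f GB/s\n", n, aggregateBandwidth(n, per, 40))
	}
}
```

Under the same assumptions, doubling `lineBytes` to 128 doubles the per-core bound, which is one way to read the M1 point made at the top of the thread.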
