> Which context switches? With the Go model, I have exactly one thread per CPU, no context switches.
Not in the user application model you were describing. Those threads would need to coordinate and communicate (for example, back to the user interface), and that implies context switches.
Unless you're thinking of a strictly isolated thread-per-core design (https://mechanical-sympathy.blogspot.com/2011/09/single-writ...), in which case we agree.
> And if you are memory-bound, why have more CPUs?
Yes, exactly, and that's why parallelism can often make things worse: https://brooker.co.za/blog/2014/12/06/random.html
However, for independent processes, each additional CPU does add memory bandwidth (per the NUMA model), because each CPU's line fill buffers (LFBs) can only track a limited number of outstanding cache misses. That concurrency limit puts an upper bound of about 6 GB/s on filling cache lines for cache misses, even if the bandwidth of your memory system is actually much higher: https://www.eidos.ic.i.u-tokyo.ac.jp/~tau/lecture/parallel_d...