Oh I see, that makes complete sense.
Basically, given two small arrays A and B that together fit in the CPU cache, if the work is to compute sum(A) * sum(B) and sum(B) / sum(A), running the two expressions on two threads can be more than twice as fast as running both on one thread. With two threads (assuming left-to-right evaluation), one thread starts by fetching A while the other fetches B, so there is only one round of pipeline stalls waiting on memory; each thread's second sum then hits data the other thread has already pulled into cache. A single thread has to pay two rounds of memory stalls back to back, one per array, before anything is cached.