
jeffreyrogers · 11y ago · 0 points · 0 comments

That's true, but the reason people care about it is that in the past more transistors led to higher clock speeds. That's what I was getting at: we can still put more transistors on a chip, but power and heat limitations prevent that from turning into higher clock speeds.


3 comments · 2 top-level

m0th87 · 11y ago

Yeah. I'd be curious to know how much of that is a product of the software/hardware architectures we currently use, e.g. programming languages generally not built from the start with parallelism in mind.

m_mueller · 11y ago

Look at it this way:

1. Most software today is I/O bound, in which case programmers don't (and shouldn't) care about shared-memory parallelism.

2. Most popular programming languages today are based on classic imperative programming going back to Fortran and co.

3. These classic languages suggest using loops.

4. Loops are inherently not parallelizable. A loop becomes parallelizable only in the specific case where there is no loop-carried dependency, i.e. no iteration depends on the result of an earlier one.

5. These languages have basically infested everything we do, including compute-bound and memory-bandwidth-bound problems that should now be treated in parallel. (Even bandwidth goes down with sequential execution, for example on Intel sockets usually by a factor of 2.)

6. Since most parallel things therefore get written as loops, this becomes a hard problem. What the compiler vendors usually do is fling [directives](http://www.openacc.org/) [at us](http://openmp.org/wp/).

7. Directives work well until they don't, and then you have no idea why, because it's usually a black box for most programmers (who can't read assembly-like code).

8. What we should get is a way of saying: Here is some scalar code. I'd like this code to be applied in parallel over domains X, Y, Z etc. Be aware of symbols alpha and beta that are dependent in X, Y, Z as well as gamma that is dependent in X only.

9. This should be available at language level, so programmers start thinking in these terms. Only then do we have a reliable way of making use of data parallelism.

10. CUDA and OpenCL are actually pretty close to this, but slightly too low level and generally thought of as being hard to program in (which I don't agree with, but that's the image).
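
To make point 4 concrete, here is a minimal Python sketch. It is not from the thread, and the function names are made up for illustration. The first loop has no loop-carried dependency, so it can be rewritten as "apply this function to every element". The running sum reads the value the previous iteration wrote, so it can't be reordered as written.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x):
    # Scalar kernel (hypothetical): depends only on its own input element.
    return 2.0 * x + 1.0


def independent_loop(xs):
    # No iteration reads a value written by another iteration,
    # so the iterations could run in any order, or concurrently.
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = scale(xs[i])
    return out


def parallel_map(xs, workers=4):
    # The same computation written as "apply the kernel over the domain".
    # Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scale, xs))


def running_sum(xs):
    # Loop-carried dependency: iteration i reads out[i - 1],
    # which iteration i - 1 wrote, so the order matters.
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = xs[i] + (out[i - 1] if i > 0 else 0.0)
    return out
```

In CPython the thread pool won't actually speed up CPU-bound work like this; it is only there to show that the independent loop gives the same answer regardless of execution order.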

Disclaimer: I've been involved in this problem space for some time and [this](https://github.com/muellermichel/Hybrid-Fortran) is what has come out of it. It's HPC targeted, but at some point I'd like to make this whole parallel computing thing more generally approachable.
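
As a rough illustration of what point 8 is asking for, here is a plain Python sketch. It is not how Hybrid-Fortran or any real tool does it; `apply_over` and `scalar_kernel` are hypothetical names, and the loop here runs sequentially. The idea is only that the kernel is written for a single point, while the caller declares that alpha and beta vary over X, Y and Z and gamma varies over X only.

```python
from itertools import product


def scalar_kernel(a, b, g):
    # Scalar code for a single point of the domain (made-up formula).
    return g * (a + b)


def apply_over(kernel, nx, ny, nz, alpha, beta, gamma):
    # alpha and beta are indexed [x][y][z]; gamma is indexed [x] only.
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x, y, z in product(range(nx), range(ny), range(nz)):
        # Every point is independent of every other point, so a real
        # implementation would be free to run these in parallel.
        out[x][y][z] = kernel(alpha[x][y][z], beta[x][y][z], gamma[x])
    return out
```

Because the kernel never sees the indices, the decision about how to split X, Y and Z across threads or devices stays outside the scalar code.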

phaemon · 11y ago

No, the reason people cared about it was because it meant a faster CPU. The reason people cared about clock speed was because for the same chip it meant a faster CPU.

But the misconception that simply "higher clock == fast CPU" is so old it's referred to as the "megahertz myth", since it predates gigahertz.

For example, on the CPU benchmark PassMark:

    Pentium 4, 3.0 GHz (fast!)   scores: 382
    Pentium 4, 3.8 GHz (faster!) scores: 492
    Core i7,   2.6 GHz (slow!)   scores: 9,992

CPUs are still getting faster.
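
One quick way to read those numbers is to divide each score by the clock speed. This is a small sketch using only the three scores quoted above; `points_per_ghz` is a made-up helper, and the scores cover the whole chip, so the multi-core i7 benefits from its extra cores as well as from doing more work per clock.

```python
# (name, clock in GHz, PassMark score) as quoted in the comment above.
cpus = [
    ("Pentium 4", 3.0, 382),
    ("Pentium 4", 3.8, 492),
    ("Core i7", 2.6, 9992),
]


def points_per_ghz(score, ghz):
    # Benchmark points per GHz of clock speed.
    return score / ghz


for name, ghz, score in cpus:
    print(f"{name} @ {ghz} GHz: {points_per_ghz(score, ghz):.0f} points per GHz")
```

The two Pentium 4 parts land at roughly the same points per GHz, while the Core i7 is about thirty times higher, which is the point: clock speed alone says little across different chips.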
