0 points · maksimum · 9y ago · 0 comments

I agree with your point about avoiding vendor lock-in, something I experienced for myself with MATLAB. I also happened to buy an RX 480 recently, so I'm happy to hear it's good for GPGPU.

But I'm curious about how the FLOPS on these cards were measured. For example, one concern I have is that these two cards presumably have slightly different levels of parallelism, so it may be more or less difficult to extract the full performance from a particular card due to parallelism overhead. Then there's driver overhead, ease of programming, etc.

valarauca1 · 9y ago

FLOPS is always calculated via the simple formula

      F * Hz * 2 = FLOPS

Where F is the number of FPU front ends (SIMD and scalar). This is wrong because scalar math is often slower than SIMD, and compute kernels rarely run on the scalar pipeline.

Where Hz is, well, the clock rate in cycles per second. This is wrong because stalls happen: memory transfers, cache misses, etc. It is also wrong because the clock rate is throttled and you are not always at maximum boost clock.

Then multiply by 2 for FMA (fused multiply-add), which counts as two floating-point operations. This is wrong because, well, not every operation is a one-cycle FMA. Division can take many cycles (>100). Also, scalar pipelines don't have FMA.
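To make the arithmetic concrete, here is a rough sketch of the headline calculation in Python. The function name `peak_flops` is made up for illustration, and the RX 480 figures (2304 stream processors, 1266 MHz boost clock) are assumed from the publicly listed specs rather than taken from this thread, so check them against your own card.

```python
def peak_flops(fp_lanes: int, clock_hz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak: every lane retires one FMA (2 FLOPs) on every cycle."""
    return fp_lanes * clock_hz * flops_per_cycle


# Assumed RX 480 figures (not from the thread): lane count and boost clock.
RX480_LANES = 2304
RX480_BOOST_HZ = 1266e6

peak = peak_flops(RX480_LANES, RX480_BOOST_HZ)
print(f"{peak / 1e12:.2f} TFLOPS peak")
```

Note that nothing here measures anything: the number comes straight from the spec sheet, which is why it is only an upper bound.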

Ultimately all vendors use the same crappy calculation, so we are comparing apples to apples. Just rotten apples to rotten apples. It gives you an ideal-case figure you can optimize towards but never actually attain.
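As a back-of-the-envelope illustration of how far a real kernel can fall from that ideal, here is a toy model in Python. Everything in it is hypothetical: the function `sustained_flops`, the 60% busy fraction, the lane count and clock, and the instruction mix are invented inputs, not measurements of any card.

```python
def sustained_flops(fp_lanes, clock_hz, busy_fraction, mix):
    """Toy estimate of sustained throughput.

    busy_fraction: share of cycles not lost to stalls (memory, cache misses).
    mix: list of (share, flops_per_instruction, cycles_per_instruction).
    """
    total_share = sum(share for share, _, _ in mix)
    avg_flops = sum(share * flops for share, flops, _ in mix) / total_share
    avg_cycles = sum(share * cycles for share, _, cycles in mix) / total_share
    return fp_lanes * clock_hz * busy_fraction * avg_flops / avg_cycles


# Hypothetical card: 2304 lanes, throttled to 1.1 GHz, stalled 40% of the time.
LANES, CLOCK_HZ, BUSY = 2304, 1.1e9, 0.6

# Hypothetical mix: mostly FMA, some plain add/mul, a little slow division.
MIX = [
    (0.70, 2, 1),    # FMA: 2 FLOPs in 1 cycle
    (0.29, 1, 1),    # add or mul: 1 FLOP in 1 cycle
    (0.01, 1, 100),  # division: 1 FLOP in ~100 cycles
]

ideal = sustained_flops(LANES, CLOCK_HZ, 1.0, [(1.0, 2, 1)])
estimate = sustained_flops(LANES, CLOCK_HZ, BUSY, MIX)
print(f"{estimate / ideal:.0%} of the ideal figure")
```

With an all-FMA mix and no stalls the model collapses back to the vendor formula; each departure from that (stalls, non-FMA instructions, slow division) only pulls the number down.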
