undefined | Better HN

0 pointsGordonS3y ago0 comments

Is it really? Going from CPU to GPU, I would have expected a much better improvement.

0 comments

6 comments · 2 top-level

rahimnathwani3y ago· 2 in thread

You can think of it this way: if half the model is running on the GPU, and the GPU is infinitely fast, then the total calculation time would go down by 50%, compared with everything running on the CPU.

ethbr03y ago

Ref Amdahl's Law: https://en.m.wikipedia.org/wiki/Amdahl%27s_law

quickthrower23y ago

Wow seems like common sense turned into a law. Maybe I can get a law :-).

Anyone who has commuted on public transport probably knows this intuitively. (Using a kick scooter instead of walking cut my travel time by a good 5% which was excellent, as I still needed to be on a bus where that made no difference.)

2 more replies

qwertox3y ago· 2 in thread

I feel the same.

For example some stats from Whisper [0] (audio transcoding, 30 seconds) show the following for the medium model (see other models in the link):

---

GPU medium fp32 Linear 1.7s

CPU medium fp32 nn.Linear 60.7s

CPU medium qint8 (quant) nn.Linear 23.1s

---

So the same model runs 35.7 times faster on GPU, and compared to an "optimized" model still 13.6.

I was expecting around an order or magnitude of improvement.

Then again, I do not know if in the case of this article the entire model was in the GPU, or just a fraction of it (22 layers) and the remainder on CPU, which might explain the result. Apparently that's the case, but I don't know much about this stuff.

[0] https://github.com/MiscellaneousStuff/openai-whisper-cpu

pavelstoev3y ago

Training and inference on GPUs significantly underutilize …the GPUs. So tuning and various tricks need to be applied to achieve dramatic performance gains. If I am not good at cooking, giving me a larger kitchen will not make me faster or better.

rahimnathwani3y ago

You last paragraph is correct. Only about half the model was running on the GPU.

j / k navigate · click thread line to collapse