I get 3 tokens per second on an M1 Max running 30B models, compared to 1 token per second on a GPU (P40), both quantized to 4-bit. So, in my opinion, CPUs are better for inference (at least fast CPUs with DDR5 versus the cheapest GPUs).
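If you want to reproduce that kind of tokens-per-second number yourself, a minimal sketch looks like this. I'm assuming llama-cpp-python here as the runner and a placeholder model path; any local runner that reports completion token counts works the same way:

```python
import time
from llama_cpp import Llama

# Placeholder path: any ~30B model quantized to 4-bit.
# n_gpu_layers=0 keeps everything on the CPU for an apples-to-apples test.
llm = Llama(model_path="models/30b-q4_0.gguf", n_threads=8, n_gpu_layers=0)

start = time.time()
out = llm("Explain quantization in one paragraph.", max_tokens=128)
elapsed = time.time() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.2f} tok/s")
```

Rerun with n_gpu_layers set high enough to offload the whole model and you get the GPU-side number for the same prompt.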
The reason GPUs seem to be the de facto standard is that they scale better, are more power-efficient, and are better supported by PyTorch & co. Also, academia cares more about getting the best quality on their benchmarks than about performance and accessibility.