The GPU isn't actually what llama.cpp relies on here. What makes it that fast is that the workload, whether it runs on CPU or GPU, is memory-bandwidth-bound, so it benefits greatly from fast RAM. And Apple pairs its chips with LPDDR5 running at very high clock speeds for its unified memory.
It's still noticeably slower than a discrete GPU, though.
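A rough way to see why bandwidth dominates: generating each token has to stream essentially all of the model weights through memory at least once, so memory bandwidth divided by model size gives an upper bound on tokens per second. A minimal sketch of that estimate (the bandwidth and model-size figures below are ballpark assumptions, not measurements):

```python
# Back-of-envelope: token generation is memory-bound because each new
# token requires reading (roughly) all model weights from RAM once.
# All numbers here are illustrative assumptions.

def max_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if weight reads saturate memory bandwidth."""
    return bandwidth_gb_s / model_size_gb

# A 7B model quantized to 4 bits is roughly 3.5 GB of weights.
ddr4_desktop = max_tokens_per_sec(3.5, 50)     # dual-channel DDR4, ~50 GB/s
apple_unified = max_tokens_per_sec(3.5, 400)   # e.g. M1 Max unified memory, ~400 GB/s
discrete_gpu = max_tokens_per_sec(3.5, 1000)   # high-end GPU GDDR6X, ~1 TB/s

print(f"DDR4 desktop  : ~{ddr4_desktop:.0f} tok/s")
print(f"Apple unified : ~{apple_unified:.0f} tok/s")
print(f"Discrete GPU  : ~{discrete_gpu:.0f} tok/s")
```

This ignores compute and cache effects, but it matches the ordering in practice: fast unified memory lands well above a typical desktop CPU setup and still below a discrete GPU's VRAM.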