Running DeepSeek V3 671B on M4 Mac Mini Cluster

(blog.exolabs.net)

8 points · chadash · 1y ago · 3 comments

3 comments · 2 top-level

talldayo · 1y ago

> The M4 Max has 546GB/s of memory bandwidth and ~34TFLOPS (fp16) = ~68 GB/s, a ratio of ~8.02. Whereas NVIDIA RTX 4090 has 1008GB/s memory bandwidth and ~330TFLOPS (fp16) = ~660GB/s, a ratio of ~1.52.

Why are we comparing FP16 performance when you're inferencing INT4 quantized models? Seems like a misleading figure to compare with when it's not really even the performance you're measuring.
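For reference, the ratios in the quoted passage can be reproduced with a few lines of Python. This is only a sketch that takes the quoted numbers at face value: it assumes the second figure in each pair is the fp16 TFLOPS multiplied by 2 bytes per value, and the names `CHIPS` and `bandwidth_to_compute_ratio` are made up for illustration. With these inputs it prints about 8.03 and 1.53, which matches the quoted ~8.02 and ~1.52 up to rounding.

```python
# Figures as quoted from the blog post: (memory bandwidth in GB/s, fp16 TFLOPS).
CHIPS = {
    "M4 Max": (546, 34),
    "RTX 4090": (1008, 330),
}

# Assumption: the quote's "= ~68" and "= ~660" figures are TFLOPS x 2 bytes.
BYTES_PER_FP16 = 2


def bandwidth_to_compute_ratio(bandwidth_gbs, fp16_tflops):
    """Divide memory bandwidth by (fp16 TFLOPS x 2), as the quote appears to do."""
    return bandwidth_gbs / (fp16_tflops * BYTES_PER_FP16)


for name, (bandwidth, tflops) in CHIPS.items():
    print(f"{name}: {bandwidth_to_compute_ratio(bandwidth, tflops):.2f}")
```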

boroboro4 · 1y ago

Because INT4 quantized weights still use FP16 compute in most cases. Sometimes it's possible to use FP8/INT8 compute, and there is research to use INT4 compute, but it's rather rare.
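A rough sketch of what "INT4 weights, FP16 compute" means in practice: the weights are stored as packed 4-bit integers, expanded to fp16 values with a scale factor, and only then multiplied. Everything here is illustrative and assumed rather than taken from any particular inference engine: the low-nibble-first packing, the single symmetric scale, and the function names are all hypothetical. Real kernels do this on the GPU and usually use per-group scales.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def unpack_int4(packed):
    """Split each byte into two signed 4-bit integers (low nibble first)."""
    out = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            # Interpret the nibble as two's complement: 8..15 map to -8..-1.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def dot_int4_weights(packed_weights, scale, activations):
    """Dequantize INT4 weights to fp16 values, then take the dot product."""
    weights = [to_fp16(q * scale) for q in unpack_int4(packed_weights)]
    acc = 0.0  # accumulated in a Python float for simplicity
    for w, a in zip(weights, activations):
        acc += w * to_fp16(a)  # both operands are fp16-representable values
    return acc


# Two bytes hold four weights: 1, 2, 3, -1. With scale 0.5 they become
# 0.5, 1.0, 1.5, -0.5 before the multiply.
packed = bytes([0x21, 0xF3])
print(unpack_int4(packed))
print(dot_int4_weights(packed, 0.5, [1.0, 2.0, 1.0, 4.0]))
```

The point is that the 4-bit format only affects storage and memory traffic; the arithmetic itself still runs on fp16 values, which is why the fp16 throughput figure is the relevant one.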

jauntywundrkind · 1y ago

Shout out to the video team, for this super cute tower of minis next to the Christmas tree.

https://x.com/exolabs/status/1872444906851229814

Only just considering now that Strix Halo could help fill this gap that Mac chips with their huge memory bandwidth enjoy. 256GB systems shouldn't be hard to build!!

The MI300A APU doesn't seem popular, but for consumers this mix of big CPU and GPU could be quite compelling!
