It will be interesting to see the value/performance of next-gen M4 Ultras (or Extremes?) versus NVIDIA's new DIGITS [2] when they're released.
As for Apple, we'll see.
https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...
6 to 8 tokens per second.
And less than a tenth of the cost of a GPU setup.
That's almost nothing. If these models are capable/functional enough for most day-to-day uses, then useful LLM-based GenAI is already at the "too cheap to meter" stage.
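Rough back-of-the-envelope, since "too cheap to meter" is doing a lot of work there. The hardware price, power draw, and electricity rate below are my own assumptions, not numbers from upthread; only the 6-8 tok/s is:

    # Back-of-envelope cost per token for local inference.
    # All dollar figures and wattage are assumptions for illustration;
    # only the 6-8 tok/s rate comes from the comment above.
    HARDWARE_COST_USD = 5000        # assumed machine cost
    LIFETIME_YEARS = 3              # assumed amortization period
    POWER_DRAW_W = 200              # assumed average draw while generating
    ELECTRICITY_USD_PER_KWH = 0.15  # assumed rate
    TOKENS_PER_SECOND = 7           # midpoint of the quoted 6-8 tok/s

    seconds = LIFETIME_YEARS * 365 * 24 * 3600
    total_tokens = TOKENS_PER_SECOND * seconds

    energy_kwh = POWER_DRAW_W / 1000 * seconds / 3600
    total_cost = HARDWARE_COST_USD + energy_kwh * ELECTRICITY_USD_PER_KWH

    print(f"~{total_tokens / 1e9:.1f}B tokens over {LIFETIME_YEARS} years")
    print(f"~${total_cost / (total_tokens / 1e6):.2f} per million tokens")

Under those assumptions it works out to single-digit dollars per million tokens running 24/7, and once the hardware is paid off the marginal cost is just electricity, roughly a dollar per million tokens.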
I don't think they specified what they were using for networking, but it was probably Thunderbolt/USB4 networking, which can reach 40 Gbps.
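If anyone wants to check what their link actually delivers, iperf3 is the standard tool, but here's a minimal Python sketch of the same idea; the host argument is whatever link-local IP your Thunderbolt bridge interface gets assigned:

    # Minimal point-to-point throughput test, e.g. over a Thunderbolt/USB4
    # bridge. A sanity check only; use iperf3 for real measurements.
    # usage: python tput.py server   |   python tput.py client <host>
    import socket, sys, time

    PORT = 5001
    CHUNK = 1 << 20  # 1 MiB per send/recv

    def server():
        with socket.create_server(("", PORT)) as srv:
            conn, _ = srv.accept()
            total, start = 0, time.time()
            while (data := conn.recv(CHUNK)):
                total += len(data)
            secs = time.time() - start
            print(f"{total * 8 / secs / 1e9:.2f} Gbps received")

    def client(host, seconds=10):
        buf = b"\x00" * CHUNK
        with socket.create_connection((host, PORT)) as conn:
            end = time.time() + seconds
            while time.time() < end:
                conn.sendall(buf)

    if __name__ == "__main__":
        server() if sys.argv[1] == "server" else client(sys.argv[2])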
From the DeepSeek-V3 technical report:
"For this reason, after careful investigations, we maintain the original precision (e.g., BF16 or FP32) for the following components: the embedding module, the output head, MoE gating modules, normalization operators, and attention operators...To further guarantee numerical stability, we store the master weights, weight gradients, and optimizer states in higher precision. "
I’m hoping NVIDIA comes out with their new consumer computer soon!
Still interesting though.
How many additional nuclear power plants will need to be built because even these incredible technical achievements are, under the hood, morons? XD