The problem is that the hardware is still like $3000. Making anything run on Macs is an exercise in futility. And it's a shame that people get duped into buying Macs for LLM inference.
$3000 for running a 397B-total-parameter model is quite a bargain. The Mac is being used here for its fast internal storage, since storage bandwidth is the key bottleneck. You could probably achieve similar results with conventional (even fairly low-end) iGPU/APU hardware plus a fast PCIe 5.0 x4 SSD, which would also let you overlap SSD transfers with iGPU/APU compute, but the cost would land in a similar range. (Unless you carefully chose low-end hardware, e.g. Intel, with proper PCIe 5.0 x4 NVMe support, which is still quite uncommon, especially in laptops.)
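To illustrate the overlap idea, here's a minimal sketch of double-buffered weight streaming. Everything here is a stand-in: the layer filenames, the tiny shard sizes, and the `compute` step are all hypothetical, and a real implementation would stream multi-GB expert/layer shards from NVMe and run the forward pass on the iGPU/APU. The point is just the structure: the next layer's weights are read from SSD in the background while the current layer computes, so transfer time and compute time overlap instead of adding up.

    import threading
    import numpy as np

    # Hypothetical per-layer weight shards; a real model would have
    # hundreds of far larger files streamed from NVMe.
    LAYERS = [f"layer_{i}.bin" for i in range(4)]
    for p in LAYERS:
        np.random.rand(1024).astype(np.float16).tofile(p)

    def load_weights(path):
        # Stand-in for a large sequential SSD read (the bandwidth
        # bottleneck). File I/O releases the GIL, so a background
        # thread genuinely overlaps with compute here.
        return np.fromfile(path, dtype=np.float16)

    def compute(x, w):
        # Stand-in for the iGPU/APU forward pass on one layer.
        return x * w.mean()

    def forward(x):
        w = load_weights(LAYERS[0])
        for i in range(len(LAYERS)):
            prefetched = {}
            t = None
            if i + 1 < len(LAYERS):
                # Kick off the next layer's SSD read before computing,
                # so the transfer runs concurrently with compute.
                t = threading.Thread(
                    target=lambda p: prefetched.update(w=load_weights(p)),
                    args=(LAYERS[i + 1],),
                )
                t.start()
            x = compute(x, w)  # overlaps with the background read
            if t is not None:
                t.join()
                w = prefetched["w"]
        return x

    print(forward(np.ones(1024, dtype=np.float16)))

If the per-layer compute time is close to the per-layer transfer time, the SSD read is nearly free; whichever of the two is slower sets the effective per-layer latency.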