I can run full DeepSeek R1 on an M1 Max with 64 GB of RAM: around 0.5 t/s with a small quant. A Q4 quant of Maverick (253 GB) runs at 2.3 t/s on it (no GPU offload).
Practically, a last-gen or even ES/QS EPYC or Xeon (with AMX), enough RAM to fill all 8 or 12 memory channels, plus fast storage (four Gen5 NVMe drives are almost 60 GB/s) looks, on paper at least, like the cheapest way to run these huge MoE models at hobbyist speeds.
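To sanity-check the "on paper" part, here is a rough back-of-envelope sketch in Python. It assumes decoding is purely bandwidth bound, i.e. every generated token has to stream all active expert weights once from wherever they live. The function names are made up for illustration, and the example inputs (DDR5-4800, 17B active parameters, about 4.5 bits per weight for a Q4-style quant) are my assumptions, not measurements from this post.

```python
def pcie_link_gb_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Raw one-direction PCIe link bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * lanes * encoding / 8


def dram_gb_s(mt_per_s: float, channels: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit channels by default)."""
    return mt_per_s * bytes_per_transfer * channels / 1000


def max_tokens_per_s(bandwidth_gb_s: float, active_params_b: float,
                     bits_per_weight: float) -> float:
    """Ceiling on decode speed if each token reads all active weights once."""
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    # Assumed inputs: PCIe 5.0 is 32 GT/s per lane, x4 per NVMe drive.
    nvme_total = 4 * pcie_link_gb_s(32, 4)
    # Assumed inputs: 12 channels of DDR5-4800.
    ram_total = dram_gb_s(4800, 12)
    for name, bw in [("4x Gen5 NVMe", nvme_total), ("12ch DDR5-4800", ram_total)]:
        # Assumed model shape: 17B active params at ~4.5 bits per weight.
        print(f"{name}: {bw:.1f} GB/s, <= {max_tokens_per_s(bw, 17, 4.5):.1f} t/s")
```

These are ceilings, not predictions: real throughput will be lower once drive and filesystem overhead, KV cache reads, and compute are included, and higher where hot experts stay cached in RAM.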