A 34B model is probably about the largest you can run on a consumer GPU with 24GB VRAM, and only when quantized to around 4 bits. 70B will require A100s or a cloud host. 13B models are everywhere already. I'm sure this was a very deliberate choice - let people play with the 13B model locally to whet their appetite and then they can pay to run the 70B model on Azure.
I'm running a 30B model on an AMD 5600X CPU at 2-3 tokens/s, which is just under a "read-aloud" pace. I'd wager that you can run a 70B model at about the same speed with a 7900X and a bit more RAM.
Do you mind teaching how to do the CPU/GPU RAM math? All I know is 34B at 16-bit = 68GB total RAM needed (because 1B parameters at 8 bits each = 1GB, by definition), but I don't know how it splits between CPU and GPU, or whether the tradeoff in tok/s is acceptable.
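For what it's worth, the arithmetic can be sketched roughly like this. It is a back-of-the-envelope estimate that counts only the weights (no KV cache, activations, or runtime overhead), and `split_weights` is a made-up helper illustrating a "fill VRAM first, spill the rest to system RAM" assumption, not how any particular runtime actually places layers.

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Approximate size of the weights alone, in GB (10^9 bytes).

    1B parameters at 8 bits (1 byte) each is 1 GB, so scale by bits / 8.
    """
    return params_billions * bits_per_param / 8


def split_weights(total_gb, vram_gb):
    """Hypothetical split: put as much as fits in VRAM, the rest in system RAM."""
    on_gpu = min(total_gb, vram_gb)
    return on_gpu, total_gb - on_gpu


# Assumed setup: a 34B model and a 24 GB consumer GPU.
for bits in (16, 8, 4):
    total = weight_memory_gb(34, bits)
    gpu, cpu = split_weights(total, vram_gb=24)
    print(f"34B @ {bits}-bit: {total:.0f} GB total, "
          f"{gpu:.0f} GB on GPU, {cpu:.0f} GB in system RAM")
```

By this estimate a 34B model only fits entirely in 24GB at around 4 bits per parameter. How much speed you give up for the part held in system RAM depends on the runtime and on memory bandwidth, so that part is best measured rather than calculated.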