You don't need to run large models. Gemma QAT 27B fits on one GPU and is quite good, and other models like Qwen3 are great for coding.
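For a rough sense of why a 27B model fits on one card, here is a back-of-the-envelope sketch. It assumes the QAT checkpoint stores about 4 bits per weight and that the GPU is a 24 GB card like the 3090. It counts weights only, so the KV cache and runtime overhead come on top. The function name is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


VRAM_GB = 24  # assumption: a 24 GB card such as the RTX 3090

if __name__ == "__main__":
    # Compare 16-bit, 8-bit, and (assumed) 4-bit QAT weights for a 27B model.
    for bits in (16, 8, 4):
        gb = weight_memory_gb(27, bits)
        verdict = "fits" if gb < VRAM_GB else "does not fit"
        print(f"27B at {bits}-bit: ~{gb:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Under those assumptions only the 4-bit version leaves room on a single 24 GB card, which is the whole point of the QAT release.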
A 3090 gets 100+ tokens/second with Qwen3, very close to what you would see with a cloud-based model.
An M3 Ultra gets ~30.
Congrats, you played yourself.