I don't want to make an ad here but I'm going to point to HuggingFace https://endpoints.huggingface.co (and, to avoid singling them out, https://replicate.com/pricing too, though I don't know them well) as examples with pricing.
The "beauty" IMHO of such solutions is that, again, you pay for what you want. You want to use the endpoint for only 5 min to test that the model and its API fit your needs? OK. You want the whole month? Sure. You want 1 user, namely you? Fine, not a lot of power. You want your whole organization to use that endpoint? Scale up.
I'm going to give a very rough approximation because honestly I'm not really into this, so someone please adjust with sources:
Apple Mac Studio M3 Ultra 96GB = $4K
NVIDIA A100 with 80GB ~ 10x perf compared to M3 Pro (obviously depends on models)
So on Replicate today one can get an A100 for ~$5/hr, meaning $4K buys about 800 hours, which is ... about a month of non-stop use. But that's for 10x the speed, and electricity is included. So very VERY approximately, if you use a Mac Studio for 10 months on AI non-stop (day and night) then it's arguably worth it.
If you use it less, say 2hrs/day only for inference, then I imagine it takes years to reach the equivalent (around a decade with the numbers above), and by that time I bet Replicate or HuggingFace is going to rent much faster setups for much cheaper, simply because that's what they have ALL done for the last few years.
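For anyone who wants to plug in their own numbers, here is a minimal sketch of that break-even arithmetic. The three constants are just the rough figures from this comment ($4K Mac Studio, ~$5/hr A100, ~10x speedup), not sourced benchmarks, and the function name is made up for illustration. It ignores the Mac's electricity and resale value, and assumes rental prices stay flat, which they probably won't.

```python
# Rough break-even between buying a Mac Studio and renting an A100.
# All three constants are the ballpark figures from the comment above,
# not measured or sourced values: adjust them to your own case.
MAC_PRICE_USD = 4000.0      # Mac Studio M3 Ultra 96GB, one-off purchase
A100_USD_PER_HOUR = 5.0     # rented A100, electricity included
A100_SPEEDUP = 10.0         # assumed A100 speed relative to the Mac


def break_even_days(mac_hours_per_day: float) -> float:
    """Days of Mac usage after which buying matches the cost of renting.

    Ignores the Mac's electricity and resale value, and assumes the
    rental price and the speedup stay constant over the whole period.
    """
    # Hours of A100 time that the Mac's purchase price would rent.
    rented_hours = MAC_PRICE_USD / A100_USD_PER_HOUR
    # The same amount of work takes A100_SPEEDUP times longer on the Mac.
    equivalent_mac_hours = rented_hours * A100_SPEEDUP
    return equivalent_mac_hours / mac_hours_per_day


if __name__ == "__main__":
    for hours in (24, 2):
        days = break_even_days(hours)
        print(f"{hours:>2} h/day: {days:,.0f} days (~{days / 365:.1f} years)")
```

With these inputs, non-stop use breaks even after about 333 days (roughly 11 months, same ballpark as the 10 months above) and 2 hrs/day after about 4,000 days (roughly 11 years). Halve the assumed speedup or the hourly price and those numbers halve too, so the result is only as good as the guesses going in.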