This stuff is expensive because supply is much lower than demand. If everyone was to run their own hardware with a batch size of 1, we'd have 100x more demand for inference hardware and electricity than we do now, and people would be even more frustrated. Efficiency is everything, and we need all the economies of scale we can get to meet demand.