Power efficiency alone is strong enough pressure to push people toward centralized model providers.
My 3090 running 24B or 32B models is fun, but I know I'm paying way more per token in electricity, and getting lower-quality tokens on top of that.
Local is great as a hobby, but for anything actually useful it's currently cheaper to just pay API prices.
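
If you want to sanity-check this for your own setup, here's a rough back-of-the-envelope sketch. The function name and every number in the example are placeholders I made up for illustration, not measurements from my card: plug in your own wall-power draw, your observed tokens per second, and your electricity rate. It only counts electricity, so hardware cost and idle power are ignored.

```python
def electricity_cost_per_million_tokens(power_watts, tokens_per_second, price_per_kwh):
    """Estimate the electricity cost of generating one million tokens.

    power_watts:       average draw while generating (assumed, measure at the wall)
    tokens_per_second: sustained generation speed (assumed, from your own runs)
    price_per_kwh:     what you pay for electricity, in your currency
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")

    seconds = 1_000_000 / tokens_per_second          # time to produce 1M tokens
    energy_kwh = power_watts * seconds / 3_600_000   # watt-seconds -> kWh
    return energy_kwh * price_per_kwh


if __name__ == "__main__":
    # Hypothetical inputs: 350 W, 30 tok/s, 0.15 per kWh.
    cost = electricity_cost_per_million_tokens(350, 30, 0.15)
    print(f"~{cost:.2f} per million tokens in electricity alone")
```

Compare the result against the per-million-token price your API provider actually charges for a model of similar size. Batching changes the picture a lot: a provider serving many requests at once gets far more tokens out of each watt than a single local stream does.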