It helps to be able to run the model locally, but right now that is either slow or expensive. The challenges of running a local model beyond, say, 32B parameters are real.
I would be fine with something like 10 times the wait time, though. But I guess consumer hardware needs a serious 'RAM pipeline' upgrade before big models can run at even crawl speeds.
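
To put rough numbers on the 'RAM pipeline' point, here is a back-of-envelope sketch. It assumes a dense model whose token generation is limited by memory bandwidth, so every generated token has to read all the weights once. It ignores the KV cache, compute limits, and runtime overhead, so treat the result as an optimistic ceiling. The function names and the bandwidth figures are my own illustrative assumptions, not measurements of any particular hardware.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


def decode_tokens_per_sec(params_billion: float,
                          bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Rough upper bound on generation speed for a dense model.

    Assumes each generated token streams every weight from memory once,
    so speed is about bandwidth divided by model size.
    """
    return bandwidth_gb_s / model_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Hypothetical bandwidth values, chosen only to show the scaling.
    for bandwidth in (50, 100, 400):
        for params in (32, 70):
            speed = decode_tokens_per_sec(params, 4, bandwidth)
            size = model_size_gb(params, 4)
            print(f"{params}B @ 4-bit ({size:.0f} GB), "
                  f"{bandwidth} GB/s -> ~{speed:.1f} tok/s at best")
```

The takeaway under these assumptions: speed scales linearly with memory bandwidth and inversely with model size, so doubling the parameter count at the same quantization halves the best-case tokens per second unless the memory path gets faster too.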