They won't ever be SOTA because of money, but "last year's SOTA" at 1/4 the cost or less may be good enough. More quantity and more flexibility, at lower edge quality. It can make sense: a 7% dumber agent TEAM vs. a single objectively superior super-agent.
That's the most exciting thing going on in that space. New workflows are opening up not because of intelligence improvements but because of cost improvements for "good enough" intelligence.
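To put rough numbers on the team-vs-super-agent trade-off, here's a back-of-the-envelope sketch. It assumes attempts are independent and that you have some reliable way to tell a good result from a bad one (tests, a verifier), which is a big assumption in practice. The 60% baseline success rate is made up purely for illustration; only the "7% dumber" and "1/4 the cost" figures come from the comment above.

```python
def team_success(p_single: float, n_agents: int) -> float:
    """Chance that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p_single) ** n_agents


# Hypothetical baseline: the frontier agent solves a task 60% of the time.
frontier_p = 0.60

# "7% dumber": read here as 7% lower per-attempt success rate (an assumption).
cheap_p = frontier_p * (1 - 0.07)

# "1/4 the cost": the same spend buys four cheaper agents.
team_size = 4

print(f"single frontier agent: {frontier_p:.3f}")
print(f"team of {team_size} cheaper agents: {team_success(cheap_p, team_size):.3f}")
```

If the attempts are correlated (same model, same blind spots) or you can't cheaply verify results, the team's advantage shrinks a lot, so treat this as an upper bound rather than a prediction.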
Edit: the replies to my comment are great examples of what I’m talking about when I say it’s hard to determine what hardware I’d need :).
[†] The latest Qwen 3.6 whatever has been a noticeable improvement, and I'm not even at the point where I tweak settings like sampling, temperature, etc. No idea what that stuff does, I just use the staff picks in LM Studio and customize the system prompts.
Yes, it's possible to run tiny quantized models, but you're working with extremely small context windows and tons of hallucinations. It's fun to play with them, but they're not at all practical.