This week's release of the new smaller Qwen 3.5 models was interesting. I ran a 4-bit quant of the 122b model on my NVIDIA Spark, and it's... pretty damn smart. The smaller models can be run at 8-bit on consumer machines at very reasonable speeds. And they're not stupid. They're smarter than "ChatGPT" was a year or so ago.
AMD Strix Halo machines with 128GB of RAM can already be bought off the shelf for not-insane prices, and they run these models just fine. Same with M-series Macs.
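The back-of-the-envelope math on why 128GB is enough: weight memory is roughly params × bits ÷ 8. Here's a quick sketch (illustrative numbers only; it ignores KV cache, activations, and quantization overhead):

```python
# Rough memory footprint for quantized model weights.
# Illustrative only: ignores KV cache, activations, and quant overhead.

def weights_gb(params_billions: float, bits: int) -> float:
    """Approximate weight memory in GB: params * bits / 8."""
    return params_billions * bits / 8

# A 122B model at 4-bit: ~61 GB of weights -- fits in 128 GB with room to spare.
print(f"122B @ 4-bit: ~{weights_gb(122, 4):.0f} GB")

# A 30B model at 8-bit: ~30 GB -- plausible to ship on consumer hardware.
print(f"30B @ 8-bit: ~{weights_gb(30, 8):.0f} GB")
```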
Once the supply shocks make their way through the system, I could see every consumer Mac or Windows install shipping with a 30B-param or even larger model onboard: one smart enough for basic conversation and assistance, and equipped with good tool-use skills.
I just don't see a moat for OpenAI or Anthropic beyond specialized applications (like software development, CAD, etc.). For long-tail consumer things? I don't see it.