OpenAI / Google / Anthropic / xAI also have a ton of compute. That is the real moat.
https://bsky.app/profile/antirez.bsky.social/post/3mlzwmvlov...
It's much closer than you think. We're going to see specialized hardware in the next 24 months capable of running 2025-era frontier models. That's big.
If and when advances in AI models slow down, only then, IMHO, will it become feasible to design specialized HW for general-purpose consumption and general-purpose workloads.
Even at 2-bit quantization, DS4 is probably on par with a 2024 frontier model. You can run that today on local hardware, and at a minimum, local models are going to keep pace over the next 12-24 months. Even if they don't close the gap with frontier models, they'll still play an important role in the overall pipeline for cost, speed and privacy reasons.
That's without even mentioning the additional capability that something like a Taalas chip churning out 17k tokens/sec could unlock.
Even if it were possible, LLMs are such a gold mine of user data that it's really hard to see that opportunity being passed up.
https://www.apple.com/shop/buy-mac/mac-studio
Same with the Mac mini: entirely removed from all store references.
So long as there is demand, there are always going to be providers competing to offer it at a low cost. My understanding is that the median price on there is in the ballpark of what it costs to run the inference. This is very different from e.g. Opus, which you can basically only buy from Anthropic at the price they set.
It feels great to finally have access to something local.