There’s also latency, dependence on external infrastructure, privacy and compliance concerns, energy usage, and the overall predictability of the system itself.
My guess is that this will gradually push a lot of companies toward more hybrid architectures over time. Small or local models are probably good enough for things like filtering, routing, or repetitive high-volume tasks, while frontier models get reserved for the places where the quality jump actually justifies the added cost and complexity.
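As a rough illustration of what such a hybrid setup might look like, here's a minimal routing sketch. Everything in it is an assumption for illustration: the model names, the threshold, and the keyword-based complexity heuristic are all hypothetical placeholders (a real system might use a small classifier model instead).

```python
def estimate_complexity(prompt: str) -> float:
    """Crude proxy for task difficulty: longer prompts and open-ended
    keywords score higher. Purely illustrative, not a real classifier."""
    score = min(len(prompt) / 500, 1.0)
    if any(k in prompt.lower() for k in ("prove", "design", "analyze")):
        score += 0.5
    return min(score, 1.0)


def route(prompt: str, threshold: float = 0.6) -> str:
    """Send cheap, routine requests to a local model and reserve the
    frontier model for requests above the complexity threshold."""
    if estimate_complexity(prompt) >= threshold:
        return "frontier-model"  # hypothetical name for the expensive model
    return "local-model"  # hypothetical name for the small/local model


# Routine, high-volume request stays local:
print(route("Classify this ticket as billing or tech support"))
# Open-ended request gets escalated:
print(route("Design a distributed caching layer and analyze its failure modes"))
```

The interesting design question isn't the heuristic itself but where you set the threshold: too low and you pay frontier prices for routine work, too high and quality suffers exactly where it matters.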
As useful as frontier models are, using them for absolutely everything sometimes reminds me of using a distributed system for problems that could have been solved locally with something much simpler.
I wouldn’t be surprised if, in many real world cases, a fast specialized system plus a smaller model ends up being the more practical and economical setup overall.