There might be a reason to subsidize subscriptions, but only if your value is in the app rather than the model.
But for API use, the models are easily substituted, so market share is fleeting. The LLM interface being unstructured plain text makes it simpler to upgrade to a smarter model than than it used to be to swap a library or upgrade to a new version of the JVM.
And there is no customer loyalty. Both the users and the middlemen will chase after the best price and performance. The only choice is at the Pareto frontier.
Likewise there is no other long-term gain from getting a short-term API user. You can't train out tune on their inputs, so there is no classic Search network effect either.
And it's not even just about the cost. Any compute they allocate to inference is compute they aren't allocating to training. There is a real opportunity cost there.
I guess your theory of Opus 4.1 having massive margins while Opus 4.5 has slim ones could work. But given how horrible Anthropic's capacity issues have been for much of the year, that seems unlikely as well. Unless the new Opus is actually cheaper to run, where are they getting the compute from for the massive usage spike that seems inevitable.
It's much more akin to a programming language or platform than a typical data-access API, because the choice of LLM vendor then means that you build a lot of your future product development off the idiosyncracies of their platform. When you switch you have to redo much of that work.
This isn't even theory, we can observe the swings in practice on Openrouter.
If the value was in prompt engineering, people would stick to specific old versions of models, because a new version of a given model might as well be a totally different model. It will behave differently, and will need to be qualified again. But of course only few people stick with the obsolete models. How many applications do you think still use a model released a year ago?
It is possible to write adapters to API interfaces. Many proprietary APIs become de-facto standards when competitors start creating those compatibility layers out of the box to convince you it is a drop-in replacement. S3 APIs are good example Every major (and most minor) providers with the glaring exception of Azure support the S3 APIs out of the box now. psql wire protocol is another similar example, so many databases support it these days.
In the LLM inference world OpenAI API specs are becoming that kind of defacto standard.
There are always caveats of course, and switches go rarely without bumps. It depends on what you are using, only few popular widely/fully supported features or something niche feature in the API that is likely not properly implemented by some provider etc, you will get some bugs.
In most cases bugs in the API interface world is relatively easy to solve as they can be replicated and logged as exceptions.
In the LLM world there are few "right" answers on inference outputs, so it lot harder to catch and replicate bugs which can be fixed without breaking something else. You end up retuning all your workflows for the new model.
Agree that the plain text interface (which enables extremely fast user adoption) also makes the product less sticky. I wonder if this is part of the incentive to push for specialized tool calling interfaces / MCP stuff - to engineer more lock in by increasing the model specific surface area.
There are other valid reasons for why it might be faster, but faster even while everyone's rushing to try it at launch + a cost decrease leaves me inclined to believe it's a smaller model than past Opus models