Why isn't Anthropic clearer about Sonnet being better, then? Why isn't it included in the benchmark if the new Sonnet beats Opus? Why are they so ambiguous with their language?
For example, https://www.anthropic.com/api says:
> Sonnet - Our best combination of performance and speed for efficient, high-throughput tasks.
> Opus - Our highest-performing model, which can handle complex analysis, longer tasks with many steps, and higher-order math and coding tasks.
And Opus is above/after Sonnet. That to me implies that Opus is indeed better than Sonnet.
But then you go to https://docs.anthropic.com/en/docs/about-claude/models and it says:
> Claude 3.5 Sonnet - Most intelligent model
> Claude 3 Opus - Powerful model for highly complex tasks
Does that mean Sonnet 3.5 is better than Opus even for highly complex tasks, since it's the "most intelligent model"? Or just for everything except "highly complex tasks"?
I don't understand why this seems purposefully ambiguous.