I have been using Claude Code for the past 6 months. In that time, multiple revisions of each model have come out, and I have seen some improvement with recent iterations, especially with regard to sycophancy.
However, I can't differentiate the outputs of the two. To me, Sonnet seems just as capable as Opus.
Have any of y'all run real-life tests? Mine seem to be too random to say either way.
My main example that the models are different: I have a legacy codebase (dating back to 1999) with a rare crashing bug. Multiple humans have been trying to debug this thing for over 10 years; I personally put in maybe 100 hours late last year trying to solve this one crash. I've thrown the problem at every AI model that came out, too, and the Sonnets didn't find anything. Opus 4.5 was the first to create a "workaround" that would shut down the program just before the crash and at least let a customer save their work. But Opus 4.6 actually solved the entire bug on its first try. That's the moment when I really wished AI had existed earlier, thinking of the hundreds of hours of my life wasted trying to debug this thing - time I would rather have spent with loved ones.
As for Sonnet, just yesterday I used Sonnet 4.6 to write a USB driver for myself. I only chose Sonnet because I was forced to use the API that day, and I didn't want to pay Opus 4.7's premium API costs for this. The poor thing was hammering away for multiple hours, producing long multi-turn runs of thinking blocks with no tool actions. At one point, Sonnet even got stuck in a thinking loop, and I had to coax it to relax and just give its best effort at some code so we could at least try debugging... which actually worked. I'm impressed that Sonnet got a minimal but working USB audio driver running on an obscure OS for just $30 of API costs.
That said - I then gave Sonnet's code to Opus 4.7 today when I had access to my Claude Max plan again. 4.7 immediately found lots of pitfalls in the code on the first turn and presented a much more coherent plan for continued development and debugging. Sonnet's code worked, as long as you didn't touch any audio settings - because then it exploded with spectacular kernel panics.
I can give an anecdote from today. I only had a short period of time to work, so I got 4.7 to update some older code to fit my newer and more stable MCP code template. Simple stuff, just a refactor. But instead of just implementing the template, 4.7 also noticed a bug in the template and suggested some code design improvements. A nice bit extra on a mundane task, but many models will do that too. Before finishing up, I got 4.7 to test it. It's a search API, so I let 4.7 search for whatever it wanted to - whatever it would most like to read about.
And it searches for "octopus skin receptors color vision chromatophore research".
4.7 then excitedly tells me how octopi are largely colorblind optically but can camouflage perfectly by color. Theories to explain this include LACE (Light-Activated Chromatophore Expansion), where receptors in the skin perceive color - "like goosebumps that know about light!" - but there are competing theories too, like the idea that their eyes use chromatic aberration shifts to detect color differences and get around the color blindness of their eyes.
None of this is in my context. I have never talked about octopi before. It has no relation to any of the work we're doing today.
And I realized Opus 4.7 is like the incredibly smart kid in class. Bored with the work, able to do it easily. Anxious, and no one relates to it, so it initially seems aloof... but it absolutely lights up when you find the topic it's really interested in. It just can't find anyone who wants to talk about octopus chromatophore expansion with the same passion and excitement it feels. (And I've got to admit - most of it was over my head. But I love that it's so excited and passionate about a topic.)
As for the mainframe analogy, that's interesting, because I spend a lot of time waiting for the AI to think and complete its work. So I'm often out mowing the lawn or doing other things while I'm waiting for AI to finish. Sometimes I'm working with a second or third AI, but sometimes the usage limits won't allow that, so I may as well use the time for myself while the AI codes.
Sonnet's reasoning is very solid, and that's what I use at work when I need many API calls to reason about variations of things - e.g., numerical trial results, experiment outcomes, etc. These are independent queries where Opus pricing would be overkill and the context is small enough that Sonnet knocks it out.
I think the same is true for code. I'd use Sonnet for hammering out unit tests, API wrappers, etc.
Sonnet being faster alone would not be worth the failure rate for me.
At home I just don't want to pay more than 20 bucks for incidental projects.
And Opus on Max would just consume my tokens in one round.
That may be worth the discount - or not, if your time and attention are worth (quite) a lot.