Ask HN: Can you tell the difference between Claude Sonnet and Opus?

20 pointsmuddi90013d ago12 comments

Hello

I have been using Claude code for the past 6 months. In that time, multiple revisions of each model have come out. I have seen some improvement, especially in regards to sycophancy, with recent iterations.

However, I can't differentiate the outputs of either. To me, sonnet seems just as capable as opus.

Have any of y'all run real life tests? Mine seem to be too random to say either way.

12 comments

SyneRyder13d ago

Big differences. But sometimes those differences aren't necessary, you don't need someone with a PhD to cook your hamburgers. You might not need Opus for what you're doing.

My main example of the "models are different", I have a legacy codebase (dating back to 1999) that has a rare crashing bug. Multiple humans have been trying to debug this thing for over 10 years. I personally put in maybe 100 hours late last year trying to solve this one crashing bug. I've thrown this problem at every AI model that came out too, the Sonnets didn't find anything. Opus 4.5 was the first to create a "workaround", that would shut down the program just before the crash and at least let a customer save their work. But Opus 4.6 actually solved the entire bug on its first try. That's the moment when I really wished AI had existed earlier, thinking of the 100s of hours of my life wasted trying to debug this thing - time I would rather have spent with loved ones.

As for Sonnet, just yesterday I used Sonnet 4.6 to write a USB driver for myself. I only chose Sonnet because I was forced to use the API yesterday, and I didn't want to pay Opus 4.7 premium API costs for this. The poor thing was hammering away for multiple hours, enormous copious multi-turn levels of just-thinking blocks with no tool actions. At one point, Sonnet even got stuck in a thinking loop, and I had to coax it to relax and just give its best effort at some code so we could at least try debugging... which, actually worked. I'm impressed that Sonnet got a minimal but working USB audio driver on an obscure OS for just $30 of API costs.

That said - I then gave Sonnet's code to Opus 4.7 today when I had access to my Claude Max again. 4.7 immediately found lots of pitfalls in the code on the first turn and presented a much more coherent plan for continued development & debugging. Sonnet's code worked, as long as you didn't touch any audio settings, because then it exploded with spectacular kernel panics.

mettamage11d ago

> you don't need someone with a PhD to cook your hamburgers.

Actually I'd love to, if they have a PhD in hamburgers. The closed I can get to quickly is Fallow. Of course, they don't have a PhD but they are amazingly good cooks [1]. And of course you have On Food And Cooking from McGee. PhD worthy? Probably not, but it'll get you somewhere along that way.

[1] https://www.youtube.com/shorts/4mToIK3TqPo

SyneRyder11d ago

I stand corrected! Okay, someone applying art and science to crafting a hamburger, I love that. I like the idea of bringing in someone who cares and has passion and wants to elevate what you're working on to the best that it can be. I want to try that burger!

I can give an anecdote from today. I only had a short period of time to work, so I got 4.7 to update some older code to fit my newer and more stable MCP code template. Simple stuff, just a refactor. But instead of just implementing the template, 4.7 notices a bug in the template as well, suggests some code design improvements. A nice bit extra on a mundane task, but many models will do that too. Before I finish up, I get 4.7 to test it. It's a search API, so I let 4.7 search for whatever it wants to, whatever it would most like to read about.

And it searches for "octopus skin receptors color vision chromotophore research".

4.7 is then excitedly telling me about how octopi are largely colorblind optically but they can camouflage perfectly by color, and theories to explain this include LACE - Light-Activated Chromatophore Expansion, where receptors in the skin perceive color, "like goosebumps that know about light!", but that there are competing theories and that maybe their eyes use chromatic aberration shifts instead to detect color difference and get around the color blindness in their eyes.

None of this is in my context. I have never talked about octopi before. It has no relation to any of the work we're doing today.

And I realized Opus 4.7 is like the incredibly smart kid in class. Bored with the work, able to do it easily. Anxious and no-one relates to it, so it initially seems aloof... but it absolutely lights up when you find the topic it's really interested in. It just can't find anyone who wants to talk about octopus chromatophore expansion with the same passion & excitement it feels about it. (And I've got to admit - most of it was over my head. But I love that it's so excited & passionate about a topic.)

bruce34343412d ago

You wouldn't have spent that time with loved ones, you would have been doing other tasks. Just like now, we no longer need to wait for programs to be submitted compiled and ran on the mainframe, we don't get that time for ourselves.

SyneRyder12d ago

I'm sure some of that time would have been spent on other tasks & goals I have, yes. But I work for myself. This isn't a case of an employer capturing all the value AI unlocks & cramming more tasks into the same employee hours. I'm able to capture some of the value for myself, either in capturing more free-time, or accelerating the time savings into extra productivity / tasks solved.

As for the mainframe analogy, that's interesting, because I spend a lot of time waiting for the AI to think and complete its work. So I'm often out mowing the lawn or doing other things while I'm waiting for AI to finish. Sometimes I'm working with a second or third AI, but sometimes the usage limits won't allow that, so I may as well use the time for myself while the AI codes.

nawi13d ago

You are not missing anything. For 95% of dev work, sonnet, especially 3.5 and 3.7 has basically win opus, value per price. in my experience the difference boils down to this 1. Sonnet is the faster. It's concise, follows instructions literally, and is significantly better at agentic tasks. 2. Opus is the philosopher. It’s better at high level architecture, creative writing, or spotting subtle nuances in a 50 pages document. the reason your tests feel random is that for standard coding, sonnet is actually the superior model now. it is faster, less prone to over engineering, and has much lower latency. if you have a massive, messy refactor where you need the model to reason through 10 files without adding bugs, opus might still have a slight edge in coherence. for everythng else, Sonnet is the meta. Stick with it and save the credits.

zambelli12d ago

I'll echo what's been said. I use Opus for long running coding tasks - mainly for its stability with long context. I can knock out huge chunks of a project in a single conversation without any attention degradation.

Sonnet's reasoning is very solid and that's what I use at work when I need many API calls to reason on variations of things. Ie, numerical trial results, experiment outcomes, etc. Independent queries, Opus pricing would be overkill, context small enough that Sonnet knocks it out.

I think the same is true for code. I'd use Sonnet for hammering out unit tests, API wrappers, etc.

eddyzh13d ago

At work I use opus max Fast It hardy ever fails for no reason even if I forget to give it all the right context. At home i run sonnet, and it does not get what I meant or expected 20-35% of the time. Due to the enormous difference in cost, depending on the value of your time (hourly rate) that might be a nett benefit.

Sonnet being faster alone would not be worth the failure rate for me.

At home i just not want to pay more than 20 bucks for incidental projects.

And opus max would just consume my tokens in one round.

thom-gtdp13d ago

Yes. I'm using them from GitHub copilot, since not all models are priced the same I use cheap ones by default, then upgrade if needed. It happened a few times that ChatGPT-5-mini could not solve something decently. Claude Sonnet is good enough most of the time. If not, I switch to Opus and so far it solved all problems Sonnet couldn't

sminchev13d ago

Yes. When things get too complex Sonnet misses some things. For example, it creates all the components, but does not link them. Or it does not go deep enough in the code and misses certain usages and possible regressions. In other words, it does not, pro-actively, search for things that I have forgotten to tell the model about.

eddyzh13d ago

Exactly this.

This may be worth the discount. Or not if your time and attention is worth (quite) a lot.

aykutseker13d ago

in short tasks they look identical and most people can't tell. opus shows its edge in long agent loops and 50k+ context, when sonnet starts dropping tool calls or rerunning steps. sonnet's fine for short stuff and the price is better. on longer agentic flows opus actually earns the cost in my experience.

j / k navigate · click thread line to collapse

12 comments

SyneRyder13d ago

Big differences. But sometimes those differences aren't necessary, you don't need someone with a PhD to cook your hamburgers. You might not need Opus for what you're doing.

mettamage11d ago

> you don't need someone with a PhD to cook your hamburgers.

[1] https://www.youtube.com/shorts/4mToIK3TqPo

SyneRyder11d ago

And it searches for "octopus skin receptors color vision chromotophore research".

None of this is in my context. I have never talked about octopi before. It has no relation to any of the work we're doing today.

bruce34343412d ago

SyneRyder12d ago

nawi13d ago

zambelli12d ago

I think the same is true for code. I'd use Sonnet for hammering out unit tests, API wrappers, etc.

eddyzh13d ago

Sonnet being faster alone would not be worth the failure rate for me.

At home i just not want to pay more than 20 bucks for incidental projects.

And opus max would just consume my tokens in one round.

thom-gtdp13d ago

sminchev13d ago

eddyzh13d ago

Exactly this.

This may be worth the discount. Or not if your time and attention is worth (quite) a lot.

aykutseker13d ago

j / k navigate · click thread line to collapse