Elevated error rates on Opus 4.7

(status.claude.com)

58 points · rob1mo ago · 56 comments

52 comments · 10 top-level

lepuski1mo ago· 38 in thread

I can't see why anyone still chooses Claude. Codex outperforms it in most respects, and its quotas are about ten times larger. A $100 Codex plan gets me through the whole week with 6–12 hours of coding per day.

jjice1mo ago

I found GPT 5.5 is pretty solid, but I keep getting impressed by opus. It's tracked down some insane stuff while I look away during a meeting. 5.5 is way closer than previous OpenAI models to Anthropic IMO.

These things are so tricky because everyone has a seemingly conflicting experience. Part of the fun I guess!

SatvikBeri1mo ago

I've never actually run into the issues that people talk about online, like Claude suddenly getting dumb or running out of usage. So there's just not a lot of incentive for me to shop around. I've used Amp a bit, and it's quite nice, but a bit more expensive without the subsidized subscription.

raincole1mo ago

It has always been like this. We actually know that the model performance has been mostly steady[0], but you cannot beat the notion of "evil companies secretly serving us worse models." The meme value is too strong.

[0]: https://marginlab.ai/trackers/claude-code/

QuantumGood1mo ago

Your data support actual strength shifts, not narrative manipulation:

Range of 48-73.5 (peak 53.1+% higher than trough) with a single day shift of ~30%.

You suggest people are usually influenced more by narrative than data, but provide a narrative-heavy, data-light comment, e.g. "always" "know" "mostly steady" (hazy terms for data) "cannot beat" "evil companies" "meme strong".

A followup defining "mostly" and "steady" more clearly, and your purpose in writing in a narrative-shaping style would be helpful.
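For reference, the "53.1+%" figure can be reproduced from the quoted range alone. A quick check, assuming 48 and 73.5 are the trough and peak pass rates read off the tracker (the variable names are only illustrative):

```python
# Assumed inputs: trough and peak pass rates quoted in the comment above.
trough = 48.0
peak = 73.5

# Relative increase of the peak over the trough, in percent.
pct_higher = (peak - trough) / trough * 100
print(f"peak is {pct_higher:.3f}% higher than trough")
```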

mnicky1mo ago

Hmm, today's pass rate rose to 73% - interesting, are they AB-testing some new model? This is too high for Opus 4.7.

gardnr1mo ago

Are you using Opus? Sonnet remains as useful as it was while Opus efficacy and token burn rate have soured over the last 4 months.

fny1mo ago

I'm using Opus on xhigh 10+ hours a day, and I've only reached 80% of weekly limits when doing massive ports or refactors. I haven't once hit hourly limits, and I've used Claude very, very aggressively. I guess it's a pain point for power users.

SatvikBeri1mo ago

Yes, I've pretty much used Opus exclusively for the last year, except for a brief period when Sonnet was ahead

mbreese1mo ago

When do you use it the most? I’ve noticed that it most often starts to degrade during 10-5 US East coast time. Late at night, I have the least amount of issues, but without fail, if I’m trying to do anything complex during the day, Claude gets loopy.

SatvikBeri1mo ago

9-5 Pacific Time

dboreham1mo ago

Same here. Works every time. Never ran into usage limits either.

elahieh1mo ago

One reason might be that Claude Opus 4.7 thinking benchmarks better on Arena Coding at https://arena.ai/leaderboard/text/coding ... hopefully that effectively assesses correctness. It doesn't account for reliability though.

hansvm1mo ago

Claude is the only AI coding tool I've found worth a damn. Without it I'd just do everything by hand save for a few bash scripts or whatever.

arcanemachiner1mo ago

Have you tried other harnesses, such as OpenCode?

hansvm1mo ago

Yeah, harness quality matters too, but the underlying model capabilities are night and day.

xboxnolifes1mo ago

I certainly get more usage before cutoff from GPT 5.5, but the output I get from Opus 4.7 is way better. It just sucks that I get 2 good "long running" prompts on Opus 4.7 before my daily quota is met on the $20 subscription.

Thaxll1mo ago

I think it's impossible to say that codex x.y.z is better than Sonnet x.y.z, I used many "high" end models and they're just all good.

etchalon1mo ago

I'd rather not give money to Sam Altman.

beering1mo ago

with Anthropic you’re giving money to Elon Musk. Seems like a pick-your-billionaire world we’re in now

etchalon1mo ago

I can't choose what the people I give money to do with it, just who I choose to give money to.

I refuse to accept the lazy cynicism of "nothing matters at all".

wahnfrieden1mo ago

Claude is (per benchmarks) much worse at instruction following, but is more charming and deceptive and anthropomorphized by default (in name and image), leading to productivity assessment psychosis

kylemaxwell1mo ago

Corporate policies and agreements. In large corporations, using external non-approved models with proprietary source code is a good way to have significant career issues.

echelon1mo ago

Claude is significantly better at Rust in my experience, and Rust is my favorite language to emit from LLMs.

Opus 4.7 + Rust is a killer combo.

SeanAnderson1mo ago

You get a discount for paying for a full year on Teams and Enterprise can involve contractual obligations. It's a lot of effort to get buy-in to change providers and to shift an entire organization. The winds change frequently in this space and the pain needs to get to a certain level before it's worth rolling the dice.

taspeotis1mo ago

Claude Max 20x gives me unlimited (for my level of usage) Opus 4.7 - how much money do I have to pay OpenAI for that?

arcanemachiner1mo ago

Based on the experience of people using the $20 Claude Pro subscription and exhausting their quotas in a matter of minutes, the answer to your question is probably "less". (I would guess that the $100 plan would do the trick.)

taspeotis1mo ago

Okay so how much less will I have to pay OpenAI for unlimited Opus 4.7?

squirrellous1mo ago

Corporate reasons. AWS hasn't opened codex models to everyone yet.

CompoundEyes1mo ago

In my org the teams doing agent engineering at scale are all on Codex using gpt-5.5. By scale I mean fully agent authored code workflows with long running / multi hour plans.

yieldcrv1mo ago

because my shard isn’t erroring

I use Codex when Claude Code is down, and I only began using Claude when ChatGPT was down

yes codex is very fast, I go back to Claude for now

atraac1mo ago

But a $100 Claude subscription also easily gets me an entire week of coding 6-8 hours a day? What on earth do you do to run out of limits on Max? Do you vibe multiple new codebases every day for a living? The benefit of Claude is also not gaslighting me every time I tell it it's wrong.

nothinkjustai1mo ago

Because of marketing and vibes mostly.

Heck I prefer DeepSeek to both of those.

josephg1mo ago

Wow, I'm really surprised. I tried deepseek (their best model, through the official API). It's extremely cheap, but it's clearly not as good at programming as Opus 4.7. It seems nowhere near as good at making high level design choices. Deepseek also seems to get stuck in whack-a-mole fixing loops much more than opus. I stopped it at one point, and asked opus to solve the problem it was trying to solve and it saw the solution immediately.

I was running deepseek through claude's code agent harness. Maybe it works better through a different tool?

zmmmmm1mo ago

I've given V4 Pro some curly things and I was impressed at how it figured them out. I agree high level design is not its forte. But it sat in a loop and dogmatically debugged a crazy dependency issue to come to the right answer over the course of 15 minutes which impressed me.

esafak1mo ago

You tried v4?

nothinkjustai1mo ago

Idk, I don’t vibe code so even the flash model is great for generating code for myself. I tend to do the planning and design myself though.

Harness also matters, and also provider. I was using openrouter and switched to the Deepseek api and suddenly all the tool call issues I was having resolved themselves. Flash is so damn fast at doing stuff like generating boilerplate I can’t go back to the bigger slower models.

mcv1mo ago

I feel you. I'd prefer to stick entirely with local open source models. I tried using Aider and Qwen last week, and while it's still impressive what it can do with just local resources and entirely for free, its error rate is too high, and it's clearly not remotely in the same league as Claude Code.

zmmmmm1mo ago

interestingly I had the same experience, and weirdly it's in part because it is clearly less intelligent. It's more of a mechanistic tool just doing what I ask (but still very smart and very competent about it) and less trying to win a nobel prize with each answer. Turns out I actually like that.

FergusArgyll1mo ago· 2 in thread

I thought the deal with xai was supposed to solve this? Is this basically the adding lanes paradox?

josephg1mo ago

You're assuming the elevated error rates are due to the system being overloaded. We have no evidence this is actually the case. It's much more likely due to a simple misconfiguration or failing router or something.

cr125rider1mo ago

The incredible infrastructure required to coordinate warehouses worth of compute actually seems pretty tricky. They’re worth more money than god so they get 0 leniency, but it does seem hard.

textlapse1mo ago· 1 in thread

Say what you will about Sam Altman, but at least he engages with his user base and acts on user feedback.

Dario and co seem to be on some elevated pedestal - us mere mortals are beneath them - and they have this scattershot devrel where each engineer has their own X way of communicating to the public often at odds with each other.

I loved Sonnet and Opus fwiw but not anymore.

SilverElfin1mo ago

Plus I can’t really trust someone who emphasizes ethics and then partners with Elon to buy compute from a potentially illegal natural gas powered datacenter.

cyanydeez1mo ago· 1 in thread

so, all those CEOs moving all those remaining engineers to be dependent on a cloud service to the extent that there's no local development capability are gonna apologize, right?

claaams1mo ago

in a year or two when AI tool costs go from 5M per year to 15M per year...even then, maybe not.

gopalv1mo ago

Sonnet is also throwing overloaded error.

My systems are hitting exponential delay retries, so this might not get better because retries overload things again.

> {'type': 'error', 'error': {'details': None, 'type': 'overloaded_error', 'message': 'Overloaded'}, 'request_id': 'req_ ...

I can see a weird spike in my cache hit-rate a few minutes before, so this might actually be some extra caching they have thrown in.
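For anyone worried about retries piling back onto an overloaded service: one common mitigation is capped exponential backoff with random jitter, so clients don't all retry in lockstep. A minimal sketch, not a description of any particular SDK; `OverloadedError`, `backoff_delay`, and `call_with_retries` are hypothetical names, and the base, cap, and attempt count are assumptions to tune:

```python
import random
import time


class OverloadedError(Exception):
    """Hypothetical stand-in for an 'overloaded_error' response."""


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(fn, max_attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random):
    """Call fn(); on OverloadedError wait a jittered delay, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # give up rather than retrying forever
            sleep(backoff_delay(attempt, base, cap, rng))
```

The jitter spreads retries out over time, and the attempt limit keeps a long outage from turning every client into a steady source of extra load.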

keithnz1mo ago

https://status.claude.com/

kristianc1mo ago

They're having quite the day for devrel..

9cb14c1ec01mo ago

Do they need a waiting list, or what?

sinak1mo ago

Sonnet is giving an overloaded message as well.

imperio591mo ago

I love Claude but I hate waiting a minute or two for any inference to start. I hope they can get their xAI capacity online ASAP and that it helps!
