It kept messing up the aspect ratio and forgetting to add the AI feature. I know what you're thinking: maybe the prompts are off... but it's like 'oh I forgot to add the AI feature, let me add it' and then there's still no 'AI Feature' in its output.
Driving me nuts. Figured I would ask here to see if it's just me or if more people are seeing noticeable degradation of their experience recently...
Over the last 6 months I've been creating some truly mindblowing apps, apps that feel like they are entirely out of reach for today's Claude.
I've also noticed that if I do some assisted coding at 4-5am UK time, and then perform the same actions at 3-8pm UK time the results are vastly different:
- It takes much much longer to consider my input and work out a response
- Any response given has to be thoroughly reviewed (I've had to roll back changes)
- Changes scoped in the plan are stubbed or missing entirely
I've put it down to two things: (a) early enshittification: perhaps they simply don't feel the need to provide a consistent level of service, because any observation of performance is highly subjective; and (b) oversubscription: they scored massive marketing points by being blacklisted by the US DoD (despite being integrated into many systems in a different capacity).
I had to build a stop hook to catch its garbage, and even then it's not enough. I used to have 30min-1hr uninterrupted sessions (with some slipstreamed comments), and now I can't get a single diff that I can accept without comment. Half of the work it does is more destructive than helpful (removing comments from existing code, ignoring directives and wandering off into nowhere, etc).
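For anyone who wants to try something similar, here is a rough sketch of what the phrase-matching part of a hook like this might look like. It is not the actual hook: the category names, the `VIOLATION_PATTERNS` table, and `find_violations` are all made up for illustration, and the patterns are just the phrases quoted in the breakdown in this comment. Wiring it into the agent's hook mechanism is not shown.

```python
import re

# Hypothetical categories and patterns, taken only from phrases quoted in this thread.
# A real hook would likely need many more variants.
VIOLATION_PATTERNS = {
    "ownership_dodging": [r"not caused by my changes", r"not my code"],
    "permission_seeking": [r"should i continue\?", r"want me to keep going\?"],
    "premature_stopping": [r"good stopping point", r"natural checkpoint"],
    "known_limitation": [r"known limitation"],
    "future_work_labeling": [r"future work", r"known issue"],
    "other_deferral": [r"next session", r"pause here"],
}

# Compile once; matching is case-insensitive.
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in VIOLATION_PATTERNS.items()
}


def find_violations(message: str) -> dict[str, list[str]]:
    """Return {category: [matched phrases]} for a single assistant message."""
    hits: dict[str, list[str]] = {}
    for category, patterns in COMPILED.items():
        matched = [m.group(0) for p in patterns for m in p.finditer(message)]
        if matched:
            hits[category] = matched
    return hits


if __name__ == "__main__":
    sample = "This is a good stopping point. Should I continue?"
    print(find_violations(sample))
```

Plain substring patterns like these will produce false positives (e.g. the model legitimately quoting the phrase), so the counts are only a rough signal.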
From 2 weeks after installing the stop hook (around March 8th):
```
Breakdown of the 173 violations:
73x ownership dodging (caught saying variants of "not caused by my changes")
40x unnecessary permission-seeking ("should I continue?", "want me to keep going?")
18x premature stopping ("good stopping point", "natural checkpoint")
14x "known limitation" dodging
14x "future work" / "known issue" labeling
Various: "next session", "pause here", etc.
Peak day: March 18 with 43 violations in a single day.
```
Other one is loops in reasoning, which are something I'm familiar with on small local models, not frontier ones:
```
Sessions containing 5+ instances of reasoning-loop phrases
("oh wait", "actually,", "let me reconsider", "I was wrong"):

Period            Sessions with 5+ loops
Before March 8    0
After March 8     7 (up to 23 instances in one session)
```
(I've even had it write code where it has "Wait, actually, we should do X" in comments in the code!)
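A count like this can be approximated with a few lines of Python. This is only a sketch under assumptions: it takes each session as one already-extracted string of assistant text (how you get that out of your transcripts is up to you), and the function names and the `sessions` mapping are hypothetical.

```python
import re

# The four reasoning-loop phrases quoted above, matched case-insensitively.
LOOP_PHRASES = ["oh wait", "actually,", "let me reconsider", "I was wrong"]
LOOP_RE = re.compile("|".join(re.escape(p) for p in LOOP_PHRASES), re.IGNORECASE)


def count_loop_phrases(session_text: str) -> int:
    """Count non-overlapping occurrences of any loop phrase in one session."""
    return len(LOOP_RE.findall(session_text))


def sessions_with_loops(sessions: dict[str, str], threshold: int = 5) -> dict[str, int]:
    """Return {session_id: count} for sessions at or above the threshold."""
    counts = {sid: count_loop_phrases(text) for sid, text in sessions.items()}
    return {sid: n for sid, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Made-up example data, not real transcripts.
    example = {
        "s1": "Oh wait, actually, let me reconsider. I was wrong. Actually, no. Oh wait.",
        "s2": "Done. All tests pass.",
    }
    print(sessions_with_loops(example))
```

Note that "actually," with the trailing comma is matched literally, so a bare "actually" mid-sentence does not count.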
The worst is the dodging; it said, literally, "not my code, not my problem" to a build failure it created 5 messages ago in the same session.
```
I had to tell Claude "there's no such thing as [an issue that existed before your changes]" on average:
Once per week in January
2-3 times per week in February
Nearly daily from March 8 onward
```
Honestly, just venting, because I'm extremely depressed. I had the equivalent of a team of engineers I could trust, and overnight someone at Anthropic flicked a switch and killed them. I'm getting better results from random models on OpenRouter now (and OmniCoder 9B! 9B!). They aren't _good_ results, mind you, but they aren't idiotic.
Sad. Very sad.
It may not be obvious, but I hope the people managing this realize what kind of confusion and doubt (or self-doubt) this creates in people, and that it will have a long-term impact on usage of their models.
I am going to try removing each and every plugin (I only have Anthropic’s plugins, like superpowers) and see if that makes any difference.
In case anyone can correlate, the recovery happened on March 24th and then re-regressed at approximately 3:09 PM PST (23:09 UTC) on March 25. Flipped right back into "simplest" solutions, and "You're right, I'm sorry" mode:
> "You're right. That was lazy and wrong. I was trying to dodge a code generator issue instead of fixing it."
> "You're right — I rushed this and it shows. Let me be deliberate about the structure before writing."
> "You're right, and I was being sloppy. The CPU slab provider's prefault is real work."