I cancelled Claude: Token issues, declining quality, and poor support (opens in new tab)

(nickyreinert.de)

971 pointsy421mo ago582 comments

582 comments

221 comments · 108 top-level

wg01mo ago· 24 in thread

I write detailed specs. Multifile with example code. In markdown.

Then hand over to Claude Sonnet.

With hard requirements listed, I found out that the generated code missed requirements, had duplicate code or even unnecessary code wrangling data (mapping objects into new objects of narrower types when won't be needed) along with tests that fake and work around to pass.

So turns out that I'm not writing code but I'm reading lots of code.

The fact that I know first hand prior to Gen AI is that writing code is way easier. It is reading the code, understanding it and making a mental model that's way more labour intensive.

Therefore I need more time and effort with Gen AI than I needed before because I need to read a lot of code, understand it and ensure it adheres to what mental model I have.

Hence Gen AI at this price point which Anthropic offers is a net negative for me because I am not vibe coding, I'm building real software that real humans depend upon and my users deserve better attention and focus from me hence I'll be cancelling my subscription shortly.

gwerbin1mo ago

Or just don't use AI to write code. Use it as a code reviewer assistant along with your usual test-lint development cycle. Use it to help evaluate 3rd party libraries faster. Use it to research new topics. Use it to help draft RFCs and design documents. Use it as a chat buddy when working on hard problems.

I think the AI companies all stink to high heaven and the whole thing being built on copyright infringement still makes me squirm. But the latest models are stupidly smart in some cases. It's starting to feel like I really do have a sci-fi AI assistant that I can just reach for whenever I need it, either to support hard thinking or to speed up or entirely avoid drudgery and toil.

You don't have to buy into the stupid vibecoding hype to get productivity value out of the technology.

You of course don't have to use it at all. And you don't owe your money to any particular company. Heck for non-code tasks the local-capable models are great. But you can't just look at vibecoding and dismiss the entire category of technology.

2 more replies

Aurornis1mo ago

Writing detailed specs and then giving them to an AI is not the optimal way to work with AI.

That's vibecoding with an extra documentation step.

Also, Sonnet is not the model you'd want to use if you want to minimize cleanup. Use the best available model at the time if you want to attempt this, but even those won't vibecode everything perfectly for you. This is the reality of AI, but at least try to use the right model for the job.

> Therefore I need more time and effort with Gen AI than I needed before

Stop trying to use it as all-or-nothing. You can still make the decisions, call the shots, write code where AI doesn't help and then use AI to speed up parts where it does help.

That's how most non-junior engineers settle into using AI.

Ignore all of the LinkedIn and social media hype about prompting apps into existence.

EDIT: Replaced a reference to Opus and GPT-5.5 with "best available model at the time" because it was drawing a lot of low-effort arguments

8 more replies

scuderiaseb1mo ago

I must be doing something very different from everyone else, but I write what I want and how I want it and Opus 4.7 plans it for me, then I carefully review. Often times I need to validate and check things, sometimes I’ve revised the plan multiple times. Then implementation which I still use Opus for because I get a warning that my current model holds the cache so Sonnet shouldn’t implement. And honestly, I’m mostly within my Pro subscription, granted I also have ChatGPT Plus but I’ve mostly only used that as the chat/quick reference model. But yeah takes some time to read and understand everything, a lot of the time I make manual edits too.

2 more replies

hintymad1mo ago

> With hard requirements listed, I found out that the generated code missed requirements,

This is hardly a surprise, no? No matter how much training we run, we are still producing a generative model. And a generative model doesn't understand your requirements and cross them off. It predicts the next most likely token from a given prompt. If the most statistically plausible way to finish a function looks like a version that ignores your third requirement, the model will happily follow through. There's really no rules in your requirements doc. They are just the conditional events X in a glorified P(Y|X). I'd venture to guess that sometimes missing a requirement may increase the probability of the generated tokens, so the model will happily allow the miss. Actually, "allow" is too strong a word. The model does not allow shit. It just generates.

1 more reply

coldtea1mo ago

>I write detailed specs. Multifile with example code. In markdown. Then hand over to Claude Sonnet. With hard requirements listed, I found out that the generated code missed requirements, had duplicate code or even unnecessary code wrangling data (mapping objects into new objects of narrower types when won't be needed) along with tests that fake and work around to pass.

Stop doing that. Micromanage it instead. Don't give it the specs for the system, design the system yourself (can use it for help doing that), inform it of the general design, but then give it tasks, ONE BY ONE, to do for fleshing it out. Approve each one, ask for corrections if needed, go to the next.

Still faster than writing each of those parts yourself (a few minutes instead of multiple hours), but much more accurate.

1 more reply

bmurphy19761mo ago

I'm starting to think a lot of the problem people are having is just that they have unrealistic expectations.

I'm not having the same problem as you and I follow a very similar methodology. I'm producing code faster and at much higher quality with a significant reduction in strain on my wrists. I doubt I'm typing that much less, but what I am typing is prose which is much more compatible with a standard QWERTY keyboard.

I think part of it is that I'm not running forward as fast as I can and I keep scope constrained and focused. I'm using the AI as a tool to help me where it can, and using my brain and multiple decades of experience where it can't.

Maybe you're expecting too much and pushing it too hard/fast/prematurely?

I don't find the code that hard to read, but I'm also managing scope and working diligently on the plans to ensure it conforms to my goals and taste. A stream of small well defined and incremental changes is quite easy to evaluate. A stream of 10,000 line code dumps every day isn't.

I bet if you find that balance you will see value, but it might not be as fast as you want, just as fast as is viable which is likely still going to be faster than you doing it on your own.

1 more reply

rsanek1mo ago

I'm confused. If you have detailed, specific expectations, why aren't using the best model available? Even if you were using Opus 4.7, I would inquire if you're using high/xhigh effort by default.

Feels crazy to me for people to use anything other than the best available.

2 more replies

linsomniac1mo ago

>Then hand over to Claude Sonnet.

Have you tried Opus 4.6 with "/effort max" in Claude Code? That's pretty much all I use these days, and it is, honestly, doing a fantastic job. The code it's writing looks quite good to me. Doesn't seem to matter if it's greenfield or existing code.

If code is harder to read than to write, you're doing yourself a disservice by having the output stage not be top shelf.

1 more reply

jwpapi1mo ago

I have the same feeling.

Like there is no way in world that Gen AI is faster then an actual cracked coder shooting the exact bash/sql commands he needs to explore and writing a proper intent-communicating abstraction.

I’m thinking the difference is in order of magnitudes.

On top of that it adds context loss, risk of distraction, the extra work of reading after the job is done + you’ll have less of a mental model no matter how good you read, because active > passive.

Man it was really the weirdest thing that Claude Coded started hiding more and more changes. Thats what you need, staying closely on the loop.

eweise1mo ago

I give Claude small incremental tasks to do and it usually does them flawlessly. I know how to design the software and break into incremental tasks. Claude does the work. The productivity increase has been incredible. I think I'll be able to bootstrap a single person lifestyle business just using Claude.

hirvi741mo ago

That is why I still use the Chatbots and not the CLI/desktop tools. I am in 100% control. I mainly ask question surrounding syntax with languages I am not well experienced in, snippets/examples, and sometimes feedback on certain bits of logic.

I feel like I have easily multiplied my productivity because I do not really have to read more than a single chat response at a time, and I am still familiar with everything in my apps because I wrote everything.

I've been working on Window Manager + other nice-to-haves for macOS 26. I do not need a model to one-shot the program for me. However, I am thrilled to get near instantaneous answers to questions I would generally have to churn through various links from Google/StackOverflow for.

throwaway77831mo ago

I don't know. I don't write detailed specs, but make it very iterative, with two sessions. One for coding and one for reviews at various levels.

Just the coding window makes mistakes, duplicates code, does not follow the patterns. The reviewer catches most of this, and the coder fixes them all after rationalizing them.

Works pretty well for me. This model is somewhat institutionalized in my company as well.

I use CC Opus 4.7 or Codex GPT 5.4 High (more and more codex off late).

meroes1mo ago

This is how I feel with AI math proofs. I’m not sure where they’re at now, but a year ago it took so much more time to check if an LLM proof was technically correct even if hard to understand, compared to a well structured human proof.

Maybe it was Timothy Gowers who commented on this.

Lots of human proofs have the unfortunate “creative leap” that isn’t fully explained but with some detectable subtlety. LLMs end up making large leaps too, but too often the subtle ways mathematicians think and communicate is lost, and so the proof becomes so much more laborious to check.

Like you don’t always see how a mathematician came up with some move or object to “try”, and to an LLM it appears random large creative leaps are the way to write proofs.

varispeed1mo ago

You can quickly get something "working" until you realise it has a ton of subtle bugs that make it unusable in the long run.

You then spend months cleaning it up.

Could just have written it by hand from scratch in the same amount of time.

But the benefit is not having to type code.

abustamam1mo ago

This may be a bit silly but I do what you do and then I tell Claude to review the code it wrote and compare it to the specs. It will often find issues and fix it. Then I review the reviewed code, and it's leagues better than pre reviewed code.

This may be worth trying out.

baranul1mo ago

Now that there is Claw Code[1], seems like many of these cancellations are easier to do.

[1]: https://github.com/ultraworkers/claw-code

arikrahman1mo ago

I use open spec to negotiate requirements before the handoff, it's helped me a lot. You could also use GSD2 or Amazon's Kiro, or Spec Kit but I find they have too many stages and waste tokens.

CamperBob21mo ago

Then hand over to Claude Sonnet.

Well, there's your problem. Why aren't you using the best tool for the job?

moribunda1mo ago

And it leaves 25 TODO comments in code silently, reporting to you that everything is done.

dannersy1mo ago

Beautifully stated and I couldn't agree more. This is my experience.

GoToRO1mo ago

you are holding it wrong. For real this time.

rob1mo ago

I use the "Superpowers" plugin that creates an initial spec via brainstorming together, and then takes that spec and creates an implementation spec file based on your initial spec. It also has other agents make sure the spec doesn't drift between those two stages and does its own self-reviews. Almost every time, it finds and fixes a bunch of self-review issues before writing the final plan. Then I take that final plan and run it through the actual execution phase that does its own reviews after everything.

Just saying that I know a lot of people like to raw dog it and say plugins and skills and other things aren't necessary, but in my case I've had good success with this.

tengbretson1mo ago

> or even unnecessary code wrangling data (mapping objects into new objects of narrower types when won't be needed)

Dude! The amount of ad-hoc, interface-specific DTOs that LLM coding agents define drives me up the wall. Just use the damn domain models!

xpe1mo ago

I very much value and appreciate the first four paragraphs! [3] This is my favorite kind of communication in a social setting like this: it reads more like anthropology and less like judgment or overgeneralization.

The last two paragraphs, however, show what happens when people start trying to use inductive reasoning -- and that part is really hard: ...

> Therefore I need more time and effort with Gen AI than I needed before because I need to read a lot of code, understand it and ensure it adheres to what mental model I have.

I don't disagree that the above is reasonable to say. But it isn't all -- not even enough -- about what needs to be said. The rate of change is high, the amount of adaptation required is hard. This in a nutshell is why asking humans to adapt to AI is going to feel harder and harder. I'm not criticizing people for feeling this. But I am criticizing the one-sided-logic people often reach for.

We have a range of options in front of us:

    A. sharing our experience with others
    B. adapting
    C. voting with your feet (cancelling a subscription)
    D. building alternatives to compete
    E. organizing at various levels to push back

(A) might start by sounding like venting. Done well it progresses into clearer understanding and hopefully even community building towards action plans: [1]

> Hence Gen AI at this price point which Anthropic offers is a net negative for me because I am not vibe coding, I'm building real software that real humans depend upon and my users deserve better attention and focus from me hence I'll be cancelling my subscription shortly.

The above quote is only valid unless some pretty strict (implausible) assumptions: (1) "GenAI" is a valid generalization for what is happening here; (2) Person cannot learn and adapt; (2) The technology won't get better.

[1]: I'm at heart more of a "let's improve the world" kind of person than "I want to build cool stuff" kind of person. This probably causes some disconnect in some interactions here. I think some people primarily have other motives.

Some people cancel their subscriptions and kind of assume "the market and public pushback will solve this". The market's reaction might be too slow or too slight to actually help much. Some people put blind faith into markets helping people on some particular time scales. This level of blind faith reminds me of Parable of the Drowning Man. In particular, markets often send pretty good signals that mean, more or less, "you need to save yourself, I'm just doing my thing." Markets are useful coordinating mechanisms in the aggregate when functioning well. One of the best ways to use them is to say "I don't have enough of a cushion or enough skills to survive what the market is coordinating" so I need a Plan B!

Some people go further and claim markets are moral by virtue of their principles; this becomes moral philosophy, and I think that kind of moral philosophy is usually moral confusion. Broadly speaking, in practice, morality is a complex human aspiration. We probably should not not abdicate our moral responsibilities and delegate them to markets any more than we would say "Don't worry, people who need significant vision correction (or other barrier to modern life)... evolution will 'take care' of you."

One subscription cancellation is a start (if you actually have better alternative and that alternative being better off for the world ... which is debatable given the current set of alternatives!)

Talking about it, i.e. here on HN might one place to start. But HN is also kind of a "where frustration turns into entertainment, not action" kind of place, unfortunately. Voting is cheap. Karma sometimes feels like a measure of conformance than quality thinking. I often feel like I am doing better when I write thoughtfully and still get downvotes -- maybe it means I got some people out of their comfort zone.

Here's what I try to do (but fail often): Do the root cause analysis, vent if you need to, and then think about what is needed to really fix it.

[2]: https://en.wikipedia.org/wiki/Parable_of_the_drowning_man

[3]: The first four are:

    I write detailed specs. Multifile with example code. In markdown.

    Then hand over to Claude Sonnet.

    With hard requirements listed, I found out that the generated code missed requirements, had duplicate code or even unnecessary code wrangling data (mapping objects into new objects of narrower types when won't be needed) along with tests that fake and work around to pass.

    So turns out that I'm not writing code but I'm reading lots of code.

janwillemb1mo ago· 13 in thread

This is what worries me. People become dependent on these GenAI products that are proprietary, not transparant, and need a subscription. People build on it like it is a solid foundation. But all of a sudden the owner just pulls the foundation from under your building.

jjfoooo41mo ago

But these products are all drop in replacements for each other. I've recently favored Codex more than CC, just because rate limits got mildly annoying. I really didn't have to change anything about my workflow in doing that.

1 more reply

SwellJoe1mo ago

At least some of the investors in this tech are hoping for a monopoly position. They'd like to outspend the competition to get an insurmountable lead, at which point they can set their price.

But, so far, competition remains fierce. Anthropic still has the best tools for writing code. That lead is smaller than it's ever been, though. But, honestly, Opus 4.5 is when it got Good Enough. If Anthropic suddenly increased prices beyond what I'm willing to pay, any model that gives me Opus 4.5 or better performance is good enough for the vast majority of the work I do with agents. And, there are a bunch of models at that level, now maybe including some discount Chinese models. Certainly Gemini Pro 3.1 is on par with Opus 4.5. Current Codex is better than Opus 4.5 and close to Opus 4.7 (though I won't use OpenAI because I don't trust them to be the dominant player in AI).

I often switch agents/models on the same project because I like tinkering with self-hosted and I like to keep an eye on the most efficient way to work...which models wastes less of my time on silly stuff. Switching is literally nothing; I run `gemini` or `copilot` or `hermes` instead of `claude`. There's simply no deep dependency on a specific model or agent. They're all trying to find ways to make unique features for people to build a dependence on, of course, but the top models are all so fucking smart you can just tell them to do whatever thing it is that you need done. That feature could probably be a skill, whatever it is, and the model can probably write the skill. Or, even better, it could be actual software, also written by the model, rather than a set of instructions for the model to interpret based on the current random seed.

Currently, the only consistent moat is making the best model. Anthropic makes the best model and tools for coding, but that's a pretty shallow moat...I could live with several other models for coding. I'll gladly pay a premium for the best model and tools for coding, but I also won't be devastated if I suddenly don't have Claude Code tomorrow. Even open models I can host myself are getting very close to Good Enough.

GaryBluto1mo ago

Luckily local AI is becoming more feasible every day.

8 more replies

fortyseven1mo ago

This is why, despite enjoying all of this, I really want to focus on locally hosted models. If we don't host the technology ourselves, we're setting ourselves up for a hard fall down the line.

Until very recently, local models been little more than brittle toys in my experience, if you're trying to use them for coding.

But lately I've been running Pi (minimal coding agent harness) with Gemma4 and Qwen3.6 and I've been blown away by how capable and fast they are compared to other models of their size. (I'm using the biggest that can fit into 24gb, not the smaller ones.) In fact, I don't really need to reach for Claude and friends much of the time (for my use cases at least).

gip1mo ago

True. That is why it is key important to have open source and sovereign models that will be accessible to all and always on / local.

Competition (OpenAI vs Anthropic is fun to watch) and open source will get us there soon I think.

tetha1mo ago

The owner rug-pulls, or Broadcom buys the owner and starts squeezing.

pmarsh1mo ago

For the sake of argument if you build on AWS is that any more of a solid foundation? You're beholden to Amazon, unless you have the bandwidth to be able to DR immediately to another provider.

1 more reply

blueone1mo ago

Anthropic sells due to unrelenting pressure and unachievable demand > new owner cuts costs > models become worse > new owner sells > the capitalistic cycle wins > we, the people, suffer

sdevonoes1mo ago

The sooner you cancel the sooner you become independent of them

1 more reply

_the_inflator1mo ago

“In the future there might be the possibility that catastrophic event A could happen.”

Not the best argument.

Also there is nothing without dependencies. Loose coupling means coupling.

agumonkey1mo ago

Some people are so dependent on it they can't even say it without twisting words to hide the fact that they're now stuck at zero

notjes1mo ago

Soon, a dented toaster will be enough to run decent models.

2ndorderthought1mo ago

Imagine if anthropic and openai went bankrupt in the next 2 years. If you look at their financials its a real possibility.

zkmon1mo ago· 8 in thread

Yesterday was a realization point for me. I gave a simple extraction task to Claude code with a local LLM and it "whirred" and "purred" for 10 minutes. Then I submitted the same data and prompt directly to model via llama_cpp chat UI and the model single-shotted it in under a minute. So obviously something wrong with coding agent or the way it is talking to LLM.

Now I'm looking for an extremely simple open-source coding agent. Nanocoder doesn't seem install on my Mac and it brings node-modules bloat, so no. Opencode seems not quite open-source. For now, I'm doing the work of coding agent and using llama_cpp web UI. Chugging it along fine.

syhol1mo ago

https://pi.dev/ seems popular, whats not open source about opencode? The repo has an MIT License.

4 more replies

SyneRyder1mo ago

Probably a silly idea, but I'll throw it into the mix - have your current AI build one for you. You can have exactly the coding agent you want, especially if you're looking for "extremely simple".

I got annoyed enough with Anthropic's weird behavior this week to actually try this, and got something workable up & running in a few days. My case was unique: there's no Claude Code for BeOS, or my older / ancient Macs, so it was easier to bootstrap & stitch something together if I really wanted an agentic coding agent on those platforms. You'll learn a lot about how models actually work in the process too, and how much crazy ridiculous bandaid patching is happening Claude Code. Though you might also appreciate some of the difficulties that the agent / harnesses have to solve too. (And to be clear, I'm still using CC when I'm on a platform that supports it.)

As for the llama_cpp vs Claude Code delays - I've run into that too. My theory is API is prioritized over Claude Code subscription traffic. API certainly feels way faster. But you're also paying significantly more.

appcustodian21mo ago

Just in case it didn't occur to you already, you can just build whatever coding agent you want. They're pretty simple

jedisct11mo ago

Swival is not bloated and was specifically made for local agents: https://swival.dev

1 more reply

enraged_camel1mo ago

I use both Cursor and Claude Code, and yes, the latter is noticeably slower with the same model at the same settings.

However, it's hard to justify Cursor's cost. My bill was $1,500/mo at one point, which is what encouraged me to give CC a try.

btbuildem1mo ago

You'd figure by now we would have something between a TUI and an IDE.

btbuildem1mo ago

You can run CC with local models, it's pretty straightforward. I've done this with vLLM + a thin shim to change the endpoint syntax.

banditelol1mo ago

what model you used with llama_cpp?

1 more reply

rectang1mo ago· 7 in thread

I feel like I'm using Claude Opus pretty effectively and I'm honestly not running up against limits in my mid-tier subscriptions. My workflow is more "copilot" than "autopilot", in that I craft prompts for contained tasks and review nearly everything, so it's pretty light compared to people doing vibe coding.

The market-leading technology is pretty close to "good enough" for how I'm using it. I look forward to the day when LLM-assisted coding is commoditized. I could really go for an open source model based on properly licensed code.

Retr0id1mo ago

I also use it this way and I'm overall pretty happy with it, but it feels like they really want us to use it in "autopilot" mode. It's like they have two conflicting priorities of "make people use more tokens so we can bill them more" and "people are using more tokens than expected, our pricing structure is no longer sustainable"

(but I guess they're not really conflicting, if the "solution" involves upgrading to a higher plan)

3 more replies

llm_nerd1mo ago

I have Max 5x and use only Claude Opus on xhigh mode. I don't use agents, or even MCPs, and stick to Claude Code.

I find it incredibly difficult to saturate my usage. I'm ending the average week at 30-ish percentage, despite this thing doing an enormous amount of work for (with?) me.

Now I will say that with pro I was constantly hitting the limit -- like comically so, and single requests would push me over 100% for the session and into paying for extra usage -- and max 5x feels like far more than 5x the usage, but who knows. Anthropic is extremely squirrely about things like surge rates, and so on.

I'm super skeptical of the influx of "DAE think Opus sucks now. Let's all move to Codex!" nonsense that has flooded HN. A part of it is the ex-girlfriend thing where people are angry about something and try to force-multiply their disagreement, but some of it legitimately smells like astroturfing. Like OpenAI got done pay $100M for some unknown podcaster and start hiring people to write this stuff online.

3 more replies

raincole1mo ago

> the day when LLM-assisted coding is commoditized

Like yesterday? LLM-assisted coding is $100/mo. It looks very commoditized when most houses in developed world pay more for electricity than that.

My definition of LLM-assisted coding is that you fully understand every change and every single line of the code. Otherwise it's vibe coding. And I believe if one is honest to this principle, it's very hard to deplete the quota of the $100 tier.

5 more replies

taytus1mo ago

I'd recommend Kimi k2.6 for your use. It is an excellent model at a fraction of the cost, and you can use Claude Code with it.

I did a 1:1 map of all my Claude Code skills, and it feels like I never left Opus.

Super happy with the results.

4 more replies

dboreham1mo ago

Same. Never hit a limit. Use it heavily for real work. Never even thought of firing off an LLM for hours of...something. Seems like a recipe for wasting my time figuring out what it did and why.

goalieca1mo ago

Similar with the copilot and not autopilot usage. I find its the best of them all. Mostly i just use it as an occasionnal search engine. I've never found LLMs to be efficient to actually do work. I do miss the day when tech docs were usable. Claude seems like a crutch for gaps in developer experience more than anything.

cyanydeez1mo ago

Honestly, it sounds like, assuming you have no ethical qualms, you could get by with a Mac or AMD 395+ and the newest models, specifically QWEN3.5-Coder-Next. It does exactly as you describe. It maxes out around 85k context, which if you do a good job providing guard rails, etc, is the length of a small-medium project.

It does seem like the sweet spot between WallE and the destroyed earth in WallE.

2 more replies

drunken_thor1mo ago· 7 in thread

AI services are only minorly incentivized to reduce token usage. They want high token usage, it makes you pay more. They are going to continually test where the limit is, what is the max token usage before you get angry. All AI companies will continue to trade places for token use and cost as cost increases. We are in tepid water pretending it is a bath pretending we aren’t about to be boiled frogs.

jedberg1mo ago

People said this about AWS too. "Why would they save you money??". It turns out that every time they reduce prices, they make more money, because more people use their services.

AI companies have the same incentive. Make it cheaper and people will use it more, making you more money (assuming your price is still above cost). And of course they have every reason to reduce their on costs.

2 more replies

minimaxir1mo ago

To an extent. That economic incentive stops making sense when a) capacity is an actual constraint and b) Anthropic is not a monopoly and is subject to pressure from competitors who are more user-friendly.

estimator72921mo ago

I severely doubt it. Token spend translates to real cost for the provider. Each token involves real and expensive compute. They aren't free monopoly money you get billed arbitrarily for. You're paying for electricity and infrastructure involved in generating each token.

Less spend means less real cost to the provider while your flat monthly subscription stays the same price. As well, reducing token use per customer means you can over-subscribe even harder, allowing for more flat monthly subscriptions.

Less tokens = more free capacity = more subscription income.

GodelNumbering1mo ago

I am betting on the fact that people will get increasingly frustrated at closed agent lock-ins. I built (cline fork) and open-sourced https://github.com/dirac-run/dirac with the sole focus on token efficiency expecting that the closed-lock-in vendors will do enough to frustrate their users over time. Looking for contributors

y42OP1mo ago

That's what I am thinking, too. It sound's like a conspiracy theory, but at the end Anthropic et al benefits from models that don't finish their jobs. I recently read about this "over editing phenomenon". The machine is never done. It doesn't want to.

It's like dating apps. They don't want you to find a good match, because then you cancel the subscription.

1 more reply

nananana91mo ago

Up to a point. There is incentive when they get to the point where they literally can't serve their userbase and customers start leaving.

zzzeek1mo ago

Well that's why threads like this are important to upvote. On hacker news , they're angry !

wilbur_whateley1mo ago· 6 in thread

Claude with Sonnet medium effort just used 100% of my session limit, some extra dollars, thought for 53 minutes, and said:

API Error: Claude's response exceeded the 32000 output token maximum. To configure this behavior, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable.

amarcheschi1mo ago

And on the seventh day, API Error: Claude's response exceeded the 32000 output token maximum

1 more reply

couchdb_ouchdb1mo ago

I don't think i'd let it think more than 5 minutes without killing the process.

1 more reply

jasonlotito1mo ago

Just curious, what version of Max are you on: 5x or 20x?

2ndorderthought1mo ago

I hope this doesn't come out wrong but. When this happens do agentic/vibe coders message their boss and say "sorry can't work until tomorrow?"

2 more replies

jansenmac1mo ago

Just copy and past the error back to Claude and you will be able to continue. I have seen this many times over the past few months. I thought it was related to AWS bedrock that I have been using - but probably not.

giancarlostoro1mo ago

You're using it within their high usage rate window. I hope you're aware of this, if you use it out of the high usage time window it's supposed to use less, but it does seem a little odd that Sonnet uses so much, even on Medium.

1 more reply

ChicagoDave1mo ago· 4 in thread

I think there’s a clear split amongst GenAI developers.

One group is consistently trying to play whack-a-mole with different models/tools and prompt engineering and has shown a sine-wave of success.

The other group, seemingly made up of architects and Domain-Driven Design adherents has had a straight-line of high productivity and generating clean code, regardless of model and tooling.

I have consistently advised all GenAI developers to align with that second group, but it’s clear many developers insist on the whack-a-mole mentality.

I have even wrapped my advice in https://devarch.ai/ which has codified how I extract a high level of quality code and an ability to manage a complex application.

Anthropic has done some goofy things recently, but they cleaned it up because we all reported issues immediately. I think it’s in their best interests to keep developers happy.

My two cents.

joquarky1mo ago

I kind of wonder if people with ADHD tend to fall into the latter group, as we are used to setting guardrails to keep us aligned to a goal.

1 more reply

camel_Snake1mo ago

FYI that prominent link to your sharpee repo on GitHub 404s

1 more reply

estimator72921mo ago

IME it seems that output quality is directly proportional to the amount of engineering effort you put in. If a bug happens and you just tell the model to fix it over and over with no critical thinking, you end up with an 800 line shell script meant to change the IP address on an interface (real example). If you stop and engage your brain to reason about bugs and explain the problem, the model can fix it in an acceptable manner.

If you want to get good results, you still have to be an engineer about it. The model multiplies the effort you put in. If your effort and input is near zero, you get near zero quality out. If you do the real work and relegate the model to coloring inside the lines, you get excellent results.

1 more reply

rglover1mo ago

Dead on. Any company not thinking about this like the 2nd group is setting themselves up for a bad time (and sadly, anecdotally, that seems to be an emerging majority).

1 more reply

anonyfox1mo ago· 3 in thread

My max20 sub is sitting unused since april mostly now, codex with 5.4 (and now 5.5) even with fast mode (= double token costs) is night and day. Opus is doing convincing failures and either forgets half the important details or decides to do "pragmatic" (read: technical debt bandaids or worse) silently and claims success even with everything crashing and burning after the changes. and point out the errors it will make even more messes. Opus works really well for oneshotting greenfield scopes, but iterating on it later or doing complex integrations its just unusable and even harmfully bad.

GPT 5.4+ takes its time and considers even edgecases unprovoked that in fact are correct and saves me subsequent error hunting turns and finally delivers. Plus no "this doesn't look like malware" or "actually wait" thinking loops for minutes over a oneliner script change.

fluidcruft1mo ago

My mental model for LLM is I don't expect them to chew gum and walk at the same time. Cleaning code up is a different task from building new functionality.

GLM always feels like it's doing things smarter, until you actually review the code. So you still need the build/prune cycle. That's my experience anyway.

jorjon1mo ago

Can I get that max20 if you are not using it?

cmrdporcupine1mo ago

Most "productive" flow I found was when I had both memberships and let Claude do the "I go yeet your feature" side and Codex do the "WTF bro, that's full of race conditions!" review phase.

But now I just use Codex. Claude is unreliable and leaves data races all over and leaves, as you say, negative conditions unhandled fairly consistently.

bryan01mo ago· 3 in thread

I see a lot of people struggling to work with agents. This post has a good example:

> “you can’t be serious — is this how you fix things? just WORKAROUNDS????”

If this is how you’re interacting with your agents I think you’re in for a world of disappointment. An important part of working with agents is providing specific feedback. And beyond that making sure this feedback actually available to them in their context when relevant.

I will ask them why they made a decision and review alternatives with them. These learnings will aid both you and the agent in the future.

aulin1mo ago

After you see it skip reasoning so many times and saying "actually the simplest fix is" the laziest thing ever you get kind of tired of babysitting it.

causal1mo ago

Even their explanations are often confabulations. Best case they point to something wrong in your prompt or agents files, but usually it’s just noise.

philipwhiuk1mo ago

Like babysitting an intern.

petterroea1mo ago· 3 in thread

Looking at Anthropic's new products I think they understand they don't really have a cutting edge other than the brand.

I tried Kimi 2.6 and it's almost comparable to Opus. Anthropic lost the ball. I hope this is a sign the we are moving towards a future where model usage is a commodity with heavy competition on price/performance

mmonaghan1mo ago

Kimi nowhere close to opus on extended use but definitely highly competitive with sonnet. I will probably end up using kimi for personal stuff when I find some time to get it running or get a non-anthropic/openai harness set up on my personal machine.

jetbalsa1mo ago

I've been mostly using Kimi has a hacker of sorts, putting it places I want to attach AI directly as their API for their plans are not completely user hostile. Need to do OCR for scanning Magic the Gathering Cards. Sure!, have it attached to X4: Foundations as a AI manager for some stuff. sounds fun. Can't really do that with claude

alex-onecard1mo ago

How are you using kimi 2.6? I am considering their coding plan to replace my claude max 5x but I am worried about privacy and security.

2 more replies

giancarlostoro1mo ago· 3 in thread

I'm torn because I use it in my spare time, so I've missed some of these issues, I don't use it 9 to 5, but I've built some amazing things, when 1 Million tokens dropped, that was peak Claude Code for me, it was also when I suspect their issues started. I've built up some things I've been drafting in my head for ages but never had time for, and I can review the code and refine it until it looks good.

I'm debating trying out Codex, from some people I hear its "uncapped" from others I hear they reached limits in short spans of time.

There's also the really obnoxious "trust me bro" documentation update from OpenClaw where they claim Anthropic is allowing OpenClaw usage again, but no official statement?

Dear Anthropic:

I would love to build a custom harness that just uses my Claude Code subscription, I promise I wont leave it running 24/7, 365, can you please tell me how I can do this? I don't want to see some obscure tweet, make official blog posts or documentation pages to reflect policies.

Can I get whitelisted for "sane use" of my Claude Code subscription? I would love this. I am not dropping $2400 in credits for something I do for fun in my free time.

fluidcruft1mo ago

It sounds like we have very similar usage/projects. codex had been essentially uncapped (via combination of different x-factors between Plus and Pro and promotions) until very recently when they copied Anthropic's notes.

Plus is still very usable for me though. I have not tried Claude Pro in quite a while and if people are complaining about usage limits I know it's going to be a bad time for me. I had to move up from Claude Pro when the weekly limits were introduced because it was too annoying to schedule my life around 5hr windows.

I started using codex around December when I started to worry I was becoming too dependent on Claude and need to encourage competition. codex wasn't particularly competitive with Claude until 5.4 but has grown on me.

The only thing I really care about is that whatever I'm using "just works" and doesn't hurt limits and Claude code has been flaky as all hell on multiple fronts ever since everyone discovered it during the Pentagon flap. So I tend to reach for ChatGPT and codex at the moment because it will "just work" and there's a good chance Claude will not.

dheera1mo ago

Claude Code now has an official telegram plugin and cron jobs and can do 80% of the things people used OpenClaw for if you just give it access to tools and run it with --dangerously-skip-permissions.

2 more replies

scottyah1mo ago

Don't forget, Openclaw was basically bought by OpenAI so there's only incentive to use it as a wedge to pry people off Anthropic.

wood_spirit1mo ago· 2 in thread

Me and so many coworkers have been struggling with a big cognitive decline in Claude over the last two months. 4.5 was useful and 4.6 was great. I had my own little benchmark and 4.5 could just about keep track of a two way pointer merge loop whereas 4.6 managed a 3 way and the 1M context managed k-way. And this ability to track braids directly helped it understand real production code and make changes and be useful etc.

but then two months ago 4.6 started getting forgetful and making very dumb decisions and so on. Everyone started comparing notes and realising it wasn’t “just them”. And 4.7 isn’t much better and the last few weeks we keep having to battle the auto level of effort downgrade and so on. So much friction as you think “that was dumb” and have to go check the settings again and see there has been some silent downgrade.

We all miss the early days of 4.6, which just show you can have a good useful model. LLMs can be really powerful but in delivering it to the mass market Anthropic throttle and downgrade it to not useful.

My thinking is that soon deepseek reaches the more-than-good-enough 4.6+ level and everyone can get off the Claude pay-more-for-less trajectory. We don’t need much more than we’ve already had a glimpse of and now know is possible. We just need it in our control and provisioned not metered so we can depend upon it.

hungryhobbit1mo ago

This was a real issue, and Anthropic recently awknowledged it:

https://www.anthropic.com/engineering/april-23-postmortem

Of course, it sucks when companies screw up ... but at the same time, they "paid everyone back" by removing limits for awhile, and (more importantly to me) they were transparent about the whole thing.

I have a hard time seeing any other major AI provider being this transparent, so while I'm annoyed at Claude ... I respect how they handled it.

2 more replies

isoprophlex1mo ago

did you set your 4.7 to xhigh or max effort? anything else is basically not worth your time...

1 more reply

pram1mo ago· 2 in thread

I’ve noticed most of the complaints are about the Pro plan. Anecdotally I pay for the $200 Max plan and haven’t noticed anything radically different re: tokens or thinking time (availability is still a crapshoot)

I am certainly not saying people should “spend more money,” more like the Claude Code access in the Pro plan seems kind of like false advertising. Since it’s technically usable, but not really.

swiftcoder1mo ago

> I am certainly not saying people should “spend more money,” more like the Claude Code access in the Pro plan seems kind of like false advertising

Its particularly noticeable when for a long time you could work an 8 hour day in codex on ChatGPT´s $20/month plan (though they too started tightening the screws a couple of weeks back)

thebitguru1mo ago

My guess is that the higher plans will be next, especially as more people upgrade to those and maximize their usage.

mrinterweb1mo ago· 2 in thread

My recent frustration with Claude has been it feels like I'm waiting on responses more. I don't have historical latency to compare this with, but I feel like it has been getting slower. I may be wrong, and maybe its just spending more time thinking than it used to. My guess is Anthropic is having capacity issues. I hope I'm wrong because I don't want to switch.

janalsncm1mo ago

There was a really good point in this podcast episode about the speed of LLMs. They are so slow that all of the progress messages and token streaming are necessary. But the core problem is that the technology is so darn slow.

https://podcasts.apple.com/us/podcast/this-episode-is-a-cogn...

As someone who both uses and builds this technology I think this is a core UX issue we’re going to be improving for a while. At times it really feels like a choose 2+ of: slow, bad, and expensive.

hu31mo ago

About slowdowns... I have this theory that if they sneak some sleep(1) calls while processing medium to complex prompts they can serve more clients.

But I think "context switching" between 2 different prompts might be too expensive for GPUs to be worth it for LLM providers. Who knows.

lawrence11mo ago· 2 in thread

The timeline doesn't make any sense. How can you subscribe a couple weeks ago and the problem start 3 weeks ago and yet things also went well for the first few weeks. was this written by GPT 5.5?

wg01mo ago

The author is not a native English speaker it seems.

They might mean "few weeks ago" and the phrase "couple of weeks ago" might not be exactly as "Vor ein paar Wochen" in their mind rather could be as "few weeks ago."

Rest of the prose in the article seems to support the assumption.

The post is handwritten with no LLMs involved.

fortynights1mo ago

Seriously, I’ve been subscribed to Claude for 18 months now ($20/mo) during which time I’ve seen the hype around other models come and go in a matter of weekends. I’ve just come to accept that everyone is largely commodity and sometimes it’s worth taking a longer view (provided one can afford it).

vondur1mo ago· 2 in thread

Wait, weren't there posts in the not too distant past where everyone was signing the praises for Claude and wondering how OpenAI will catch up?

swader9991mo ago

Yep. I think the sentiment here isn't lagging too much in terms of the day to day experience of what is being offered. Kind of makes HN very useful in this regard.

cyanydeez1mo ago

Wait, are SaaS's fundamentally shifting business models searching to maximize the value of a product at the expense of a customer over time?

Strange how things can change!

1 more reply

zendarr1mo ago· 2 in thread

Seems like some of the token issues may be corrected now

https://www.anthropic.com/engineering/april-23-postmortem

minimaxir1mo ago

These changes fixed some of the token issues, but the token bloat is an intrinsic problem to the model, and Anthropic's solution of defaulting to xhigh reasoning for Opus 4.7 just means you'll go through tokens faster anyways.

1 more reply

giancarlostoro1mo ago

The problem is they changed people's default settings, and if you're like me, you keep a Claude Code session open for days, maybe weeks and even a month, and just come back to it and keep going. I wouldn't be surprised if there's hundreds if not thousands of people still on these broken configurations / models.

Dear Anthropic:

Please, for the love of all things holy, NEVER change someone's defaults without INFORMING the end user first, because you will wind up with people confused, upset, and leaving your service.

nikolay1mo ago· 2 in thread

I can agree. ChatGPT 5.5 made this a no-brainer choice. Anthropic are idiots removing Claude Code from the Pro plan. They need to ask Claude if what they did was a natural intelligence bug! Greed kills companies, too!

Capricorn24811mo ago

> I can agree. ChatGPT 5.5 made this a no-brainer choice

The new model that came out less than 24 hours ago made this obvious? This feels like when a new video game comes out and there's 1,000 steam reviews glazing it in the first hours of release. Don't you think you should use it for longer than a day before declaring it a game changer?

2 more replies

robotnikman1mo ago

>removing Claude Code from the Pro plan

Wait really? I wanted to give it a try, but for $200 a month no way am I paying that for something I just want to experiment around with

2 more replies

caycep1mo ago· 2 in thread

If all Claude does is automate mundane code, why not just make a "meta library" of said common mundane code snippets?

twobitshifter1mo ago

maybe make it so that when you start typing it completes the snippet?

queuebert1mo ago

Like Stack Overflow?

1 more reply

areoform1mo ago· 1 in thread

I've noticed that sometimes the same Claude model will make logical errors sometimes but not other times. Claude's performance is highly temporal. There's even a graph! https://marginlab.ai/trackers/claude-code/

I haven't seen anyone mention this publicly, but I've noticed that the same model will give wildly different results depending on the quantization. 4-bit is not the same as 8-bit and so on in compute requirements and output quality. https://newsletter.maartengrootendorst.com/p/a-visual-guide-...

I'm aware that frontier models don't work in the same way, but I've often wondered if there's a fidelity dial somewhere that's being used to change the amount of memory / resources each model takes during peak hours v. off hours. Does anyone know if that's the case?

8organicbits1mo ago

I'm not sure that graph shows a time-based correlation. The 60% line stays inside the 95% confidence interval. Is that not just a measurement of noise?

siliconc0w1mo ago· 1 in thread

Shameless self plug but also worried about the silent quality regressions, I started building a tool to track coding agent performance over time.. https://github.com/s1liconcow/repogauge

Here is a sample report that tries out the cheaper models + the newest Kimi2.6 model against the 5.4 'gold' testcases from the repo: https://repogauge.org/sample_report.

conception1mo ago

This is cool - just wanted to note https://marginlab.ai is one that has been around for a while.

1 more reply

DeathArrow1mo ago· 1 in thread

I use Claude Code with GLM, Kimi and MiniMax models. :)

I was worried about Anthropic models quality varying and about Anthropic jacking up prices.

I don't think Claude Code is the best agent orchestrator and harness in existence but it's most widely supported by plugins and skills.

droidjj1mo ago

Where are you getting inference from? I'm overwhelmed by the options at the moment.

2 more replies

varispeed1mo ago· 1 in thread

It also seems to me they route prompts to cheaper dumber models that present themselves as e.g. Opus 4.7. Perhaps that's what is "adaptive reasoning" aka we'll route your request to something like Qwen saying it's Opus. Sometimes I get a good model, so I found I'll ask a difficult question first and if answer is dumb, I terminate the session and start again and only then go with the real prompt. But there is no guarantee model will be downgraded mid session. I wish they just charged real price and stopped these shenanigans. It wastes so much time.

dswalter1mo ago

You're describing a Taravangian prompt situation (a character in a book series who wakes up with a different/random intelligence level each day and has a series of tests for himself to determine which kind of decisions he's capable of that day). https://coppermind.net/wiki/Taravangian

algoth11mo ago· 1 in thread

Doesn't "poor support" implies that there is some sort of support? Shouldnt it be "no support"

ipaddr1mo ago

You get to talk to an AI agent

torstenvl1mo ago· 1 in thread

I feel like almost everyone using AI for support systems is utterly failing at the same incredibly obvious place.

The first job of any support system—both in terms of importance and chronologically—is triage. This is not a research issue and it's not an interaction issue. It's at root a classification problem and should be trained and implemented as such.

There are three broad categories of interaction: cranks, grandmas, and wtfs.

Cranks are the people opening a support chat to tell you they have vital missing information about the Kennedy Assassintion or they want your help suing the government for their exposure to Agent Orange when they were stationed at Minot. "Unfortunately I can't help with that. We are a website that sells wholesale frozen lemonade. Good luck!"

Grandma questions are the people who can't navigate your website. (This isn't meant to be derogatory, just vivid; I have grandma questions often enough myself.) They need to be pointed toward some resource: a help page, a kb article, a settings page, whatever. These are good tasks for a human or LLM agent with a script or guideline and excellent knowledge/training on the support knowledge base.

WTFs are everything else. Every weird undocumented behavior, every emergent circumstance, every invalid state, etc. These are your best customers and they should be escalated to a real human, preferably a smart one, as soon as realistically possible. They're your best customers because (a) they are investing time into fixing something that actually went wrong; (b) they will walk you through it in greater detail than a bug report, live, and help you figure it out; and (c) they are invested, which means you have an opportunity for real loyalty and word-of-mouth gains.

What most AI systems (whether LLMs or scripts) do wrong is that they treat WTFs like they're grandmas. They're spending significant money on building these systems just to destroy the value they get from the most intelligent and passionate people in their customer base doing in-depth production QC/QA.

dboreham1mo ago

This rings true. However I have used one AI automated support chat that didn't behave that way. I wish I could remember the vendor but I do remember being blown away when it said something like "that sounds like a real problem would you like me to open a support ticket for this?". Which it then did and subsequently a human addressed my issue.

joozio1mo ago· 1 in thread

Funny. I thought I was the only one. Then I found more people and now you wrote about that. Just this week I also wrote about Claude Opus 4.7 and how I came back to Codex after that: https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026

y42OP1mo ago

I like your blog and I can totally relate to this article - it's like something I wanted to write about for a couple of weeks now. :D

https://thoughts.jock.pl/p/adhd-ai-agent-personal-experience...

easythrees1mo ago· 1 in thread

I have to say, this has been the opposite of my experience. If anything, I have moved over more work from ChatGPT to Claude.

kleene_op1mo ago

Same. I am getting crazy good value from Claude at work, on both scientific applications and deployment environments.

There is one caveat, and that is you have to give the model well thought out constraints to guide it properly, and absolutely take the time to read all the thinking it's doing and not be afraid to stop the process whenever things go sideway.

People who just let Claude roam free on their repository deserve everything they end up with.

dostick1mo ago· 1 in thread

The discussion about Claude always omit the important context - which language/platform you’re using it for. It is best trained for web languages and has most up to date knowledge for that. If you use it for Swift it is trained on whole landfill of code and that gives you strong bias towards pre-Swift 6 coding output. Imagine you would give Claude a requirements for a web app, and it implements it all in JQuery. That’s what happens with other platforms.

adamors1mo ago

It’s not ommited, OP clearly talks about editing Javascript.

0xchamin1mo ago· 1 in thread

One of the biggest problem with Claude is, it tries to do things that I don't even ask. I really like to have full control over what I do. I feel sometimes, Claude has the urgency to keep going with what it is hardcore programmed for instead waiting for my feedback. Looks like, Claude consider everything to be oneshot. I maybe wrong, this is my personal experience

olcay_1mo ago

Claude Code has something about picking sensible choices instead of asking questions in the system prompt, that's probably the problem.

lawrence11mo ago· 1 in thread

The timeline of the first few sentences doesn't add up. how can you subscribe 2 weeks ago when the problem started 3 weeks ago.

y42OP1mo ago

https://news.ycombinator.com/item?id=47894155

(I am just learning that "a couple of weeks" apparently means "2 weeks"...)

chaosprint1mo ago· 1 in thread

I bought a Claude membership a few days ago. I asked him to fix a React issue—a very simple UI modification with almost no logic. He still failed to understand it. And after three attempts, the 5-hour limit was reached. This was a disaster. I had to immediately buy a CodeX membership and also tried Image2. I won't give Claude another chance.

jryio1mo ago

I find it strange that you've anthropomorphized Claude but not ChatGPT seemingly based on one having a human name and the other not

aleqs1mo ago· 1 in thread

The usage metering is just so incredibly inconsistent, sometimes 4 parallel Opus sessions for 3 hours straight on max effort only uses up 70% of a session, other times 20 mins / 3 prompts in one session completely maxes it out. (Max x20 plan) Is this just a bug on anthropic side or is the usage metering just completely opaque and arbitrary?

unshavedyak1mo ago

It's something strange because i never have these issues. I often run two in parallel (though not all day), and generally have something running anytime i look at my laptop to advance the steps/tasks/etc. Usually i struggle to hit 50% on my Max20.

Heck two weeks ago i tried my hardest to hit my limit just to make use of my subscription (i sometimes feel like i'm wasting it), and i still only managed to get to 80% for the week.

I generally prune my context frequently though, each new plan is a prune for example, because i don't trust large context windows and degradation. My CLAUDE.md's are also somewhat trim for this same fear and i don't use any plugins, and only a couple MCPs (LSP).

No idea why everyone seems to be having such wildly different experiences on token usage.

1 more reply

binyu1mo ago· 1 in thread

I feel like Anthropic is forcing their new model (Opus 4.7) to do much less guess work when making architectural choices, instead it prefers to defer back decisions to the user. This is likely done to mine sessions for Reinforcement-Learning signals which is then used to make their future models even smarter.

hungryhobbit1mo ago

https://www.anthropic.com/engineering/april-23-postmortem

On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode. This was the wrong tradeoff. We reverted this change on April 7 after users told us they'd prefer to default to higher intelligence and opt into lower effort for simple tasks. This impacted Sonnet 4.6 and Opus 4.6.

On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. We fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6.

On April 16, we added a system prompt instruction to reduce verbosity. In combination with other prompt changes, it hurt coding quality and was reverted on April 20. This impacted Sonnet 4.6, Opus 4.6, and Opus 4.7.

1 more reply

yalogin1mo ago· 1 in thread

If someone wants to move off Claude what are the alternatives? More importantly can another system pick up from where Claude left off or is there some internal knowledge Claude keeps in their configuration that I need to extract before canceling?

janalsncm1mo ago

Opencode is a great cli for driving a coding agents.

Like 3 weeks ago Qwen3-coder was the best coding LLM to run locally. I haven’t spent time since to figure out if anything is better.

You can also power Opencode with OpenRouter which lets you pay for any LLM à la carte.

1 more reply

nickdothutton1mo ago· 1 in thread

Switched to local models after quality dropped off a cliff and token consumption seemed to double. Having some success with Qwen+Crush and have been more productive.

tfrancisl1mo ago

Would love some more info on how you got any local model working with Crush. Love charmbracelet but the docs are all over the place on linking into arbitrary APIs.

1 more reply

kashunstva1mo ago

I’m sympathetic to the author’s complaints about Anthropic’s support, though I would go further. It doesn’t exist.

For reasons that continue to elude me, almost exactly one year ago, Anthropic cancelled my Claude Pro plan. To appeal, you must fill out a Google docs form. And wait. In my case, I’ve waited for about one year. Once I managed to email with a human but they quickly plugged that hole with a chatbot that sends you back to their never-to-be-reviewed form. No route to escalate.

A year gives one a long time to think about things. Maybe it was because I was on a VPN temporarily. Otherwise, no clue. I’m a hobbyist embedded developer. That’s it.

So no, Anthropic support isn’t just poor; it’s nonexistent.

lukaslalinsky1mo ago

I feel like Opus 4.5 was the peak in Claude Code usefulness. It was smart, it was interactive, it was precise. In 4.6 and 4.7, it spends a long time thinking and I don't know what's happening, often hits a dead-end and just continues. For a while I was setting Opus 4.5 in Claude Code, but it got reset often. I just canceled my Max plan, don't know where to look for alternatives.

cbg01mo ago

I've been a fan since the launch of the first Sonnet model and big props for standing up to the government, but you can sure lose that good faith fast when you piss off your paying customers with bad communication, shaky model quality and lowered usage limits.

stldev1mo ago

Same, after being a long-time proponent too.

First was the CC adaptive thinking change, then 4.7. Even with `/effort max` and keeping under 20% of 1M context, the quality degradation is obvious.

I don't understand their strategy here.

1 more reply

binaryturtle1mo ago

I have a simple rule: I won't pay for that stuff. First they steal all my work to feed into those models, afterwards I shall pay for it? No way!

I use AI, but only what is free-of-charge, and if that doesn't cut it, I just do it like in the good old times, by using my own brain.

1 more reply

bauerd1mo ago

They can't afford to care about individual customers because enterprise demand exploded and they're short on compute

stan_kirdey1mo ago

I also cancelled my subscription.The $20 Pro plan has become completely unusable for any real work. What is especially frustrating is that Claude Chat and Claude Code now share the exact same usage limits — it makes zero sense from a product standpoint when the workflows are so different. Even the $200 Max plan got heavily nerfed. What used to easily last me a full week (or more) of solid daily use now burns out in just a few days. Combined with the quality drop and unpredictable token consumption, it simply stopped being worth it.

vintagedave1mo ago

They won't even reset usage for me: https://news.ycombinator.com/item?id=47892445

And by crikey do I empathise with the poor support in this article. Nothing has soured me on Anthropic more than their attitude.

Great AI engineers. Questionable command line engineers (but highly successful.) Downright awful to their customers.

lanthissa1mo ago

for all the drama, its pretty clear both openai, google, and anthropic have had to degrade some of their products because of a lack of supply.

There's really no immediate solution to this other than letting the price float or limiting users as capacity is built out this gets better.

isjcjwjdkwjxk1mo ago

Oh no, the unreliable product people pretend is the next coming of Jesus turned out to be thoroughly unreliable. Who coulda thunk it.

PeterStuer1mo ago

I'm on max x5. No limit problems, but I am definetly feeling the decline. Early stopping and being hellbent on taking shortcuts being the main culprits, closely followed by over optimistic (stale) caching (audit your hooks!).

All mostly mitigatable by rigorous audits and steering, but man, it should not have to be.

taffydavid1mo ago

I know this thread is likely full of similar anecdotes, but I also want to share.

My experience very suddenly and very clearly degraded over the last few days.

Today I was trying to build a simple chess game. Previous one shots were HTML, this gave me a jsx file. I asked it to put it HTML and it absolutely devoured my credits doing so, I had to abort and do it manually. The resulting app didn't work, and it had decided that multiplayer could work by storing the game state only on local storage without the clients communicating at all

aucisson_masque1mo ago

First ever time I used ai to code was a week ago, went with the Claude pro because I didn't want to commit.

The 20$ plan has incredible value but also, the limit are just way too tight.

I'm glad Claude made me discover the strength of ai, but now it's time to poke around and see what is more customer friendly. I found deepseek V4 to be extremely cheap and also just as good.

Plus I get the benefit to use it in vs code instead of using Claude proprietary app.

I think that when people goes over the hype and social pressure, anthropic will lose quite a lot of customer.

datavirtue1mo ago

I have enterprise plans for all AI services except Google. GitHub Copilot in VS Code is the best I have used so far. I hear a lot of complaints from people who are holding it wrong. In a single day I can have a beautiful greenfield app deployed. One dev. One day. Something that would have taken weeks with two teams bumping into each other. It's fully documented. Beautiful code. I read the reasoning prompts as it flows by to get an idea of what is going on. I work in phases and review the code and working product quickly after that. Minimal issues.

I'm an executive, the devs complaining are getting retrained or put on the chopping block.

My rockstars are now random contractor devs from Vietnam. The aloof FTE grey beards saying "I don't know, it doesn't work very good on X." Are getting a talking to or being sidelined/canned. So far most of my grey beards are adapting pretty well.

I'm not waiting on people to write code any more. No way in hell.

tamimio1mo ago

Very similar experience, although I didn’t use claude for anything in production, but I did try some tests with some few topics and questions on things that I know, and while initially it works very well, but as soon as you dive deeper you get all sort of extra none sense that was never asked to add/do nor it’s useful, just workarounds after workarounds after duct tape solutions, several times I would say “no, why are you introducing xyz, that will cause this and that” to get similar answer of “thanks for pushing back, you are right bla bla”.

We probably hit peak generative AI last year, now they probably use AI to improve the AI so it’s kinda garbage in garbage out, or maybe anthropic is deprioritizing users while favoring enterprise or even government where it provides better quality for higher contracts.

vivin1mo ago

This is interesting to me, because Claude has been a net-positive for me. I haven't faced token issues or declining quality. I generally work with Claude as an assistant -- I may have it do planning and have it "one shot" a thing, but it's usually a personal tool or a utility that I want it to write.

For actual code that goes out to production, I generally figure out how I would solve the problem myself (but will use Claude to bounce ideas and approaches -- or as a search engine) and then have Claude do the boring bits.

Recently I had to migrate a rules-engine into an FSM-based engine. I already had my plan and approach. I had Claude do the boring bits while I implemented the engine myself. I find that Claude does best when you give it small, focused, incremental tasks.

zulban1mo ago

Curious. Not my experience whatsoever.

I tried Claude recently and it was able to one-shot fixes on 9/9 of the bugs I gave it on my large and older Unity C# project. Only 2/9 needed minor tweaks for personal style (functionally the same).

Maybe it helps that I separately have a CLI with very extensive unit tests. Or that I just signed up. Or that I use Claude late in the evenings (off hours). I also give it very targeted instructions and if it's taking longer than a couple minutes - I abort and try a different or more precise prompt. Maybe the backend recognizes that I use it sparingly and I get better service.

The author describes what sounds like very large tasks that I'd never hand off to an AI to run wild in 2026.

Anyway I thought I'd give a different perspective than this thread.

arikrahman1mo ago

I use Aider nowadays, and will probably cancel my Github multi AI bundle subscription due to the new training policy. I find using Aider with the new open models and using Open Spec to negotiate requirements before the handoff, has helped me a lot.

burnJS1mo ago

My experience is Claude and others are good at writing methods and smaller because you can dictate what it should do in less tokens and easily read the code. This closes the feadback loop for me.

I occasionally ask AI to write lots of code such as a whole feature (>= medium shirt size) or sometimes even bigger components of said feature and I often just revert what it generated. It's not good for all the reasons mentioned.

Other times I accept its output as a rough draft and then tell it how to refactor its code from mid to senior level.

I'm sure it will get better but this is my trust level with it. It saves me time within these confines.

Edit: it is a valuable code reviewer for me, especially as a solo stealth startup.

throwaway20271mo ago

Same. I think one of the issues is that Claude reached a treshold where I could just rely on it being good and having to manually fix it up less and less and other models hadn't reached that point yet so I was aware of that and knew I had to fix things up or do a second pass or more. Other providers also move you to a worse model after you run out which is key in setting expectation as well. Developers knew that that was the trade-off.

I think even with the worse limits people still hated it but when you start to either on purpose or inadvertently make the model dumber that's when there's really no purpose to keep using Claude anymore.

duxup1mo ago

I’ve definitely encountered a drop in Claude quality.

Even a simple prompt focused on two files I told Claude to do a thing to file A and not change file B (we were using it as a reference).

Claude’s plan was to not touch file B.

First thing it did was alter file B. Astonishing simple task and total failure.

It was all of one prompt, simple task, it failed outright.

I also had it declare that some function did not have a default value and then explain what the fun does and how it defaults to a specific value….

Fundamentally absurd failures that have seriously impacted my level of trust with Claude.

brunooliv1mo ago

I still haven’t seen any other models be as complete as Claude inside Claude Code. I bet Anthropic knows this and they turn the knobs and see people’s reactions… I have been planning with Qwen3.6 Max inside opencode, absolutely game changer. Opus can then follow the plan quite detailed and like this I can make progress on my toy apps on Pro plan at 20/mo.

For work, unlimited usage via Bedrock.

Yes I’d like to get more usage out of my personal sub, but at 20/mo no complains

airbreather1mo ago

I am sort of in the same place, it seems to have lost enough of the magic that I might be better trying to do more with running local LLMs on my 4090.

The thing is running local LLMs will give some kind of reliability and fixed expectations that saves a lot of time - yeah sure Claude might be fantastic one day, but what do I do when the same workload churns out shit the next and I am halfway thru updating and referencing a 500 document set?

Better the devil you know and all that.

wslh1mo ago

Anthropic is astroscaling. We're essentially buying into a loop where speed and iteration take precedence over stability and support. If you view them as an experimental lab undergoing rapid atmospheric friction rather than a company, the "unreliability" is just the cost of being at the frontier. This is not an endorsement for Anthropic, just imagining their craziness on how you "can" grow in a fraction of time.

shevy-java1mo ago

Those AI using software developers begin to show signs of addiction:

From "yay, claude is awesome" to "damn, it sucks". This is like with withdrawal symptoms now.

My approach is much easier: I'll stay the oldschool way, avoid AI and come up with other solutions. I am definitely slower, but I reason that the quality FOR other humans will be better.

rurban1mo ago

That's bad for him, because he already had a cheap plan. Now he wont get it back that easy.

Pro is gone. OpenAI plans are more expensive. He can only buy a Kimi plan, which is at least better than Sonnet. But frontier for cheap is gone. Even copilot business plans are getting very expensive soon, also switching to API usage only.

gverrilla1mo ago

My main problem with claude code right now is observability. I've been experimenting a lot with vibe coding, but nowadays I can't even tell what it's doing. It's still delivering me value, but the trust on the company is going down and I've already started looking for alternatives.

estimator72921mo ago

I just noticed today that it doesn't warn about approaching limits and just blows straight into billing extra tokens.

I'm pretty sure it used to warn when you got close to your 5hr limit, but no, it happily billed extra usage. Granted only about $10 today, but over the span of like 45 minutes. Not super pleased.

erikbye1mo ago

I'm building a C++ game engine from scratch with CC. Renderer is Vulkan, no wrappers. Around 30k LoC now, quite a ways to go yet, but CC has been doing fine overall. Of course "he" runs in circles sometimes, but I always manage to nudge him back on track.

kx_x1mo ago

After the fixes in Claude Code, Opus 4.6/4.7 have been performing well.

Before the fixes, they were complete trash and I was ready to cancel this month.

Now, I'm feeling like the AI wars are back -- GPT 5.5 and Opus 4.7 are both really good. I'm no longer feeling like we're using nerfed models (knock on wood)!

ForOldHack1mo ago

I have token issues three times a day, and I just upgraded to pro... and now this... now I cancel. my work flow was co-pilot to Gemini to Claude Code... and the bottle neck was always CC. Always. I am done. It should be pretty easy to replace CC.

AI used to be, the punched card replicator... its all replaceable.

exabrial1mo ago

It's bad, really bad.

The filesystem tool cannot edit xml files with <name></name> elements in it

hybrid_study1mo ago

Sometimes it feels like Anthropic uses token processing as a throttling tool, to their advantage.

elevaet1mo ago

I've been very happy using Codex in the VScode extension. Very high quality coding and generous token limits. I've been running Claude in the CLI over the last couple of months to compare and overall I prefer Codex, but would be happy with either.

jp00011mo ago

Max x20 user here. As long as Opus 4.6 is available and they fix Opus 4.7, I'll stay with Anthropic. Tho, I'd imagine in 5 years we'll have Opus 4.6 equivalent performance available in an at home consumer model.

_pdp_1mo ago

Signup for all major providers (pro plan) and round-robin between all of them. This is the only way to protect against not having access to all of these heavily subsidised subscriptions. See what happened to Copilot.

josefritzishere1mo ago

AI has a lot of future potential but at every level... it's still not very good. And certainly not good enough to validate the expense, let alone what the actual cost would be were it profitable.

hedgehog1mo ago

I used Opus via Copilot until December and then largely switched over to Claude Code. I'm not sure what the difference is but I haven't seen any of these issues in daily use.

mattas1mo ago

I've see a post like this every week for the last 2 years. Are these models actually getting worse? Or do folks start noticing the cracks as they use them more and more?

sreekanth8501mo ago

Biggest issue i see is, models are not getting efficient. This is no where going to get commoditised. There will be a limit at which you can burn money at subsidised cost.

brachkow1mo ago

As many others I had negative (not good as before) feeling about Claude Code lately

What I don't understand is these loud "voting with money" comments. What they are canceling is very subsidized plan to buy something that delivers a lot of value.

There are only two providers that can provide this level of models at very subsidized price - anthropic and openai. Both of them are bad in terms of reliability.

So I wonder what these people do after they "cancel" both of them? Do they see producing less result at same hourly rate as everyone else on the market as viable option?

sfmike1mo ago

i ran prompts used up a ton of usage, and got no return just showed error.

Asked support hey i got nothing back i tried prompting several times used a ton of usage and it gave no response. I'd just like usage back. What I payed for I never got.

Just bot response we don't do refunds no exceptions. Even in the case they don't serve you what your plan should give you.

Animats1mo ago

Support? You expected support? Live support?

Most of this is about the billing system, which is apparently broken.

kissgyorgy1mo ago

I cancelled in the minute my subscription stopped working in Pi. Not going back to the slopfest what Claude Code is.

chadleriv1mo ago

Off topic: I do feel like this model switching content feels very circa 2010 "I'm quitting Facebook"

smashah1mo ago

Did the same with Google Ai Ultra. They rug pulled the subscribers. They changed the deal, we cancel. Simple.

Fabulu1mo ago

It's a client issue. It's a mystery to me why they don't fix it.

SwellJoe1mo ago

I don't get it. I use Claude Code every day, what I would consider pretty heavy usage...at least as heavy as I can use it while actually paying attention to what it's producing and guiding it effectively into producing good software. I literally never run into usage limits on the $100 plan, even when the bugs related to caching, etc. were happening that led to inflated token usage.

WTF are y'all doing that chews tokens so fast? I mean, sure, I could spin up Gas Town and Beads and produce infinite busy work for the agents, but that won't make useful software, because the models don't want anything. They don't know what to build without pretty constant guidance. Left to their own devices, they do busy work. The folks who "set and forget" on AI development are producing a whole lot of code to do nothing that needed doing. And, a lot of those folks are proud of their useless million lines of code.

I'm not trying to burn as many tokens as a possible, I'm trying to build good software. If you're paying attention to what you're building, there's so many points where a human is in the loop that it's unusual to run up against token limits.

Anyway, I assume that at some point they have to make enough money to pay the bills. Everything has been subsidized by investors for quite some time, and while the cost per token is going down with efficiency gains in the models/harnesses and with newer compute hardware tuned for these workloads, I think we're all still enjoying subsidized compute at the moment. I don't think Anthropic is making much profit on their plans, especially with folks who somehow run right at the edge of their token limit 24/7. And, I would guess OpenAI is running an even lossier balance sheet (they've raised more money and their prices are lower).

I dunno. I hear a lot of complaining about Claude, but it's been pretty much fine for me throughout 4.5, 4.6 and 4.7. It got Good Enough at 4.5, and it's never been less than Good Enough since. And, when I've tried alternatives, they usually proved to be not quite Good Enough for some reason, sometimes non-technical reasons (I won't use OpenAI, anymore, because I don't trust OpenAI, and Gemini is just not as good at coding as Claude).

3 more replies

AJRF1mo ago

We are in the 'we need to IPO so screw our customers' phase of the cycle

captainregex1mo ago

anyone remember the whole “delete uber” thing from 2017ish? good times

r0fl1mo ago

I hope codex doesn’t decline the same way

I’m blown away by how good it is lately

bad_haircut721mo ago

Waiting 60s every time I send a msg really kills the ux of claude

dannypostma1mo ago

When I saw the German screenshot it all made sense to me.

moralestapia1mo ago

The midwit curve of LLMs has OpenAI on both ends.

whalesalad1mo ago

I've spent thousands of dollars on API tokens in the last few months. Out of my own pocket, as an indie contractor. I used the API specifically instead of Pro/Max/Plus/Silver/Gold/Platinum/Diamond to avoid all of the mess there regarding usage resets and potential hidden routing to worse models. It worked great for months, I got a ton of shit done, shipped a bunch of features. I really began to rely on the tech. I was not happy about the cost, but the value proposition was there.

Then within the last few months everything changed and went to shit. My trust was lost. Behavior became completely inconsistent.

During the height of Claude's mental retardation (now finally acknowledged by the creators) I had an incident where CC ran a query against an unpartitioned/massive BQ table that resulted in $5,000 in extra spend because it scanned a table which should have been daily partitioned 30 times. 27 TB per scan. I recall going over and over the setup and exhaustively refining confidence. After I realized this blunder, I referred to it in the same CC session, "jesus fucking christ, I flagged this issue earlier" -- it responded, "you did. you called out the string types and full table scans and I said "let's do it later." That was wrong. I should have prioritized it when you raised it". Now obviously this is MY fault. I fucked up here, because I am the operator, and the buck stops with me. But this incident really galvinized that the Claude I had come to vibe with so well over the last N months was entirely gone.

We all knew it was making making mistakes, becoming fully retarded. We all felt and flagged this. When Anthropic came out and said, "yeah ... you guys are using it wrong, its a skill issue" I knew this honeymoon was over. Then recently when they finally came out and ack'd more of the issues (while somehow still glossing over how bad they fucked up?) it was the final nail. I'm done spending $ on Anthropic ecosystem. I signed up for OpenAI pro $200/mo and will continue working on my own local inference in the meantime.

1 more reply

postepowanieadm1mo ago

Yeah, session limits are kinda show stoppers.

zh_code1mo ago

I just cancelled my Max20 plan yesterday.

spaceman_20201mo ago

4.7 is the breaking point for me

It's almost unusable

rvz1mo ago

The great de-skilling programme continues in Anthropic's casino. They completely want you dependent on gambling tokens on their slot machines with extortionate prices, fees and limits.

Anthropic can't even scale their own infrastructure operations, because it does not exist and they do not have the compute; even when they are losing tens of billions and can nerf models when they feel like it.

Once again, local models are the answer and Anthropic continues to get you addicted to their casino instead of running your own cheaper slot machine, which you save your money.

Every time you go to Anthropic's casino, the house always wins.

r00t-1mo ago

Same, it's a mess.

danjl1mo ago

This sounds just like all my neighbors complaining about their internet provider.

system21mo ago

Same here. The single prompt burnt all my tokens in 3 minutes for the day. What happened to Claude in the last 2 months? I was happy with what they were providing and was happy to pay whatever for it. Why did they mess with it? Why are they destroying the tool we all loved?

I hate enshittification and I hate seeing this happening to Claude Code right now.

1 more reply

gizmodo591mo ago

Codex is becoming such a good product. I have the 100$ pro lite. I have Claude still but 20$. I rarely use it. Let’s see if they give generous limits and more importantly a model that’s better than 5.5. The mythos fear mongering did not give me a good impression that they care about the average developer.

queuebert1mo ago

Maybe this is an unpopular opinion, but I think choosing which companies to support during this period of pre-alignment is one way to vote which direction this all goes. I'm happy to accept a slightly worse coding agent if it means I don't get exterminated someday.

drivebyhooting1mo ago

Imagine vibe coding your core consumer application and associated backend…

Oh wait, I don’t have to imagine. That’s what Anthropic does. A nice preview for what is in store for those who chose to turn off their brains and turn on their AI agents.

scuff3d1mo ago

Welcome to the future. Anthropic is currently speed running it but this is what all LLM tools are going to look like in the next few years, once they turn the enshitification corner.

docheinestages1mo ago

Me too.

johanneskanybal1mo ago

It's not magic but for me definitly claude is the way to go. Not expecting magic it's just another level of non-slop than the rest I've tried.

GrumpyGoblin1mo ago

Cool

fractalf1mo ago

Hehe me too. Yesterday. Enough is enough. Using KLM5.1 and soon deepseek

gexla1mo ago

We can't do it. We standardized. They got us.

semiinfinitely1mo ago

absolute garbage support was the reason why I canceled. who would have thought that an AI company has only bots as support agents

jwaldrip1mo ago

I would love to just say that if you are using claude code, you should no be on pro. I feel like all the people complaining are complaining that an agent cant handle the work of a developer for $20/m. Get on at least max 5, its a world of a difference.

6 more replies

j / k navigate · click thread line to collapse

582 comments

221 comments · 108 top-level

wg01mo ago· 24 in thread

I write detailed specs. Multifile with example code. In markdown.

Then hand over to Claude Sonnet.

So turns out that I'm not writing code but I'm reading lots of code.

The fact that I know first hand prior to Gen AI is that writing code is way easier. It is reading the code, understanding it and making a mental model that's way more labour intensive.

Therefore I need more time and effort with Gen AI than I needed before because I need to read a lot of code, understand it and ensure it adheres to what mental model I have.

gwerbin1mo ago

You don't have to buy into the stupid vibecoding hype to get productivity value out of the technology.

2 more replies

Aurornis1mo ago

Writing detailed specs and then giving them to an AI is not the optimal way to work with AI.

That's vibecoding with an extra documentation step.

> Therefore I need more time and effort with Gen AI than I needed before

Stop trying to use it as all-or-nothing. You can still make the decisions, call the shots, write code where AI doesn't help and then use AI to speed up parts where it does help.

That's how most non-junior engineers settle into using AI.

Ignore all of the LinkedIn and social media hype about prompting apps into existence.

EDIT: Replaced a reference to Opus and GPT-5.5 with "best available model at the time" because it was drawing a lot of low-effort arguments

8 more replies

scuderiaseb1mo ago

2 more replies

hintymad1mo ago

> With hard requirements listed, I found out that the generated code missed requirements,

1 more reply

coldtea1mo ago

Still faster than writing each of those parts yourself (a few minutes instead of multiple hours), but much more accurate.

1 more reply

bmurphy19761mo ago

I'm starting to think a lot of the problem people are having is just that they have unrealistic expectations.

Maybe you're expecting too much and pushing it too hard/fast/prematurely?

I bet if you find that balance you will see value, but it might not be as fast as you want, just as fast as is viable which is likely still going to be faster than you doing it on your own.

1 more reply

rsanek1mo ago

I'm confused. If you have detailed, specific expectations, why aren't using the best model available? Even if you were using Opus 4.7, I would inquire if you're using high/xhigh effort by default.

Feels crazy to me for people to use anything other than the best available.

2 more replies

linsomniac1mo ago

>Then hand over to Claude Sonnet.

If code is harder to read than to write, you're doing yourself a disservice by having the output stage not be top shelf.

1 more reply

jwpapi1mo ago

I have the same feeling.

Like there is no way in world that Gen AI is faster then an actual cracked coder shooting the exact bash/sql commands he needs to explore and writing a proper intent-communicating abstraction.

I’m thinking the difference is in order of magnitudes.

On top of that it adds context loss, risk of distraction, the extra work of reading after the job is done + you’ll have less of a mental model no matter how good you read, because active > passive.

Man it was really the weirdest thing that Claude Coded started hiding more and more changes. Thats what you need, staying closely on the loop.

eweise1mo ago

hirvi741mo ago

throwaway77831mo ago

I don't know. I don't write detailed specs, but make it very iterative, with two sessions. One for coding and one for reviews at various levels.

Just the coding window makes mistakes, duplicates code, does not follow the patterns. The reviewer catches most of this, and the coder fixes them all after rationalizing them.

Works pretty well for me. This model is somewhat institutionalized in my company as well.

I use CC Opus 4.7 or Codex GPT 5.4 High (more and more codex off late).

meroes1mo ago

Maybe it was Timothy Gowers who commented on this.

Like you don’t always see how a mathematician came up with some move or object to “try”, and to an LLM it appears random large creative leaps are the way to write proofs.

varispeed1mo ago

You can quickly get something "working" until you realise it has a ton of subtle bugs that make it unusable in the long run.

You then spend months cleaning it up.

Could just have written it by hand from scratch in the same amount of time.

But the benefit is not having to type code.

abustamam1mo ago

This may be worth trying out.

baranul1mo ago

Now that there is Claw Code[1], seems like many of these cancellations are easier to do.

[1]: https://github.com/ultraworkers/claw-code

arikrahman1mo ago

I use open spec to negotiate requirements before the handoff, it's helped me a lot. You could also use GSD2 or Amazon's Kiro, or Spec Kit but I find they have too many stages and waste tokens.

CamperBob21mo ago

Then hand over to Claude Sonnet.

Well, there's your problem. Why aren't you using the best tool for the job?

moribunda1mo ago

And it leaves 25 TODO comments in code silently, reporting to you that everything is done.

dannersy1mo ago

Beautifully stated and I couldn't agree more. This is my experience.

GoToRO1mo ago

you are holding it wrong. For real this time.

rob1mo ago

Just saying that I know a lot of people like to raw dog it and say plugins and skills and other things aren't necessary, but in my case I've had good success with this.

tengbretson1mo ago

> or even unnecessary code wrangling data (mapping objects into new objects of narrower types when won't be needed)

Dude! The amount of ad-hoc, interface-specific DTOs that LLM coding agents define drives me up the wall. Just use the damn domain models!

xpe1mo ago

The last two paragraphs, however, show what happens when people start trying to use inductive reasoning -- and that part is really hard: ...

> Therefore I need more time and effort with Gen AI than I needed before because I need to read a lot of code, understand it and ensure it adheres to what mental model I have.

We have a range of options in front of us:

    A. sharing our experience with others
    B. adapting
    C. voting with your feet (cancelling a subscription)
    D. building alternatives to compete
    E. organizing at various levels to push back

(A) might start by sounding like venting. Done well it progresses into clearer understanding and hopefully even community building towards action plans: [1]

One subscription cancellation is a start (if you actually have better alternative and that alternative being better off for the world ... which is debatable given the current set of alternatives!)

Here's what I try to do (but fail often): Do the root cause analysis, vent if you need to, and then think about what is needed to really fix it.

[2]: https://en.wikipedia.org/wiki/Parable_of_the_drowning_man

[3]: The first four are:

    I write detailed specs. Multifile with example code. In markdown.

    Then hand over to Claude Sonnet.

    With hard requirements listed, I found out that the generated code missed requirements, had duplicate code or even unnecessary code wrangling data (mapping objects into new objects of narrower types when won't be needed) along with tests that fake and work around to pass.

    So turns out that I'm not writing code but I'm reading lots of code.

janwillemb1mo ago· 13 in thread

jjfoooo41mo ago

1 more reply

SwellJoe1mo ago

At least some of the investors in this tech are hoping for a monopoly position. They'd like to outspend the competition to get an insurmountable lead, at which point they can set their price.

GaryBluto1mo ago

Luckily local AI is becoming more feasible every day.

8 more replies

fortyseven1mo ago

This is why, despite enjoying all of this, I really want to focus on locally hosted models. If we don't host the technology ourselves, we're setting ourselves up for a hard fall down the line.

Until very recently, local models been little more than brittle toys in my experience, if you're trying to use them for coding.

gip1mo ago

True. That is why it is key important to have open source and sovereign models that will be accessible to all and always on / local.

Competition (OpenAI vs Anthropic is fun to watch) and open source will get us there soon I think.

tetha1mo ago

The owner rug-pulls, or Broadcom buys the owner and starts squeezing.

pmarsh1mo ago

For the sake of argument if you build on AWS is that any more of a solid foundation? You're beholden to Amazon, unless you have the bandwidth to be able to DR immediately to another provider.

1 more reply

blueone1mo ago

Anthropic sells due to unrelenting pressure and unachievable demand > new owner cuts costs > models become worse > new owner sells > the capitalistic cycle wins > we, the people, suffer

sdevonoes1mo ago

The sooner you cancel the sooner you become independent of them

1 more reply

_the_inflator1mo ago

“In the future there might be the possibility that catastrophic event A could happen.”

Not the best argument.

Also there is nothing without dependencies. Loose coupling means coupling.

agumonkey1mo ago

Some people are so dependent on it they can't even say it without twisting words to hide the fact that they're now stuck at zero

notjes1mo ago

Soon, a dented toaster will be enough to run decent models.

2ndorderthought1mo ago

Imagine if anthropic and openai went bankrupt in the next 2 years. If you look at their financials its a real possibility.

zkmon1mo ago· 8 in thread

syhol1mo ago

https://pi.dev/ seems popular, whats not open source about opencode? The repo has an MIT License.

4 more replies

SyneRyder1mo ago

Probably a silly idea, but I'll throw it into the mix - have your current AI build one for you. You can have exactly the coding agent you want, especially if you're looking for "extremely simple".

appcustodian21mo ago

Just in case it didn't occur to you already, you can just build whatever coding agent you want. They're pretty simple

jedisct11mo ago

Swival is not bloated and was specifically made for local agents: https://swival.dev

1 more reply

enraged_camel1mo ago

I use both Cursor and Claude Code, and yes, the latter is noticeably slower with the same model at the same settings.

However, it's hard to justify Cursor's cost. My bill was $1,500/mo at one point, which is what encouraged me to give CC a try.

btbuildem1mo ago

You'd figure by now we would have something between a TUI and an IDE.

btbuildem1mo ago

You can run CC with local models, it's pretty straightforward. I've done this with vLLM + a thin shim to change the endpoint syntax.

banditelol1mo ago

what model you used with llama_cpp?

1 more reply

rectang1mo ago· 7 in thread

Retr0id1mo ago

(but I guess they're not really conflicting, if the "solution" involves upgrading to a higher plan)

3 more replies

llm_nerd1mo ago

I have Max 5x and use only Claude Opus on xhigh mode. I don't use agents, or even MCPs, and stick to Claude Code.

I find it incredibly difficult to saturate my usage. I'm ending the average week at 30-ish percentage, despite this thing doing an enormous amount of work for (with?) me.

3 more replies

raincole1mo ago

> the day when LLM-assisted coding is commoditized

Like yesterday? LLM-assisted coding is $100/mo. It looks very commoditized when most houses in developed world pay more for electricity than that.

5 more replies

taytus1mo ago

I'd recommend Kimi k2.6 for your use. It is an excellent model at a fraction of the cost, and you can use Claude Code with it.

I did a 1:1 map of all my Claude Code skills, and it feels like I never left Opus.

Super happy with the results.

4 more replies

dboreham1mo ago

Same. Never hit a limit. Use it heavily for real work. Never even thought of firing off an LLM for hours of...something. Seems like a recipe for wasting my time figuring out what it did and why.

goalieca1mo ago

cyanydeez1mo ago

It does seem like the sweet spot between WallE and the destroyed earth in WallE.

2 more replies

drunken_thor1mo ago· 7 in thread

jedberg1mo ago

People said this about AWS too. "Why would they save you money??". It turns out that every time they reduce prices, they make more money, because more people use their services.

2 more replies

minimaxir1mo ago

estimator72921mo ago

Less tokens = more free capacity = more subscription income.

GodelNumbering1mo ago

y42OP1mo ago

It's like dating apps. They don't want you to find a good match, because then you cancel the subscription.

1 more reply

nananana91mo ago

Up to a point. There is incentive when they get to the point where they literally can't serve their userbase and customers start leaving.

zzzeek1mo ago

Well that's why threads like this are important to upvote. On hacker news , they're angry !

wilbur_whateley1mo ago· 6 in thread

Claude with Sonnet medium effort just used 100% of my session limit, some extra dollars, thought for 53 minutes, and said:

API Error: Claude's response exceeded the 32000 output token maximum. To configure this behavior, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable.

amarcheschi1mo ago

And on the seventh day, API Error: Claude's response exceeded the 32000 output token maximum

1 more reply

couchdb_ouchdb1mo ago

I don't think i'd let it think more than 5 minutes without killing the process.

1 more reply

jasonlotito1mo ago

Just curious, what version of Max are you on: 5x or 20x?

2ndorderthought1mo ago

I hope this doesn't come out wrong but. When this happens do agentic/vibe coders message their boss and say "sorry can't work until tomorrow?"

2 more replies

jansenmac1mo ago

giancarlostoro1mo ago

1 more reply

ChicagoDave1mo ago· 4 in thread

I think there’s a clear split amongst GenAI developers.

One group is consistently trying to play whack-a-mole with different models/tools and prompt engineering and has shown a sine-wave of success.

The other group, seemingly made up of architects and Domain-Driven Design adherents has had a straight-line of high productivity and generating clean code, regardless of model and tooling.

I have consistently advised all GenAI developers to align with that second group, but it’s clear many developers insist on the whack-a-mole mentality.

I have even wrapped my advice in https://devarch.ai/ which has codified how I extract a high level of quality code and an ability to manage a complex application.

Anthropic has done some goofy things recently, but they cleaned it up because we all reported issues immediately. I think it’s in their best interests to keep developers happy.

My two cents.

joquarky1mo ago

I kind of wonder if people with ADHD tend to fall into the latter group, as we are used to setting guardrails to keep us aligned to a goal.

1 more reply

camel_Snake1mo ago

FYI that prominent link to your sharpee repo on GitHub 404s

1 more reply

estimator72921mo ago

1 more reply

rglover1mo ago

Dead on. Any company not thinking about this like the 2nd group is setting themselves up for a bad time (and sadly, anecdotally, that seems to be an emerging majority).

1 more reply

anonyfox1mo ago· 3 in thread

fluidcruft1mo ago

My mental model for LLM is I don't expect them to chew gum and walk at the same time. Cleaning code up is a different task from building new functionality.

GLM always feels like it's doing things smarter, until you actually review the code. So you still need the build/prune cycle. That's my experience anyway.

jorjon1mo ago

Can I get that max20 if you are not using it?

cmrdporcupine1mo ago

Most "productive" flow I found was when I had both memberships and let Claude do the "I go yeet your feature" side and Codex do the "WTF bro, that's full of race conditions!" review phase.

But now I just use Codex. Claude is unreliable and leaves data races all over and leaves, as you say, negative conditions unhandled fairly consistently.

bryan01mo ago· 3 in thread

I see a lot of people struggling to work with agents. This post has a good example:

> “you can’t be serious — is this how you fix things? just WORKAROUNDS????”

I will ask them why they made a decision and review alternatives with them. These learnings will aid both you and the agent in the future.

aulin1mo ago

After you see it skip reasoning so many times and saying "actually the simplest fix is" the laziest thing ever you get kind of tired of babysitting it.

causal1mo ago

Even their explanations are often confabulations. Best case they point to something wrong in your prompt or agents files, but usually it’s just noise.

philipwhiuk1mo ago

Like babysitting an intern.

petterroea1mo ago· 3 in thread

Looking at Anthropic's new products I think they understand they don't really have a cutting edge other than the brand.

mmonaghan1mo ago

jetbalsa1mo ago

alex-onecard1mo ago

How are you using kimi 2.6? I am considering their coding plan to replace my claude max 5x but I am worried about privacy and security.

2 more replies

giancarlostoro1mo ago· 3 in thread

I'm debating trying out Codex, from some people I hear its "uncapped" from others I hear they reached limits in short spans of time.

There's also the really obnoxious "trust me bro" documentation update from OpenClaw where they claim Anthropic is allowing OpenClaw usage again, but no official statement?

Dear Anthropic:

Can I get whitelisted for "sane use" of my Claude Code subscription? I would love this. I am not dropping $2400 in credits for something I do for fun in my free time.

fluidcruft1mo ago

dheera1mo ago

Claude Code now has an official telegram plugin and cron jobs and can do 80% of the things people used OpenClaw for if you just give it access to tools and run it with --dangerously-skip-permissions.

2 more replies

scottyah1mo ago

Don't forget, Openclaw was basically bought by OpenAI so there's only incentive to use it as a wedge to pry people off Anthropic.

wood_spirit1mo ago· 2 in thread

hungryhobbit1mo ago

This was a real issue, and Anthropic recently awknowledged it:

https://www.anthropic.com/engineering/april-23-postmortem

I have a hard time seeing any other major AI provider being this transparent, so while I'm annoyed at Claude ... I respect how they handled it.

2 more replies

isoprophlex1mo ago

did you set your 4.7 to xhigh or max effort? anything else is basically not worth your time...

1 more reply

pram1mo ago· 2 in thread

I am certainly not saying people should “spend more money,” more like the Claude Code access in the Pro plan seems kind of like false advertising. Since it’s technically usable, but not really.

swiftcoder1mo ago

> I am certainly not saying people should “spend more money,” more like the Claude Code access in the Pro plan seems kind of like false advertising

Its particularly noticeable when for a long time you could work an 8 hour day in codex on ChatGPT´s $20/month plan (though they too started tightening the screws a couple of weeks back)

thebitguru1mo ago

My guess is that the higher plans will be next, especially as more people upgrade to those and maximize their usage.

mrinterweb1mo ago· 2 in thread

janalsncm1mo ago

https://podcasts.apple.com/us/podcast/this-episode-is-a-cogn...

As someone who both uses and builds this technology I think this is a core UX issue we’re going to be improving for a while. At times it really feels like a choose 2+ of: slow, bad, and expensive.

hu31mo ago

About slowdowns... I have this theory that if they sneak some sleep(1) calls while processing medium to complex prompts they can serve more clients.

But I think "context switching" between 2 different prompts might be too expensive for GPUs to be worth it for LLM providers. Who knows.

lawrence11mo ago· 2 in thread

The timeline doesn't make any sense. How can you subscribe a couple weeks ago and the problem start 3 weeks ago and yet things also went well for the first few weeks. was this written by GPT 5.5?

wg01mo ago

The author is not a native English speaker it seems.

They might mean "few weeks ago" and the phrase "couple of weeks ago" might not be exactly as "Vor ein paar Wochen" in their mind rather could be as "few weeks ago."

Rest of the prose in the article seems to support the assumption.

The post is handwritten with no LLMs involved.

fortynights1mo ago

vondur1mo ago· 2 in thread

Wait, weren't there posts in the not too distant past where everyone was signing the praises for Claude and wondering how OpenAI will catch up?

swader9991mo ago

Yep. I think the sentiment here isn't lagging too much in terms of the day to day experience of what is being offered. Kind of makes HN very useful in this regard.

cyanydeez1mo ago

Wait, are SaaS's fundamentally shifting business models searching to maximize the value of a product at the expense of a customer over time?

Strange how things can change!

1 more reply

zendarr1mo ago· 2 in thread

Seems like some of the token issues may be corrected now

https://www.anthropic.com/engineering/april-23-postmortem

minimaxir1mo ago

1 more reply

giancarlostoro1mo ago

Dear Anthropic:

Please, for the love of all things holy, NEVER change someone's defaults without INFORMING the end user first, because you will wind up with people confused, upset, and leaving your service.

nikolay1mo ago· 2 in thread

Capricorn24811mo ago

> I can agree. ChatGPT 5.5 made this a no-brainer choice

2 more replies

robotnikman1mo ago

>removing Claude Code from the Pro plan

Wait really? I wanted to give it a try, but for $200 a month no way am I paying that for something I just want to experiment around with

2 more replies

caycep1mo ago· 2 in thread

If all Claude does is automate mundane code, why not just make a "meta library" of said common mundane code snippets?

twobitshifter1mo ago

maybe make it so that when you start typing it completes the snippet?

queuebert1mo ago

Like Stack Overflow?

1 more reply

areoform1mo ago· 1 in thread

8organicbits1mo ago

I'm not sure that graph shows a time-based correlation. The 60% line stays inside the 95% confidence interval. Is that not just a measurement of noise?

siliconc0w1mo ago· 1 in thread

Shameless self plug but also worried about the silent quality regressions, I started building a tool to track coding agent performance over time.. https://github.com/s1liconcow/repogauge

Here is a sample report that tries out the cheaper models + the newest Kimi2.6 model against the 5.4 'gold' testcases from the repo: https://repogauge.org/sample_report.

conception1mo ago

This is cool - just wanted to note https://marginlab.ai is one that has been around for a while.

1 more reply

DeathArrow1mo ago· 1 in thread

I use Claude Code with GLM, Kimi and MiniMax models. :)

I was worried about Anthropic models quality varying and about Anthropic jacking up prices.

I don't think Claude Code is the best agent orchestrator and harness in existence but it's most widely supported by plugins and skills.

droidjj1mo ago

Where are you getting inference from? I'm overwhelmed by the options at the moment.

2 more replies

varispeed1mo ago· 1 in thread

dswalter1mo ago

algoth11mo ago· 1 in thread

Doesn't "poor support" implies that there is some sort of support? Shouldnt it be "no support"

ipaddr1mo ago

You get to talk to an AI agent

torstenvl1mo ago· 1 in thread

I feel like almost everyone using AI for support systems is utterly failing at the same incredibly obvious place.

There are three broad categories of interaction: cranks, grandmas, and wtfs.

dboreham1mo ago

joozio1mo ago· 1 in thread

y42OP1mo ago

I like your blog and I can totally relate to this article - it's like something I wanted to write about for a couple of weeks now. :D

https://thoughts.jock.pl/p/adhd-ai-agent-personal-experience...

easythrees1mo ago· 1 in thread

I have to say, this has been the opposite of my experience. If anything, I have moved over more work from ChatGPT to Claude.

kleene_op1mo ago

Same. I am getting crazy good value from Claude at work, on both scientific applications and deployment environments.

People who just let Claude roam free on their repository deserve everything they end up with.

dostick1mo ago· 1 in thread

adamors1mo ago

It’s not ommited, OP clearly talks about editing Javascript.

0xchamin1mo ago· 1 in thread

olcay_1mo ago

Claude Code has something about picking sensible choices instead of asking questions in the system prompt, that's probably the problem.

lawrence11mo ago· 1 in thread

The timeline of the first few sentences doesn't add up. how can you subscribe 2 weeks ago when the problem started 3 weeks ago.

y42OP1mo ago

https://news.ycombinator.com/item?id=47894155

(I am just learning that "a couple of weeks" apparently means "2 weeks"...)

chaosprint1mo ago· 1 in thread

jryio1mo ago

I find it strange that you've anthropomorphized Claude but not ChatGPT seemingly based on one having a human name and the other not

aleqs1mo ago· 1 in thread

unshavedyak1mo ago

Heck two weeks ago i tried my hardest to hit my limit just to make use of my subscription (i sometimes feel like i'm wasting it), and i still only managed to get to 80% for the week.

No idea why everyone seems to be having such wildly different experiences on token usage.

1 more reply

binyu1mo ago· 1 in thread

hungryhobbit1mo ago

https://www.anthropic.com/engineering/april-23-postmortem

1 more reply

yalogin1mo ago· 1 in thread

janalsncm1mo ago

Opencode is a great cli for driving a coding agents.

Like 3 weeks ago Qwen3-coder was the best coding LLM to run locally. I haven’t spent time since to figure out if anything is better.

You can also power Opencode with OpenRouter which lets you pay for any LLM à la carte.

1 more reply

nickdothutton1mo ago· 1 in thread

Switched to local models after quality dropped off a cliff and token consumption seemed to double. Having some success with Qwen+Crush and have been more productive.

tfrancisl1mo ago

Would love some more info on how you got any local model working with Crush. Love charmbracelet but the docs are all over the place on linking into arbitrary APIs.

1 more reply

kashunstva1mo ago

I’m sympathetic to the author’s complaints about Anthropic’s support, though I would go further. It doesn’t exist.

A year gives one a long time to think about things. Maybe it was because I was on a VPN temporarily. Otherwise, no clue. I’m a hobbyist embedded developer. That’s it.

So no, Anthropic support isn’t just poor; it’s nonexistent.

lukaslalinsky1mo ago

cbg01mo ago

stldev1mo ago

Same, after being a long-time proponent too.

First was the CC adaptive thinking change, then 4.7. Even with `/effort max` and keeping under 20% of 1M context, the quality degradation is obvious.

I don't understand their strategy here.

1 more reply

binaryturtle1mo ago

I have a simple rule: I won't pay for that stuff. First they steal all my work to feed into those models, afterwards I shall pay for it? No way!

I use AI, but only what is free-of-charge, and if that doesn't cut it, I just do it like in the good old times, by using my own brain.

1 more reply

bauerd1mo ago

They can't afford to care about individual customers because enterprise demand exploded and they're short on compute

stan_kirdey1mo ago

vintagedave1mo ago

They won't even reset usage for me: https://news.ycombinator.com/item?id=47892445

And by crikey do I empathise with the poor support in this article. Nothing has soured me on Anthropic more than their attitude.

Great AI engineers. Questionable command line engineers (but highly successful.) Downright awful to their customers.

lanthissa1mo ago

for all the drama, its pretty clear both openai, google, and anthropic have had to degrade some of their products because of a lack of supply.

There's really no immediate solution to this other than letting the price float or limiting users as capacity is built out this gets better.

isjcjwjdkwjxk1mo ago

Oh no, the unreliable product people pretend is the next coming of Jesus turned out to be thoroughly unreliable. Who coulda thunk it.

PeterStuer1mo ago

All mostly mitigatable by rigorous audits and steering, but man, it should not have to be.

taffydavid1mo ago

I know this thread is likely full of similar anecdotes, but I also want to share.

My experience very suddenly and very clearly degraded over the last few days.

aucisson_masque1mo ago

First ever time I used ai to code was a week ago, went with the Claude pro because I didn't want to commit.

The 20$ plan has incredible value but also, the limit are just way too tight.

I'm glad Claude made me discover the strength of ai, but now it's time to poke around and see what is more customer friendly. I found deepseek V4 to be extremely cheap and also just as good.

Plus I get the benefit to use it in vs code instead of using Claude proprietary app.

I think that when people goes over the hype and social pressure, anthropic will lose quite a lot of customer.

datavirtue1mo ago

I'm an executive, the devs complaining are getting retrained or put on the chopping block.

I'm not waiting on people to write code any more. No way in hell.

tamimio1mo ago

vivin1mo ago

zulban1mo ago

Curious. Not my experience whatsoever.

I tried Claude recently and it was able to one-shot fixes on 9/9 of the bugs I gave it on my large and older Unity C# project. Only 2/9 needed minor tweaks for personal style (functionally the same).

The author describes what sounds like very large tasks that I'd never hand off to an AI to run wild in 2026.

Anyway I thought I'd give a different perspective than this thread.

arikrahman1mo ago

burnJS1mo ago

My experience is Claude and others are good at writing methods and smaller because you can dictate what it should do in less tokens and easily read the code. This closes the feadback loop for me.

Other times I accept its output as a rough draft and then tell it how to refactor its code from mid to senior level.

I'm sure it will get better but this is my trust level with it. It saves me time within these confines.

Edit: it is a valuable code reviewer for me, especially as a solo stealth startup.

throwaway20271mo ago

duxup1mo ago

I’ve definitely encountered a drop in Claude quality.

Even a simple prompt focused on two files I told Claude to do a thing to file A and not change file B (we were using it as a reference).

Claude’s plan was to not touch file B.

First thing it did was alter file B. Astonishing simple task and total failure.

It was all of one prompt, simple task, it failed outright.

I also had it declare that some function did not have a default value and then explain what the fun does and how it defaults to a specific value….

Fundamentally absurd failures that have seriously impacted my level of trust with Claude.

brunooliv1mo ago

For work, unlimited usage via Bedrock.

Yes I’d like to get more usage out of my personal sub, but at 20/mo no complains

airbreather1mo ago

I am sort of in the same place, it seems to have lost enough of the magic that I might be better trying to do more with running local LLMs on my 4090.

Better the devil you know and all that.

wslh1mo ago

shevy-java1mo ago

Those AI using software developers begin to show signs of addiction:

From "yay, claude is awesome" to "damn, it sucks". This is like with withdrawal symptoms now.

My approach is much easier: I'll stay the oldschool way, avoid AI and come up with other solutions. I am definitely slower, but I reason that the quality FOR other humans will be better.

rurban1mo ago

That's bad for him, because he already had a cheap plan. Now he wont get it back that easy.

gverrilla1mo ago

estimator72921mo ago

I just noticed today that it doesn't warn about approaching limits and just blows straight into billing extra tokens.

I'm pretty sure it used to warn when you got close to your 5hr limit, but no, it happily billed extra usage. Granted only about $10 today, but over the span of like 45 minutes. Not super pleased.

erikbye1mo ago

kx_x1mo ago

After the fixes in Claude Code, Opus 4.6/4.7 have been performing well.

Before the fixes, they were complete trash and I was ready to cancel this month.

Now, I'm feeling like the AI wars are back -- GPT 5.5 and Opus 4.7 are both really good. I'm no longer feeling like we're using nerfed models (knock on wood)!

ForOldHack1mo ago

AI used to be, the punched card replicator... its all replaceable.

exabrial1mo ago

It's bad, really bad.

The filesystem tool cannot edit xml files with <name></name> elements in it

hybrid_study1mo ago

Sometimes it feels like Anthropic uses token processing as a throttling tool, to their advantage.

elevaet1mo ago

jp00011mo ago

_pdp_1mo ago

josefritzishere1mo ago

AI has a lot of future potential but at every level... it's still not very good. And certainly not good enough to validate the expense, let alone what the actual cost would be were it profitable.

hedgehog1mo ago

I used Opus via Copilot until December and then largely switched over to Claude Code. I'm not sure what the difference is but I haven't seen any of these issues in daily use.

mattas1mo ago

I've see a post like this every week for the last 2 years. Are these models actually getting worse? Or do folks start noticing the cracks as they use them more and more?

sreekanth8501mo ago

Biggest issue i see is, models are not getting efficient. This is no where going to get commoditised. There will be a limit at which you can burn money at subsidised cost.

brachkow1mo ago

As many others I had negative (not good as before) feeling about Claude Code lately

What I don't understand is these loud "voting with money" comments. What they are canceling is very subsidized plan to buy something that delivers a lot of value.

There are only two providers that can provide this level of models at very subsidized price - anthropic and openai. Both of them are bad in terms of reliability.

So I wonder what these people do after they "cancel" both of them? Do they see producing less result at same hourly rate as everyone else on the market as viable option?

sfmike1mo ago

i ran prompts used up a ton of usage, and got no return just showed error.

Asked support hey i got nothing back i tried prompting several times used a ton of usage and it gave no response. I'd just like usage back. What I payed for I never got.

Just bot response we don't do refunds no exceptions. Even in the case they don't serve you what your plan should give you.

Animats1mo ago

Support? You expected support? Live support?

Most of this is about the billing system, which is apparently broken.

kissgyorgy1mo ago

I cancelled in the minute my subscription stopped working in Pi. Not going back to the slopfest what Claude Code is.

chadleriv1mo ago

Off topic: I do feel like this model switching content feels very circa 2010 "I'm quitting Facebook"

smashah1mo ago

Did the same with Google Ai Ultra. They rug pulled the subscribers. They changed the deal, we cancel. Simple.

Fabulu1mo ago

It's a client issue. It's a mystery to me why they don't fix it.

SwellJoe1mo ago

3 more replies

AJRF1mo ago

We are in the 'we need to IPO so screw our customers' phase of the cycle

captainregex1mo ago

anyone remember the whole “delete uber” thing from 2017ish? good times

r0fl1mo ago

I hope codex doesn’t decline the same way

I’m blown away by how good it is lately

bad_haircut721mo ago

Waiting 60s every time I send a msg really kills the ux of claude

dannypostma1mo ago

When I saw the German screenshot it all made sense to me.

moralestapia1mo ago

The midwit curve of LLMs has OpenAI on both ends.

whalesalad1mo ago

Then within the last few months everything changed and went to shit. My trust was lost. Behavior became completely inconsistent.

1 more reply

postepowanieadm1mo ago

Yeah, session limits are kinda show stoppers.

zh_code1mo ago

I just cancelled my Max20 plan yesterday.

spaceman_20201mo ago

4.7 is the breaking point for me

It's almost unusable

rvz1mo ago

The great de-skilling programme continues in Anthropic's casino. They completely want you dependent on gambling tokens on their slot machines with extortionate prices, fees and limits.

Once again, local models are the answer and Anthropic continues to get you addicted to their casino instead of running your own cheaper slot machine, which you save your money.

Every time you go to Anthropic's casino, the house always wins.

r00t-1mo ago

Same, it's a mess.

danjl1mo ago

This sounds just like all my neighbors complaining about their internet provider.

system21mo ago

I hate enshittification and I hate seeing this happening to Claude Code right now.

1 more reply

gizmodo591mo ago

queuebert1mo ago

drivebyhooting1mo ago

Imagine vibe coding your core consumer application and associated backend…

Oh wait, I don’t have to imagine. That’s what Anthropic does. A nice preview for what is in store for those who chose to turn off their brains and turn on their AI agents.

scuff3d1mo ago

Welcome to the future. Anthropic is currently speed running it but this is what all LLM tools are going to look like in the next few years, once they turn the enshitification corner.

docheinestages1mo ago

Me too.

johanneskanybal1mo ago

It's not magic but for me definitly claude is the way to go. Not expecting magic it's just another level of non-slop than the rest I've tried.

GrumpyGoblin1mo ago

Cool

fractalf1mo ago

Hehe me too. Yesterday. Enough is enough. Using KLM5.1 and soon deepseek

gexla1mo ago

We can't do it. We standardized. They got us.

semiinfinitely1mo ago

absolute garbage support was the reason why I canceled. who would have thought that an AI company has only bots as support agents

jwaldrip1mo ago

6 more replies

j / k navigate · click thread line to collapse