The lack of transparency and accountability behind all of this is, in my view, incredible.
I emailed their support a few days ago with details, concerns, a link to the Twitter thread from one of their employees, and a concrete support request, and got this reply from an AI agent ('Fin'):
> While our Support team is unable to manually reset or work around usage limits, you can learn about best practices here. If you’ve hit a message limit, you’ll need to wait until the reset time, or you can consider purchasing an upgraded plan (if applicable).
I replied saying that was not an appropriate answer.
You're absolutely right about the lack of transparency and accountability. On one hand, Anthropic generates goodwill by appearing to have a more ethical stance than OpenAI, and a better product. On the other hand, they kill it fast through extremely poor treatment of their customers.
If they have a bug, they need to resolve it, and in the meantime refund quotas. 'Unable to' - that's shocking. This is simple and reasonable. It's basic customer service. I don't know if they realise the damage their attitude is doing.
This does mean ultimately no loyalty. I can't stay loyal to a brand that doesn't actually respond to inquiries, bug reports or down reports at all.
I do understand that Anthropic is operating at a tremendous scale and can't have enough humans in the loop. This sounds like a good use for AI classification and triage, really!
Amen to this.
Being in business means having to respond to customer enquiries at some point.
Given the billions being pumped into Anthropic's pockets, and given the millions their senior leadership no doubt pay themselves, I'm sure they could spare a bit of cash to get off their backsides and sort out the Customer Service.
I simply do not buy the "poor Anthropic, they are operating at scale, they are too busy winning to deal with customer service" argument that comes up time and time again.
The fact is there are many large businesses, many large governments that are able to deal with customers "at scale".
Scale means you respond a bit slower: maybe a few days, or a couple of weeks AT MOST. But complete silence for months or years is inexcusable.
All of my experiences with "Fin" match those of my friends and colleagues... namely that "Fin" is a synonym for "black hole". I've got "tickets" opened with "Fin" months ago that have not had a modicum of a reply.
My organization has the concept of "premium models" where our limits reset every month. I hit my limit pretty quickly last month because I was burning tokens doing things that would have been a simple bash loop in the past - all because I was used to interfacing with Claude at the chat layer for all my automation needs and not thinking any more about it.
Completely outside of the productivity debate, offloading cognitive tasks to LLMs leaves you less practiced in them and less ready to do them when the LLM isn't available. When you have to delegate only certain tasks to the LLM for financial reasons, you may find yourself very frustrated.
Or do you think model inference/training will get so cheap that we won't reach the point of "high prices"?
What makes it worse is the lack of transparency. If there were clear, hard limits, people could plan around it. Instead it’s this moving target that makes it impossible to trust for real work.
At some point it stops feeling like a bug and starts feeling like a pricing experiment on users.
The only way out is government regulation which means we are screwed in the US (our government is too far gone to represent average citizen interests in any meaningful way) but Europeans maybe have a chance if they get it together and demand change.
Dependency on cloud AI models is, in effect, dependency on VC subsidy. From the user's point of view, this dependency is debt which will either be repaid with interest to a model provider or through the hard work of making themselves independent of such models after having become dependent.
Could just be that usage has gone up.
One Reddit user reverse engineered the binary and found that it was a cache invalidation issue.
They are doing some hidden string replacement if the Claude Code conversation talks about billing or tokens. It looks like that invalidates the cache at that point.
If that string appears anywhere in the conversation history, I think the starting text is replaced and your entire cache rebuilds from scratch.
So, nothing devious, just a bug.
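Why a replacement near the start is so expensive is easier to see with a toy model of prefix caching. This is a generic sketch, not Claude Code's or Anthropic's actual implementation: tokens are stand-in strings, and the "rewritten" case is a hypothetical stand-in for the reported string replacement. The point is that a prefix cache can only reuse an exact leading match, so one change at the top forces everything after it to be reprocessed.

```python
def reusable_prefix_tokens(cached: list[str], new: list[str]) -> int:
    """Count the leading tokens shared by the cached prompt and the new prompt.

    A prefix cache can only reuse an exact leading match: the first token
    that differs ends the reusable region.
    """
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def rebuild_cost(cached: list[str], new: list[str]) -> int:
    """Tokens in the new prompt that must be processed without cache help."""
    return len(new) - reusable_prefix_tokens(cached, new)


history = ["system", "tools", "msg1", "msg2", "msg3"]

# Normal case: a new turn is appended, so only the new token is uncached.
appended = history + ["msg4"]
print(rebuild_cost(history, appended))   # 1

# Hypothetical case: something near the start is rewritten, so the
# entire conversation is reprocessed from scratch.
rewritten = ["system*"] + history[1:] + ["msg4"]
print(rebuild_cost(history, rewritten))  # 6
```

The longer the conversation, the bigger the gap between those two numbers, which matches the "entire cache rebuilds" symptom described above.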
I use it with an API key, so I can use /cost. When I did a resume, it showed the cost from what I thought was the first go. I don't think it's clear what the difference is between API key and subscription, but am I to believe that simply resuming cost me $5? The UI really makes it look like that was the original $5.
While it should be fixed, this isn't the same usage issue everyone is complaining about.
For example, of the five new data centers being planned in Wisconsin, the two I know of that have public energy consumption estimates will need more electricity than all of the residential electric usage in Wisconsin combined, at 3.9 gigawatts.
https://www.wpr.org/news/data-centers-could-cost-wisconsins-...
You just work until suddenly the AI dumps you out, and then sit there wondering how many hours or days you have to wait. It's incredible that this experience is considered OK at all, that it's accepted.
Also: subagents do not get you free usage. They just protect your main context window.
That put me at 12%.
I have no MCPs except the built in claude-in-chrome.
This is clearly a bug.
It's a "bug" because it's probably an intended effect of capturing the cost of compute, but it surfaces the fact that they oversold compute to the point where they can't keep the KV cache hot, and now it's thrashing.
We have decades of evidence that say this. Few businesses survive on low margins across wide audiences.
I have seen more than one comment on this thread mentioning kimi though - I'll have to test it out.
qwen3-coder-next has been surprisingly capable as a local model too - needs to be used to make small changes where you know exactly what the final code should look like rather than implementing whole features, but it is free (except for the power bill).
I have a GLM Code subscription and it lasts much longer than Claude Code.
I use Pi agent so I use all agents in the same harness.
I’ve been tempted to move my Gemini plan up to a higher tier and play around more with the Gemini CLI, as I seriously love the Gemini chat for most everything. Claude is lazy af and is always pulling stale data, or not checking resources at all. I literally have a Gemini MCP that I force Claude to use half the time when it’s lost, and Gemini nails it every single time.
I’m on a Claude Max 20x plan right now, and I seriously can’t imagine not having it around anymore, but Gemini seems to always have my back on actual current data, with fewer hallucinations.
Funnily enough, Junie detects when Gemini is overloaded (less and less often now) and switches to the default model (OpenAI). This is when you start thinking: why is this code so terrible, what's going on?
Even if you show them benchmarks that show another model equally as good if not better, they refuse to use it. My suspicion is they've convinced themselves that Opus must be the best, because of reputation and price. They might've used a different model and didn't have a good experience, making them double down.
I hope a research institution will perform an experiment. My hypothesis is that if you swapped out a couple of similar state-of-the-art models, even changing the "class" of model (Sonnet <-> Opus, GPT 5.4 <-> Sonnet), the user won't be able to tell which is which. This would show that the experience is subjective, and that bias is informing their decision, rather than rationality.
It's like wine tasting experiments. People rate a $100 bottle of wine higher than a $10 bottle. But if they actually taste the same, you should be buying the $10 bottle. But people don't, because they believe the $100 bottle is better. In the AI case, the problem is people won't stop buying the expensive bottle, because they've convinced themselves they must use the more expensive bottle.
It's not like I haven't tried. Gemini CLI is still trash (it's probably a bit better now, but I still can't see the edits it proposes well, etc.). I tried OpenCode, and the whole experience was frustrating: the models give up mid-task, they run rampant with actions, the CLI does not offer the level of control and customization Claude Code offers, etc.
I've also tried the other major tools: Codex, Cursor, Cline, Aider, and others, nothing works for me. You are surprised people stick to Claude Code, I am surprised people bother with the other tools.
Maybe it has something to do with how I use the agentic tools: I use the CLI almost exclusively, rarely using the IDE (unless I want to actually code myself). I also almost always approve each and every edit. As such, my number one concern is for the tool to provide me with proper control in a simple and reliable manner: I want a rich permission system that works, and I want to see each proposed edit very clearly in an ergonomic diff format. I want to be able to type, recall, and edit my commands easily too. These are things Claude Code excels at that the others just don't.
The best I've been able to do is to use third-party routers to let me use Claude Code with almost-SOTA models, and this is the approach that shows the most promise. I'd hate to be beholden to Anthropic's shenanigans.
Using 4.6 Opus for simple things is not only wasting tokens, it's also slower. Sonnet will get a lot of tasks done in half the time for less than half the money
https://old.reddit.com/r/ClaudeCode/comments/1s7zg7h/investi...
I definitely learned to plan out my projects more using LLMs, but at that point I'm 80% of the way there. I might hit a roadblock or two, but if that means I don't have to guide an LLM, I'd prefer that.
Just a shockingly constrained service tier right now.
This is a massive shift from my previous experience.
You've hit your limit · resets 2am (America/Los_Angeles)
I waited until the next day to ask it to do it again, and then:
You've hit your limit · resets 1pm (America/Los_Angeles)
At which point I just gave up
There's no other way that these companies can compete against the likes of Google and Facebook unless they sell themselves to these companies. With AWS and GCP spending hundreds of billions of dollars per year, there's no way that Anthropic or OpenAI can continue competing unless they make an absurd amount of money and throw that at resources like their own datacenters, etc., and they can't do that at $20/month.
Without heavy collusion or outright legislative fiat (banning open models) I don’t see how Anthropic/OpenAI justify their (alleged) market caps
I routinely match or beat Claude with regards to speed, I often race it to the solution because Claude just takes so long to produce a usable result.
Staying competitive doesn't mean only paying an AI for slop that often takes longer to produce. AI is a convenience, it is not the only way to produce code or even the most cost effective or fastest way. AI code also comes with more risk, and more cognitive load if you actually read and understand everything it wrote. And if you don't then you're a bit foolish to trust it blindly. Many developers are waking up to the reality of using AI, and it's not really living up to the hype.
Maybe you don't recognize someone with real skill and 30+ years of experience? I don't need Claude, but I'm using it. Sometimes it succeeds at simple tasks, but it's out of its depth for anything complex, and after enough iterations on one task, entropy takes hold.
Maybe your coding career was a dead end job, but mine is doing just fine. I'm also not sure you or your colleagues correctly count the time you spend putting into instructing AI vs what you get out that is actually usable. And if you were slow before AI, then I have to ask why you think learning to be a slop-fixer is somehow better than learning how to be a better software engineer.
The amount of ai slop from diffs and posts is nauseating.
Anthropic has said they are investigating. https://www.reddit.com/r/ClaudeAI/comments/1s7zgj0/investiga...
Anthropic recently gave Opus 4.6 a 1M-token context window. The cache normally invalidates after 5 minutes.
If you come back or --continue after a break (of 5 minutes or more), that's a MASSIVE hit to your session limit. 250,000 tokens at Max 5x will ding you 10% of your session for "Hi, I'm back".
So say you don't typically do /compact very often. And say you're not very chatty and "do the right thing" by only asking a question once in a while? You'll burn through context like crazy.
Meanwhile if you have ADHD and anthropomorphize the bleep out of your claude and chat with them all day long? Hardly a dent!
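The chatty-versus-quiet asymmetry can be sketched as a tiny cost model. Everything here is an assumption for illustration: the 5-minute TTL comes from the comment above, the 10% weight for a cache hit mirrors Anthropic's published cache-read discount for API pricing, cache-write surcharges are ignored, and nobody outside Anthropic knows how session limits actually weight these tokens.

```python
TTL_SECONDS = 5 * 60   # cache lifetime described in the comment above
CACHED_WEIGHT = 0.1    # assumption: a cache hit counts ~10% of a fresh read
UNCACHED_WEIGHT = 1.0  # assumption: a cold read counts at full weight


def turn_cost(context_tokens: int, idle_seconds: float) -> float:
    """Weighted cost of re-sending the existing context on the next turn."""
    weight = CACHED_WEIGHT if idle_seconds < TTL_SECONDS else UNCACHED_WEIGHT
    return context_tokens * weight


def session_cost(context_tokens: int, gaps: list[float]) -> float:
    """Total cost of a series of turns, each preceded by an idle gap in seconds.

    Simplification: the context size is held constant across turns.
    """
    return sum(turn_cost(context_tokens, g) for g in gaps)


ctx = 250_000
# Chatty user: ten turns, one minute apart, so every turn hits the cache.
chatty = session_cost(ctx, [60] * 10)
# Quiet user: two turns, each after a 20-minute break, so both miss.
quiet = session_cost(ctx, [20 * 60] * 2)
print(chatty, quiet)  # 250000.0 500000.0
```

Under these assumed weights, ten warm turns cost the same as a single cold "Hi, I'm back" on a 250k-token context, and the quiet user who asks only twice pays double.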
This trick seems to work for now (ymmv)
Tell your system to
CronCreate
cron: "*/4 * * * *"
prompt: "heartbeat — no action needed"
And turn it back off at the end of the day. I'm sure Anthropic will be thrilled by this, but I don't have a better solution yet.
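The trick presumably works because `*/4` fires more often than the 5-minute cache lifetime, and Anthropic's caching docs say a cache hit refreshes that lifetime. Each heartbeat still re-sends the context as a cached read, which is why turning it off when you stop matters. The sketch below only checks the scheduling half of the argument: it expands a cron minute field of the form `*/step` and finds the longest gap between firings, including the wrap into the next hour. The 5-minute threshold is taken from the thread, not verified.

```python
TTL_MINUTES = 5  # cache lifetime as described earlier in the thread


def minute_step_fires(step: int) -> list[int]:
    """Minutes of the hour matched by a cron minute field of the form '*/step'."""
    return list(range(0, 60, step))


def max_gap_minutes(step: int) -> int:
    """Largest gap between consecutive firings, including the wrap to the next hour."""
    fires = minute_step_fires(step)
    gaps = [b - a for a, b in zip(fires, fires[1:])]
    gaps.append(60 + fires[0] - fires[-1])  # last firing -> first firing next hour
    return max(gaps)


for step in (4, 5, 7):
    gap = max_gap_minutes(step)
    verdict = "keeps cache warm" if gap < TTL_MINUTES else "can go cold"
    print(step, gap, verdict)
# 4 4 keeps cache warm
# 5 5 can go cold
# 7 7 can go cold
```

A step of exactly 5 is cutting it too close, since the heartbeat and the expiry would race each other.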
Context management is a thing. Unfortunately you're not allowed to use any tool other than Claude Code with the Anthropic subscription, so I guess this is the solution they asked for. Allowing people to write their own tools with superior context management would seem to be a no-brainer to me, but what do I know?
* Models will manage tokens more efficiently
* Agents will manage models more efficiently
* Users will manage agents more efficiently
Why are we acting like technology is on pause?
This is the only expected answer. https://forstarters.substack.com/p/for-starters-59-on-credit...
Oddly, though, when using it at home I'm using Sonnet via the standard chat interface, and that, whilst it will produce substandard code, is still reasonably capable, even in more niche tasks. Granted, my personal projects are far simpler than the codebase I handle at work.
Unilaterally changing the deal to give customers less for the same price should not be legal, but companies have slowly boiled the frog in such a way that now we just go "welp, it's corporations, what can you do", and forget that we actually used to have some semblance of justice in the olden days.
TIP (YMMV): I've found that moving the current code base into a new 'project' after a dozen or so turns helps as I suspect the regurgitation of the old conversations chews up tokens.
It seems that anthropic has added something similar to their browser UI because just in the last few days chat has become almost unusable in firefox. %@$#%
Anyway, I don't know how to audit this (Claude Pro) to confirm what feels like onboard-at-any-cost business behavior.
Is anyone currently auditing through OpenRouter/LiteLLM and seeing poor correlation with the session/weekly limit?
As the tooling matures I think we'll see better support for mixing models — local and cloud, picking the right one for the task. Run the cheap stuff locally, use the expensive cloud models only when you actually need them. That would go a long way toward managing costs.
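A minimal sketch of that "right model for the task" routing, assuming a simple rule: cheap task kinds with small contexts go to a local model, everything else goes to the cloud. The model names, the set of cheap task kinds, and the 32k local context limit are all hypothetical placeholders, not any real tool's configuration.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever you actually run.
LOCAL_MODEL = "local-small"
CLOUD_MODEL = "cloud-frontier"

# Assumption: which task kinds count as "cheap stuff" is a judgment call.
CHEAP_KINDS = {"summarize", "translate", "classify", "rename"}


@dataclass
class Task:
    kind: str
    context_tokens: int


def pick_model(task: Task, local_context_limit: int = 32_000) -> str:
    """Route cheap, small tasks to the local model and everything else to the cloud."""
    if task.kind in CHEAP_KINDS and task.context_tokens <= local_context_limit:
        return LOCAL_MODEL
    return CLOUD_MODEL


print(pick_model(Task("summarize", 4_000)))    # local-small
print(pick_model(Task("summarize", 200_000)))  # cloud-frontier (too big for local)
print(pick_model(Task("refactor", 8_000)))     # cloud-frontier
```

A real router would likely also weigh latency, cost per token, and a fallback when the local model fails, but even a rule this crude keeps routine work off the metered quota.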
There's also the dependency risk people aren't talking about enough. These providers can change pricing whenever they want. A tool you've built your entire workflow around can become inaccessible overnight just because the economics shifted. It's the vendor lock-in problem all over again but with less predictability.
If you’re not listening to Ed Zitron you’d better start if you don’t want to get whiplash in the coming months.
I just refuse to use OpenAI/Google/Anthropic subscriptions; I only use open-source models with ZDR tokens.
- I like privacy in my work, and I share when I wish. Somehow we accepted that our prompts and work may be read and moderated by employees. Would you accept people moderating what you write in Excel, Google Docs, Apple Pages?
- I want a consistent tool, not something that is quantised one day, slow one day, a different harness one day, stops randomly.
- Unless I am missing something, the closed-source models are too slow for me to watch what they are doing. I feel comfortable monitoring something, usually at about 200-300 tps on GLM 5. Above that it might even be too fast!
If my company pays for it, I do not care.
If I have a hobby project where I'm turning an idea into what I want in my spare time, I'm happy to pay $20. I just did something like this on the weekend over a few hours. I really enjoy having small tools based on a single HTML page with JavaScript and JSON as a data store (I ask it to also add an import/export feature so I can literally edit it in the app, then save it and commit it).
As for the main agent I'm waiting for, like one that will read my emails and have access to systems? I would love a local setup, but just buying some hardware today still costs a grand and a lot of energy. It's still significantly cheaper to just use a subscription.
Not sure what you mean though regarding speed, they are super fast. I do not have a setup at home which can run 200-300 tps.
You can get subscriptions to use the APIs from Synthetic, Ollama, or Fireworks.
And since I saw a few other comments talking about these, do you have any preference on different cloud providers with ZDR? I look every once in a while and want to switch to completely open models and/or at least ZDR so I can start doing things like summarizing e-mail. I'm thinking I can probably split my use between some sort of cloud api and claude code for heavier tasks.
But if I were to use an API, probably OpenRouter, wouldn't that make it easier to switch around and also give zero-knowledge safety?
Owning is expensive. Not owning is also expensive.
Energy in Germany is at 35 cents/kWh and skyrocketed to 60 when we had the Russian problem.
I'm planning to buy a farm and add cheap energy, but this investment will still take a little bit of time. Until then, space is scarce.
There are many cloud providers of zero-data-retention LLM APIs, and even cryptographic attestation.
They are not throttled; you can get an agreed rate limit.
Otherwise you should look into running e.g. Qwen3.5-35B-A3B or Qwen3.5-27B on your own computer. They're not Opus-level but from what I've heard they're capable for smaller tasks. llama.cpp works well for inference; it works well on both CPU and GPUs and even split across both if you want.
Otherwise, check the list of providers on OpenRouter: you can see the pricing and quantisation, then sign up directly rather than via a router. Make sure you get caching prices, not just plain input/output API prices.
GLM 5 is a frontier model, Kimi 2.5 is similar with vision support, Minimax M2.7 is a very capable model focused on tool calling.
If you need server side web search, you could use the Z AI API directly, again ZDR; or Friendli AI; or just install a search mcp.
For the harness, OpenCode is the usual one; it has subagents and parallel tool calling. Or just use Claude Code by pointing it at the Anthropic-compatible APIs of various providers like Fireworks.
And no, they're not as capable as SOTA models. Not by far.
However they can help reduce your token expenditure a lot by routing them the low-hanging fruit. Summaries, translations, stuff like that.
Considering how much progress I made vs how much I paid, I couldn't make a scientific assessment, but it felt pretty close.
But if I was doing deep coding on pro plan it would have sucked.
You can't expect to use massive context windows for $20
A simple "how do I do x" question used 2% of my budget.
I paid extra and chewed through $5 in a few minutes of analyzing segments of log files.
At this rate it's not worth the trouble of carefully managing usage to avoid ambiguous limits that disrupt my work.
If that's the way it is in order for them to make money, that's fine - but I need a usable tool that I don't have to micromanage. This product is not worth it ($, time) to me at this rate.
I hope it changes because when it works it's a great addition to my tools.
- If I ask Claude to go and build a product idea out for me from scratch, it can get quite far, but then I will hit quota limits on the pro plan ($20pm).
- I have not drunk the Kool-aid and tried to indulge in ClaudeMaxxing (Max plan at $200pm). I need to sleep and touch grass from time to time.
- I don't bother with a Claude.md in my projects. I just raw-dog context.
- If I have a big codebase, and I'm very clear about what code changes I want to make Claude do, I can easily get a lot of changes made without getting near my quota. It's like Mr Miyagi making precision edits to that Bonsai Tree in Karate Kid.
My last bit of advice - use the tool, but don't let the tool use you.
It was a big disappointment and it just burned through tokens so fast that I hit first limit after 30 minutes while it was gathering info on my project and doing websearches.
My experience was that when I wanted to use it, maybe 2-3 days per week, Pro sub was not enough. On some days I did not use it at all. The daily or weekly token limit was really restrictive.
Contrary to the popular opinion here, there are other services beyond Claude Code. These usage limits might even prompt (har har) people to notice that Gemini is cheaper and often better.
Fixed costs, exact model pinning, outage resistant, enshittification resistant, better security, better privacy, etc...
There are just so many compelling reasons to be on-prem instead of dependent on a 3rd party hoovering up all your data and prompts and selling you overpriced tokens (which eventually they MUST be, because these companies have to make a profit at some point).
If the only counterbalance is "well the api is cheaper than buying my own hardware"...
That's a short term problem. Hardware costs are going to drop over time, and capabilities are going to continue improving. It's already pretty insane how good of a model I can run on two old RTX-3090s locally.
Is it as good as modern claude? No. Is it as good as claude was 18 months ago? Yes.
Give it a decade to see companies really push into the "diminishing returns" of scaling and new models... combined with new hardware built with these workloads in mind... and I think on-prem is the pretty clear winner.
1/ https://github.com/google-gemini/gemini-cli/issues?q=is%3Ais...
It might be acceptable for some general tasks, but I haven’t EVER seen it perform well on non trivial programming tasks.
Has that BS stopped?
Oh well.
It's possible some people offload too much to LLMs but personally, my brain is still doing a lot of work even when I'm "vibecoding".
“Can you give me an example of how to read a video file using the Win32 API like it’s 2004?” - me trying to diagnose a windows game crashing under wine
"Thinking is the hardest work there is, which is why so few people do it" — attrib Henry Ford
Now we have tools that can appear to automate your thinking for you. (They don't really think, but they do appear to, so...)
Note the word "any." Like cloud services, each tool will have unique aspects, but just as with cloud services, there is a shared basic value proposition that allows for migration from one to another and competition among them. If Gemini or OpenAI or Ollama running locally becomes a better choice, I'll switch without a care.
Subscription sprawl is likely the more pressing issue (just remembered I should stop my GH CoPilot subscription since switching to Claude).
There are many things to worry about, but which LLM provider you choose doesn't really lock you in right now.
Input $5 / M tokens Output $25 / M tokens
GPT Codex 5.3:
Input $1.75 / M tokens Output $14 / M tokens
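To put those two price lines side by side, here is a small worked comparison. The workload (1M input tokens, 100k output tokens) is hypothetical, caching discounts are ignored, and the first price is labeled only as "first listed" because the model it belongs to isn't named in this excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD cost of one workload at these list prices (no caching discounts)."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


first_listed = Price(5.00, 25.00)  # first price quoted above (model not named here)
codex_53 = Price(1.75, 14.00)      # GPT Codex 5.3, as quoted above

# Hypothetical workload: 1M input tokens, 100k output tokens.
a = first_listed.cost(1_000_000, 100_000)
b = codex_53.cost(1_000_000, 100_000)
print(f"${a:.2f} vs ${b:.2f}")  # $7.50 vs $3.15
```

Agentic coding sessions tend to be input-heavy, so the input price (and, in practice, the cached-input price this sketch leaves out) usually dominates the bill.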
> Claude Code users hitting usage limits 'way faster than expected'
No shit, Sherlock.