We're about to get Claude Code for work and I'm sad about it. There are more efficient ways to do the job.
OpenCode is incentivized to make a good product that uses your token budget efficiently, since it lets you seamlessly switch between different models.
Anthropic, as a model provider, is incentivized the other way: to exhaust your token budget and keep you hooked. You'll be forced to wait when your usage limits are reached, or pay up for a higher plan if you can't wait to get your fix.
CC, specifically Opus 4.5, is an incredible tool, but Anthropic is handling its distribution the way a drug dealer would.
they get to see (unless you opt out) your context, ideas, source code, etc., and in return you give them $220 and they give you back "out of tokens"
It's also a way to improve performance on the things their customers care about. I'm not paying Anthropic more than I pay for car insurance every month because I want to pinch ~~pennies~~ tokens; I do it because I can finally offload a ton of tedious work onto Opus 4.5 without hand-holding it and reviewing every line.
The subscription is already such a great value over paying by the token, they've got plenty of space to find the right balance.
I've done RL training on small local models, and there's a strong correlation between response length and accuracy. The more they churn tokens, the better the end result gets.
I actually think that the hyper-scalers would prefer to serve shorter answers. A token generated at 1k ctx length is cheaper to serve than one at 10k context, and way way cheaper than one at 100k context.
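That scaling claim can be sketched with back-of-the-envelope arithmetic: each newly generated token attends over the whole KV cache, so the attention work per token grows roughly linearly with context length. The model shape below is hypothetical and the formula ignores MLP cost and constants; it's only meant to illustrate the relative ratios, not any real provider's economics.

```python
# Back-of-the-envelope: per-token attention FLOPs grow linearly with
# the KV-cache length, so a token at 100k ctx costs ~100x one at 1k.
# Illustrative model shape (hypothetical, not any real model's config).
d_model = 4096
n_layers = 48

def attn_flops_per_token(ctx_len: int) -> int:
    # Each new token attends over ctx_len cached positions:
    # ~2 * d_model multiply-adds per position (the QK^T scores plus
    # the weighted V mix), repeated across every layer.
    return 2 * d_model * ctx_len * n_layers

base = attn_flops_per_token(1_000)
for ctx in (1_000, 10_000, 100_000):
    rel = attn_flops_per_token(ctx) / base
    print(f"ctx={ctx:>7,}: ~{rel:.0f}x the attention cost of 1k ctx")
```

Under this rough model the per-token attention cost at 100k context is about 100x the cost at 1k, which is why a provider eating the serving bill has no obvious incentive to pad out context.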
(And once you've done that, also consider whether a given task can be handled by a dumber model - I've had good luck switching some of my sub-agents to Haiku.)
They need more training data, and with people moving on to OpenCode/Codex, they want to extract as much data from their current users as possible.
by default?