undefined | Better HN

0 pointskoverstreet1mo ago0 comments

There's been more going on than just the default to medium level thinking - I'll echo what others are saying, even on high effort there's been a very significant increase in "rush to completion" behavior.

0 comments

bcherny1mo ago

Thanks for the feedback. To make it actionable, would you mind running /bug the next time you see it and posting the feedback id here? That way we can debug and see if there's an issue, or if it's within variance.

JamesSwift1mo ago

  a9284923-141a-434a-bfbb-52de7329861d
  d48d5a68-82cd-4988-b95c-c8c034003cd0
  5c236e02-16ea-42b1-b935-3a6a768e3655
  22e09356-08ce-4b2c-a8fd-596d818b1e8a
  4cb894f7-c3ed-4b8d-86c6-0242200ea333

Amusingly (not really), this is me trying to get sessions to resume to then get feedback ids and it being an absolute chore to get it to give me the commands to resume these conversations but it keeps messing things up: cf764035-0a1d-4c3f-811d-d70e5b1feeef

bcherny1mo ago

Thanks for the feedback IDs — read all 5 transcripts.

On the model behavior: your sessions were sending effort=high on every request (confirmed in telemetry), so this isn't the effort default. The data points at adaptive thinking under-allocating reasoning on certain turns — the specific turns where it fabricated (stripe API version, git SHA suffix, apt package list) had zero reasoning emitted, while the turns with deep reasoning were correct. we're investigating with the model team. interim workaround: CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 forces a fixed reasoning budget instead of letting the model decide per-turn.

nayroclade1mo ago

Hey bcherny, I'm confused as to what's happening here. The linked issue was closed, with you seeming to imply there's no actual problem, people are just misunderstanding the hidden reasoning summaries and the change to the default effort level.

But here you seem to be saying there is a bug, with adaptive reasoning under-allocating. Is this a separate issue from the linked one? If not, wouldn't it help to respond to the linked issue acknowledging a model issue and telling people to disable adaptive reasoning for now? Not everyone is going to be reading comments on HN.

2 more replies

diavelguru1mo ago

Love this. Responding to users. Detail info investigating. Action being taken (at least it seems so).

2 more replies

allisdust1mo ago

I cannot provide the session ids but I have tried the above flag and can confirm this makes a huge amount of difference. You should treat this as bug and make this as the default behavior. Clearly the adaptive thinking is making the model plain stupid and useless. It is time you guys take this seriously and stop messing with the performance with every damn release.

JamesSwift1mo ago

Just set that flag and already getting similar poor results. new one: 93b9f545-716c-4335-b216-bf0c758dff7c

1 more reply

tomaskafka1mo ago

My guess is there isn't enough hardware, so Anthropic is trying to limit how much soup the buffet serve, did I guess right? And I would absolutely bet the enterprise accounts with millions in spend get priority, while the retail will be first to get throttled.

onoesworkacct1mo ago

This kind of thing is harder for regular end-users to understand following the change removing reasoning details.

mangatmodi1mo ago

I am curious. Are you able to see our session text based on the session ID? That was big no in some of the tier-1 places I worked. No employee could see user texts.

1 more reply

gilrain1mo ago

> The data points at adaptive thinking under-allocating reasoning on certain turns

Will you reopen the issue you incorrectly closed, then…? Or are you just playacting concern?

matheusmoreira1mo ago

I just asked Claude to plan out and implement syntactic improvements for my static site generator. I used plan mode with Opus 4.6 max effort. After over half an hour of thinking, it produced a very ad-hoc implementation with needless limitations instead of properly refactoring and rearchitecting things. I had to specifically prompt it in order to get it to do better. This executed at around 3 AM UTC, as far away from peak hours as it gets.

b9cd0319-0cc7-4548-bd8a-3219ede3393a

> You're right to push back. Let me be honest about both questions.

> The @() implementation is ad-hoc

> The current implementation manually emits synthetic tokens — tag, start-attributes, attribute, end-attributes, text, end-interpolation — in sequence.

> This works, but it duplicates what the child lexer already does for #[...], creating two divergent code paths for the same conceptual operation (inline element emission). It also means @() link text can't contain nested inline elements, while #[a(...) text with #[em emphasis]] can.

I just feel like I can't trust it anymore.

koverstreetOP1mo ago

That's pretty much been my day - today was genuinely bad, and I've been putting up with a lot of this lately.

Now on Qwen3.5-27b, and it may not be quite as sharp as Opus was two months ago, but we're getting work done again.

matheusmoreira1mo ago

Literally two weeks ago it was outputting excellent results while working with me on my programming language. I reviewed every line and tried to understand everything it did. It was good. I slowly started trusting it. Now I don't want to let it touch my project again.

It's extremely depressing because this is my hobby and I was having such a blast coding with Claude. I even started trying to use it to pivot to professional work. Now I'm not sure anymore. People who depend on this to make a living must be very angry indeed.

2 more replies

koverstreetOP1mo ago

I'll have a look. The CoT switch you mentioned will help, I'll take a look at that too, but my suspicion is that this isn't a CoT issue - it's a model preference issue.

Comparing Opus vs. Qwen 27b on similar problems, Opus is sharper and more effective at implementation - but will flat out ignore issues and insist "everything is fine" that Qwen is able to spot and demonstrate solid understanding of. Opus understands the issues perfectly well, it just avoids them.

This correlates with what I've observed about the underlying personalities (and you guys put out a paper the other day that shows you guys are starting to understand it in these terms - functionally modeling feelings in models). On the whole Opus is very stable personality wise and an effective thinker, I want to complement you guys on that, and it definitely contrasts with behaviors I've seen from OpenAI. But when I do see Opus miss things that it should get, it seems to be a combination of avoidant tendencies and too much of a push to "just get it done and move into the next task" from RHLF.

necrotic_comp1mo ago

Opus definitely pushes me to ignore problems. I've had to tell it multiple times to be thorough, and we tend to go back and forth a few times every time that happens. :)

pimeys1mo ago

"I see the tests failing, but none of our changes caused this breakage so I will push my changes and ask the user to inform their team on failing tests."

jchanimal1mo ago

One of the thing is we’ve seen at vibes.diy is that if you have a list of jobs and you have agents with specialized profiles and ask them to pick the best job for themselves that can change some of the behavior you described at the end of your post for the better.

freedomben1mo ago

How much of the code/context gets attached in the /bug report?

bcherny1mo ago

When you submit a /bug we get a way to see the contents of the conversation. We don't see anything else in your codebase.

murkt1mo ago

Was there a change in Claude Code system prompt at that time that nudges Claude into simplistic thinking?

Here is a gist that tries to patch the system prompt to make Claude behave better https://gist.github.com/roman01la/483d1db15043018096ac3babf5...

I haven’t personally tried it yet. I do certainly battle Claude quite a lot with “no I don’t want quick-n-easy wrong solution just because it’s two lines of code, I want best solution in the long run”.

If the system prompt indeed prefers laziness in 5:1 ratio, that explains a lot.

I will submit /bug in a few next conversations, when it occurs next.

7 more replies

andoando1mo ago

Isnt the codebase in the context window?

1 more reply

stefan_1mo ago

Theres also been tons of thinking leaking into the actual output. Recently it even added thinking into a code patch it did (a[0] &= ~(1 << 2); // actually let me just rewrite { .. 5 more lines setting a[0] .. }).

taylorfinley1mo ago

I've seen this frequently also

withinboredom1mo ago

I suspect it happens when the model's adaptive thinking was too conservative and it could have thought more, but didn't.

butlike1mo ago

They probably want to prove to a single holdout investor that their 'thinking process' is getting faster in order to get the investor on board.

j / k navigate · click thread line to collapse

0 comments

bcherny1mo ago

JamesSwift1mo ago

  a9284923-141a-434a-bfbb-52de7329861d
  d48d5a68-82cd-4988-b95c-c8c034003cd0
  5c236e02-16ea-42b1-b935-3a6a768e3655
  22e09356-08ce-4b2c-a8fd-596d818b1e8a
  4cb894f7-c3ed-4b8d-86c6-0242200ea333

bcherny1mo ago

Thanks for the feedback IDs — read all 5 transcripts.

nayroclade1mo ago

2 more replies

diavelguru1mo ago

Love this. Responding to users. Detail info investigating. Action being taken (at least it seems so).

2 more replies

allisdust1mo ago

JamesSwift1mo ago

Just set that flag and already getting similar poor results. new one: 93b9f545-716c-4335-b216-bf0c758dff7c

1 more reply

tomaskafka1mo ago

onoesworkacct1mo ago

This kind of thing is harder for regular end-users to understand following the change removing reasoning details.

mangatmodi1mo ago

I am curious. Are you able to see our session text based on the session ID? That was big no in some of the tier-1 places I worked. No employee could see user texts.

1 more reply

gilrain1mo ago

> The data points at adaptive thinking under-allocating reasoning on certain turns

Will you reopen the issue you incorrectly closed, then…? Or are you just playacting concern?

matheusmoreira1mo ago

b9cd0319-0cc7-4548-bd8a-3219ede3393a

> You're right to push back. Let me be honest about both questions.

> The @() implementation is ad-hoc

> The current implementation manually emits synthetic tokens — tag, start-attributes, attribute, end-attributes, text, end-interpolation — in sequence.

I just feel like I can't trust it anymore.

koverstreetOP1mo ago

That's pretty much been my day - today was genuinely bad, and I've been putting up with a lot of this lately.

Now on Qwen3.5-27b, and it may not be quite as sharp as Opus was two months ago, but we're getting work done again.

matheusmoreira1mo ago

2 more replies

koverstreetOP1mo ago

I'll have a look. The CoT switch you mentioned will help, I'll take a look at that too, but my suspicion is that this isn't a CoT issue - it's a model preference issue.

necrotic_comp1mo ago

Opus definitely pushes me to ignore problems. I've had to tell it multiple times to be thorough, and we tend to go back and forth a few times every time that happens. :)

pimeys1mo ago

"I see the tests failing, but none of our changes caused this breakage so I will push my changes and ask the user to inform their team on failing tests."

jchanimal1mo ago

freedomben1mo ago

How much of the code/context gets attached in the /bug report?

bcherny1mo ago

When you submit a /bug we get a way to see the contents of the conversation. We don't see anything else in your codebase.

murkt1mo ago

Was there a change in Claude Code system prompt at that time that nudges Claude into simplistic thinking?

Here is a gist that tries to patch the system prompt to make Claude behave better https://gist.github.com/roman01la/483d1db15043018096ac3babf5...

If the system prompt indeed prefers laziness in 5:1 ratio, that explains a lot.

I will submit /bug in a few next conversations, when it occurs next.

7 more replies

andoando1mo ago

Isnt the codebase in the context window?

1 more reply

stefan_1mo ago

taylorfinley1mo ago

I've seen this frequently also

withinboredom1mo ago

I suspect it happens when the model's adaptive thinking was too conservative and it could have thought more, but didn't.

butlike1mo ago

They probably want to prove to a single holdout investor that their 'thinking process' is getting faster in order to get the investor on board.

j / k navigate · click thread line to collapse