Better HN

0 points · XCSme · 4mo ago · 0 comments

I made my own benchmark of very basic questions, and Claude 4.6 actually scores worse than the free Stepfun 3.5 version: https://aibenchy.com

It is smart, but it sometimes fails at basic instruction following.

I remember this has been a Claude thing for quite a while: I kept trying to make it output just JSON (without structured output), and it kept adding quotes or newlines.
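A minimal sketch of one workaround for the quotes-and-newlines problem, assuming the reply contains a single JSON object or array and that a lenient parse is acceptable. `extract_json` is a hypothetical helper name, not part of any benchmark or SDK; it relies only on Python's standard `json` module.

```python
import json


def extract_json(reply: str):
    """Best-effort parse of a model reply that was supposed to be bare JSON.

    Assumption: the first '{' or '[' in the reply starts the intended value.
    Bare scalars (a lone number or string) are deliberately not handled.
    """
    # Skip anything before the first object or array: blank lines,
    # a stray wrapping quote, or the opening line of a Markdown fence.
    starts = [i for i in (reply.find("{"), reply.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON object or array found in reply")

    # raw_decode parses exactly one JSON value starting at the given index
    # and returns it with the end offset, so trailing text is ignored.
    value, _end = json.JSONDecoder().raw_decode(reply, min(starts))
    return value


if __name__ == "__main__":
    print(extract_json('\n"{"answer": 4}"\n'))
    print(extract_json('[1, 2]\n\nLet me know if you need more.'))
```

This trades strictness for robustness: a benchmark that is meant to measure instruction following should still compare the raw reply against the expected format, and only use something like this when the goal is to recover the answer.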


1 comment · 1 top-level

XCSme (OP) · 4mo ago

After looking into it more, Claude DOES give the correct answer, just not in the format it was asked for: it always adds more info at the end, even when asked to give only the answer.

