It's starting to feel like a lot of the anecdotal comments here and on other social media outlets about people's experience with x model and y tool are astroturfing. They add almost zero value to the conversation.
This is a multi-billion dollar market and battleground, so I'm skeptical of anyone telling me that this isn't happening at a decent clip. I think moderators on the site should definitely consider how to approach this, because it's devaluing this space as a place for actual discourse.
It also occurs to me that, with this being one of Altman's old stomping grounds, he may place a higher value on winning here than elsewhere.
[0] I say December because that's around the time the models got good enough that non-AI folks started to notice.
There's a live Claude status board in the corner so you know when it's time to get back to work.
I worked with 4.6 and found some improvements in planning and sustained use, but I agree with some posters that 4.7 is slower and overthinks.
What I expect is for frontier models to get bigger and more expensive (especially fast mode, like on Cerebras), and for most of us to get much smaller distillations for the more generous subscription tiers.
We can now shop around easily. They almost all do the same thing now. The models are "Just Enough".
[unknown] missing EndStreamResponse
Normally I'd just have it write out what it's doing to a file, if I need to transfer context, but if it goes down mid-session that's a no-go.
I think people have built tools for this, and of course you could reasonably vibe one yourself, but I don't really trust something like that to work reliably or in an ongoing manner.
Maybe it should just be a skill.
Still, it's pretty crazy that Claude is down to 1 nine.
It's impossible to tell these days whether 4.7 is stuck because it's thinking and Anthropic suppressed all output (seriously, 4.7 will just start making changes without explaining any reasoning - how is that an upgrade?) or because the underlying infrastructure is having issues.
4.5 -> 4.7 feels like going from working with a coachable junior engineer who does well with clear guidance to working with a cocky mid-level who will spend too long on pointless tangents and make confidently incorrect changes without any discussion.
Many such cases with humans (given that we continue to compare LLMs to humans these days, which you cannot).