Nothing in the article suggests it did not autonomously do the work.
> Why would anyone do that?
Because a lot of naysayers here pretend that this is somehow trivial.
> My point was that why does the company _not_ make a useful tool?
Useful to whom? This is a researcher testing the limits of the models. Knowing those limits is highly useful to Anthropic. And it's highly useful to lots of others too, like me, as a means of understanding the capabilities of these models.
What, exactly, would a tool that somehow made the people dismissing this change their minds look like? Because I don't think anything would. They could produce lots of useful tools if they aimed lower than testing the limits of the model. But that would not achieve what they set out to do, and it would not tell us anything useful.
I produce "useful tools" with Claude every day. That's not interesting. Anyone who actually uses these tools properly will develop a good understanding of the many things that can be achieved with them.
Most of us can't spend $20k figuring out where the limits are, however.
> I feel like that is a much more interesting topic of discussion than “why aren’t people that aren’t impressed by this spending their time trying to make this company look good?”
This is a ridiculous misrepresentation of the point. The point is that the people who aren't impressed by this very clearly and obviously do not have an understanding of the complexity of what they achieved, and are making ignorant statements about it.
> Aside from the notion that they maybe intentionally set out to create the least useful or valuable output from their tooling (eg ‘the floor’)
Again, you're either entirely failing to understand, or wilfully misrepresenting, what I said. No, their goal was not to "set out to create the least useful or valuable output". Their goal was to test the limits of what the model can achieve. They did that.
That has far higher value than not testing the limits. Lots and lots of people are building tools with Claude without testing the limits. We would not learn anything from that.
> my question was “Why do they not make something genuinely useful?”
Because that wasn't the purpose. The purpose was to test the limits of what the model can achieve. That you struggle to understand why what they achieved was massively impressive does not change that.