How do you know whether the tests it spits out are bad if you don’t read them?
We’re not dealing with AGI here. Tests aren’t strictly necessary for humans. They are for AI. AI requires guardrails to keep from spinning out. That’s essentially the entire premise of the agentic workflow.
I’m pretty sure they just meant that they do testing, not that they read the tests, and that’s how everyone else who responded interpreted it as well.
You can get Claude to write good tests, but based on what I’m seeing at work, that’s not what’s happening. The tests always look plausible even when they’re wrong, so people either don’t read them, skim them very quickly, or read the first few, assume the rest work, and commit.
I think Claude is great for testing because setting up test data and infrastructure is such a boring slog. But it almost always takes a lot of back and forth and careful handholding to get it right.
I read the tests. It’s also really, really good to have Claude verify that removing the changes in question breaks the tests. This brings the quality way, way up for me.
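To make the "remove the change and see if the tests break" idea concrete, here is a rough Python sketch. Everything in it is hypothetical and only for illustration: `guards_change`, the two `clamp` versions, and both tests are made-up names, not anything from a real project. In an actual repo you would more likely revert the diff (for example with `git stash`) and rerun the suite, but the logic is the same: a test only counts if it passes with the change and fails without it.

```python
from typing import Callable

Clamp = Callable[[int, int, int], int]


def clamp_old(value: int, low: int, high: int) -> int:
    """Hypothetical code before the change: forgets the upper bound."""
    return max(low, value)


def clamp_fixed(value: int, low: int, high: int) -> int:
    """Hypothetical code after the change: enforces both bounds."""
    return max(low, min(value, high))


def plausible_test(clamp: Clamp) -> None:
    """Looks reasonable, but never exercises the upper bound."""
    assert clamp(5, 0, 10) == 5
    assert clamp(-3, 0, 10) == 0


def good_test(clamp: Clamp) -> None:
    """Exercises the behavior the change actually introduced."""
    assert clamp(15, 0, 10) == 10


def guards_change(test: Callable[[Clamp], None],
                  with_change: Clamp,
                  without_change: Clamp) -> bool:
    """True only if the test passes with the change and fails without it."""
    def passes(impl: Clamp) -> bool:
        try:
            test(impl)
            return True
        except Exception:
            # Any assertion failure or crash counts as a failing test.
            return False

    return passes(with_change) and not passes(without_change)


if __name__ == "__main__":
    print(guards_change(good_test, clamp_fixed, clamp_old))       # True
    print(guards_change(plausible_test, clamp_fixed, clamp_old))  # False
```

The second result is the failure mode people are describing in this thread: `plausible_test` is green both before and after the change, so it tells you nothing about whether the change works.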