Overall, these findings suggest that agent-written
tests often behave more like a habitual software-development rou-
tine than a dependable source of validation in this setting. More
agent-written tests do not mean more solves; what they more reli-
ably change is the process footprint—API calls, token usage, and
interaction patterns. Improving the value of testing for code agents
may therefore require better oracles and more actionable validation
signals, rather than simply inducing agents to write more tests.
> IMO, where tests clearly help is primarily as an "oracle" applied after generationBingo. I'm not against writing tests it's that the returns are better when its used as verification feedback and as "Oracle" exactly as you put it.