This would make sense to me as an explanation when it only outputs code. (And I think it explains why code often ends up subtly mangled when moved in a refactoring: where a human would copy-paste, the agent instead has to “retype” it and often ends up slightly changing formatting, comments, identifiers, etc.)
But for the most part, it’s spending more tokens on analysis and planning than on pure code output, and that’s where these problems need to be caught.