Yes, and I think there's a fundamental problem here. The big reason for the "AI thought leadership" claim that AI should do well at coding is that there are mechanical success metrics like tests. Except that's not quite true: the tests cover the behaviour, not the structure. It's like constructing a building where the only check is whether the floorplans match the design. That makes catastrophic structural issues easy to hide. The building looks right, and it might even withstand some load, but later, when you want to make changes, you move a cupboard or a curtain rod only to have the structure collapse because that element turned out to be load-bearing.
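A toy sketch of what I mean (the functions here are hypothetical, not from any real codebase): a behavioural test can pass cleanly even while the structure hides a load-bearing coupling.

```python
def format_report(items):
    # Builds the report string. Looks like pure presentation...
    lines = [f"{name}: {qty}" for name, qty in items]
    return "\n".join(lines)

def count_items(report):
    # ...but this logic secretly depends on the report's formatting
    # (one item per newline). The curtain rod is load-bearing.
    return len(report.split("\n"))

# The behavioural test passes, so the design looks fine from outside.
report = format_report([("bolts", 4), ("nuts", 9)])
assert count_items(report) == 2
```

Later, someone "just" tweaks the cosmetic separator in format_report, and count_items silently breaks, even though nothing about the visible behaviour suggested they were coupled.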
It's funny, but one of the lessons I've learnt working with agents is just how much design matters in software, and that it isn't merely a matter of craftsmanship pride. When you watch a codebase implode after the tenth new feature and realise it has to be scrapped because neither human nor AI can salvage it, the importance of design becomes palpable. Before agents it was hard to see, because few people write code like that (just as no one would think to make a curtain rod load-bearing when constructing a building).
And let's not forget that the models hallucinate. Just now I was discussing architecture with Codex, and what it said sounded plausible but was wrong in subtle and important ways.