This reminds me of that example where someone asked an agent to improve a codebase in a loop overnight and they woke up to 100,000 lines of garbage [0]. Similarly you see people doing side-by-side of their implementation and what an AI did, which can also quite effectively show how AI can make quite poor architecture decisions.
This is why I think the “plan modes” and spec driven development are so important effective for agents, because it helps to avoid one of their main weaknesses.