If I instruct the AI to produce small modules that I can verify, that have tests, and that have no side effects - then it is good enough code for me. It works, it's readable, and it can be extended. Without that care, it degrades into bad code.
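To make that concrete, here's roughly the kind of module I mean - a minimal sketch, with a made-up discount domain and names that are purely illustrative:

    # Hypothetical example: a small, pure module with no side effects,
    # plus a test that pins down its behavior. Names are illustrative.
    from decimal import Decimal, ROUND_HALF_UP

    def apply_discount(price: Decimal, percent: Decimal) -> Decimal:
        """Pure function: same inputs, same output, no I/O to mock."""
        if not Decimal("0") <= percent <= Decimal("100"):
            raise ValueError("percent must be between 0 and 100")
        discounted = price * (Decimal("100") - percent) / Decimal("100")
        return discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

    def test_apply_discount():
        assert apply_discount(Decimal("10.00"), Decimal("25")) == Decimal("7.50")
        assert apply_discount(Decimal("10.00"), Decimal("0")) == Decimal("10.00")

No hidden state, nothing to mock - the AI's output either passes the test or it doesn't.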
The harder the code is to understand, the worse it is (and the more likely it is infested with bugs).
Some of these checks have caught thousands of instances of the same error, even with the latest Opus 4.7 writing the original code.
You can’t play whack-a-mole with GenAI, patching each bug as it pops up. You have to start from well-known principles and review everything it produces. Every module or bounded context has to maintain its own invariants.
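For example (a toy sketch with a made-up Order domain - not anyone's real code), an invariant can be stated once and re-checked on every mutation:

    # Illustrative only: a module that states its invariant explicitly
    # and enforces it on every state change. The Order domain is made up.
    from dataclasses import dataclass, field

    @dataclass
    class Order:
        lines: list[int] = field(default_factory=list)  # line totals in cents
        paid: bool = False

        def _check_invariants(self) -> None:
            # Invariant for this bounded context: no negative line totals.
            assert all(amount >= 0 for amount in self.lines)

        def add_line(self, amount_cents: int) -> None:
            if self.paid:
                raise ValueError("cannot modify a paid order")
            if amount_cents < 0:
                raise ValueError("line total must be non-negative")
            self.lines.append(amount_cents)
            self._check_invariants()  # every mutation re-validates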
You can’t fully automate software engineering with GenAI. It seems the vast majority of GenAI users think they can and end up in the same place as the OP.
Maybe learn Domain-Driven Design and Event Sourcing, then try again. The results will be dramatically improved.
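If Event Sourcing is unfamiliar, the core idea fits in a few lines - a toy sketch with names I made up, just to show the shape:

    # Toy event-sourced account: state is never stored directly; it is
    # derived by folding an append-only event history.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Deposited:
        amount: int

    @dataclass(frozen=True)
    class Withdrawn:
        amount: int

    Event = Deposited | Withdrawn

    def balance(events: list[Event]) -> int:
        """Fold the history into current state."""
        return sum(e.amount if isinstance(e, Deposited) else -e.amount
                   for e in events)

    def withdraw(events: list[Event], amount: int) -> list[Event]:
        """Command handler: validate against derived state, emit new events."""
        if amount > balance(events):
            raise ValueError("insufficient funds")
        return [Withdrawn(amount)]

    history: list[Event] = [Deposited(100), Withdrawn(30)]
    history += withdraw(history, 50)
    assert balance(history) == 20

The event history is the source of truth - exactly the kind of well-known principle that constrains what the AI can get wrong.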
I find that the more good practices I add to my envision/scope/spec/build/test/deploy loops, the happier I am with the outcomes.
I will say that I'm finding the actual code to be somewhat ephemeral for me: the more precise the specifications and the tighter and more elegant the design, the less the code matters as a long-term artifact.
I'm not at the "code is assembler" point yet - but I could see that with more, richer specs I could end up there. Of course the specs then become substantial, but declarative specs can be robust and unambiguous (with sufficient red-team review) and - like domain-specific languages - they reduce the accidental complexity of the syntax compared to an implementation in a given language.
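As a flavor of what I mean by declarative (the field names and rule vocabulary here are invented for the example): the rules live as data, and one small generic interpreter applies them.

    # Sketch of a declarative spec: rules as data, interpreted generically.
    SPEC = {
        "email":   {"type": str, "required": True},
        "age":     {"type": int, "required": False, "min": 0},
        "country": {"type": str, "required": True, "one_of": {"US", "CA", "MX"}},
    }

    def validate(record: dict, spec: dict = SPEC) -> list[str]:
        """Apply the spec to a record; return a list of violations."""
        errors = []
        for name, rules in spec.items():
            if name not in record:
                if rules.get("required"):
                    errors.append(f"{name}: missing required field")
                continue
            value = record[name]
            if not isinstance(value, rules["type"]):
                errors.append(f"{name}: expected {rules['type'].__name__}")
                continue
            if "min" in rules and value < rules["min"]:
                errors.append(f"{name}: below minimum {rules['min']}")
            if "one_of" in rules and value not in rules["one_of"]:
                errors.append(f"{name}: not one of {sorted(rules['one_of'])}")
        return errors

    assert validate({"email": "a@b.co", "country": "US"}) == []
    assert validate({"country": "ZZ", "age": -1}) != []

The spec says what must hold; the how lives in one place you can review once.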
There are exceptions to all of this, but it's fascinating to see how it's evolving!