My experience is you can trust any code that is well tested, human or AI generated. And you cannot trust any code that is not well tested (what I call "vibe tested"). But some constraints need to be in natural language, and for that you need a LLM to review the PRs. This combination of code tests and LLM review should be able to ensure reliable AI coding. If it does not, iterate on your PR rules and on tests.
No comments yet.