This is not compatible with many people's experiences. I use Python with a type checker. I tell Claude that the task is only complete once the type checker passes cleanly. It doesn't stop until there are no type errors. This should be even easier in a compiled language, especially if you also tell it to run the tests.
In fact, I find that with a strict feedback loop set up (i.e. a lot of lint rules, a strict type checker and fast unit tests), it will almost always generate what I want.
As someone else said, each step might be pretty stupid, but if you have a fast iteration loop, it can run until everything passes cleanly. My recommendation is to specify what really counts as "done" in your AGENTS.md/CLAUDE.md.