1) Tell Claude my idea with as much detail as I know, and ask it to ask me questions. This can go on for a few rounds. (Opus)
2) Run a validation skill on the plan, with a reviewer using a different prompt. (Opus)
3) Codex reviews the plan; it always finds a few small items even after the above two.
4) Claude Opus implements in one shot, usually ~99% accurate, then I manually test.
If I stay on target with those steps I always get good outcomes, but it's time-consuming.
In my case I have Gemini CLI, so I tell Gemini to use a little python script called gatekeeper.py to validate its plan before each phase with Qwen, Kimi, or (if nothing else is getting good results) ChatGPT 5.2 Thinking. Qwen & Kimi are via fireworks.ai, so they're much cheaper than ChatGPT. The agent is not allowed to start work until one of the "experts" approves the plan via gatekeeper. Similarly, it can't mark a phase as complete until the gatekeeper approves the code as bug-free, up to standards, and passing all unit tests & linting.
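For anyone curious what a gatekeeper.py like this could look like, here's a minimal sketch. The endpoint is fireworks.ai's OpenAI-compatible chat completions API; the prompt wording, the `parse_verdict` helper, and the default model name are my own placeholders, not the actual script:

```python
#!/usr/bin/env python3
"""Sketch of a gatekeeper: send a plan (or diff) to an "expert" model and
exit 0 only on an explicit APPROVE, so the agent can gate on the exit code."""
import json
import os
import sys
import urllib.request

# fireworks.ai exposes an OpenAI-compatible API; the model ID passed on the
# command line is a placeholder here, not a verified identifier.
FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

PROMPT = (
    "You are a strict reviewer. Reply with exactly APPROVE or REJECT on the "
    "first line, followed by your reasons.\n\n---\n{payload}"
)

def parse_verdict(reply: str) -> bool:
    """True only if the first non-empty line starts with APPROVE."""
    for line in reply.splitlines():
        word = line.strip().upper()
        if word:
            return word.startswith("APPROVE")
    return False

def ask_expert(payload: str, model: str) -> str:
    """POST the plan to the expert model and return the raw reply text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user",
                      "content": PROMPT.format(payload=payload)}],
    }).encode()
    req = urllib.request.Request(
        FIREWORKS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    plan = sys.stdin.read()
    reply = ask_expert(plan, model=sys.argv[1] if len(sys.argv) > 1 else "kimi")
    print(reply)
    sys.exit(0 if parse_verdict(reply) else 1)
```

The key design point is the exit code: the agent's instructions can simply say "run gatekeeper.py and do not proceed unless it exits 0", which is much harder to rationalize around than a free-text review.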
Lately Kimi is good enough, but when it's really stuck it will sometimes bother ChatGPT. Seldom does it get all the way to the bottom of the pile and need my input; when it does, it's usually because my instructions turned out to be vague.
I also have it use those larger thinking models for "expert consultation" when it's spent more than 100 turns on any problem and hasn't made progress by its own estimation.
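The escalation ladder above (cheap experts first, ChatGPT only when stuck, me as the last resort, with a 100-turn consultation trigger) could be encoded as something like this. The tier names and the turn budget come from the setup described; the helper functions are hypothetical:

```python
# Sketch of the escalation ladder: cheapest expert first, the human only
# when every model tier has failed to unblock the agent.
EXPERT_TIERS = ["qwen", "kimi", "chatgpt-thinking", "human"]
TURN_BUDGET = 100  # consult an expert past this many turns without progress

def next_expert(failed: list) -> str:
    """Pick the cheapest expert that hasn't already failed on this problem."""
    for tier in EXPERT_TIERS:
        if tier not in failed:
            return tier
    return "human"

def should_consult(turns_on_problem: int, made_progress: bool) -> bool:
    """Trigger an expert consultation when stuck past the turn budget."""
    return turns_on_problem > TURN_BUDGET and not made_progress
```

So a run that burns through Qwen and Kimi would next try ChatGPT, and only after all three fail does the problem land on the human's desk.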