The "draft freely, promote on approval" method is the only thing I think works. Anything else is open to way too many forms of context poisoning, and you end up either buried in writing safeguards and adding review layers, or praying you don't hit edge cases.
You don't have to trust the capture layer. Put a reviewer agent on top with memory of what's been approved and rejected, and keep a human in the loop on the close calls. Over time the reviewer gets calibrated and the human review queue shrinks.
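
Here's a rough sketch of that loop, just to make the shape concrete. Everything in it is made up for illustration: the `Reviewer` class, the `threshold` value, and especially the `similarity` function, which uses plain string matching from `difflib` as a stand-in for whatever judgment a real reviewer agent would apply. It is not how any particular tool does it.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude text similarity in [0, 1]. Placeholder for the reviewer's real judgment."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass
class Reviewer:
    # Hypothetical cutoff: how close a draft must be to a past decision
    # before the reviewer acts without asking a human.
    threshold: float = 0.8
    approved: list[str] = field(default_factory=list)     # memory of approvals
    rejected: list[str] = field(default_factory=list)     # memory of rejections
    human_queue: list[str] = field(default_factory=list)  # close calls

    def review(self, draft: str) -> str:
        """Return 'promote', 'reject', or 'human' for a drafted entry."""
        best_ok = max((similarity(draft, a) for a in self.approved), default=0.0)
        best_bad = max((similarity(draft, r) for r in self.rejected), default=0.0)
        if best_ok >= self.threshold and best_ok > best_bad:
            return "promote"
        if best_bad >= self.threshold and best_bad > best_ok:
            return "reject"
        # Nothing in memory is a confident match (or the two sides tie),
        # so this is a close call and goes to the human.
        self.human_queue.append(draft)
        return "human"

    def record_human_decision(self, draft: str, approve: bool) -> None:
        """Store the human's verdict so similar drafts skip the queue next time."""
        self.human_queue.remove(draft)
        (self.approved if approve else self.rejected).append(draft)
```

With an empty memory every draft lands in `human_queue`. Each call to `record_human_decision` gives the reviewer one more precedent, so repeats and near-repeats get promoted or rejected without a human, which is the "queue shrinks over time" effect. In this sketch only human verdicts are written to memory; whether auto-promoted entries should also count as precedent is a separate design choice.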