About 30% of our AI PR review checks are fundamentally flawed, stemming from poor comprehension of the codebase as a whole. If I told the AI "fix everything and keep going until it's green," I would be terrified of the result.
I don't believe this works in anything other than a toy codebase (or an incredibly fine-grained microservice).
The 70% that works is amazing! But a 30% failure rate demands intense supervision.