We also use a secondary signal from blinded multi-verifier reviews. Each verifier ranks the candidates, and those verification outcomes serve as an additional quality signal. It's somewhat similar to consensus labeling.
This also helps manage scale. Say you have 15 diffs to review: run a few verifiers to get a short list, then review those directly and apply the best one.
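
One way to turn several verifier rankings into a short list is a simple rank aggregation. The sketch below uses a Borda count, which is an assumption on my part: the text above doesn't say how the rankings are combined. The function name `shortlist`, the parameter `k`, and the diff IDs are all hypothetical.

```python
from collections import defaultdict


def shortlist(rankings, k=3):
    """Combine verifier rankings with a Borda count and return the top k candidates.

    `rankings` is a list of lists; each inner list is one verifier's ordering
    of candidate IDs, best first. (Hypothetical input shape, for illustration.)
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Best-ranked candidate gets n - 1 points, last place gets 0.
            scores[candidate] += n - 1 - position
    # Highest score first; break ties by candidate ID so the result is deterministic.
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return ordered[:k]


# Three verifiers, each ranking the same four diffs without seeing the others' rankings.
verifier_rankings = [
    ["diff_a", "diff_b", "diff_c", "diff_d"],
    ["diff_b", "diff_a", "diff_c", "diff_d"],
    ["diff_a", "diff_c", "diff_b", "diff_d"],
]
print(shortlist(verifier_rankings, k=2))
```

The short list is only a filter. The final pick still comes from reviewing the shortlisted diffs directly.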