
digdugdirk 1d ago

How does that get integrated into the scoring system? I'm imagining a scenario where a cheaper model gets close and needs only a small follow-up to reach the desired result. How would that score against a larger model that got it right the first time, even if the larger model was much more expensive overall?

languid-photic 1d ago

We also use a secondary signal from blinded multi-verifier reviews. Each verifier ranks the candidates, and those verification outcomes serve as an additional quality signal. It's somewhat similar to consensus labeling.

Btw, this also helps manage scale. E.g., say you have 15 diffs to review: run a few verifiers to get a shortlist, then review that shortlist directly and apply the best one.
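A rough sketch of what that shortlist step could look like. The aggregation rule here (a Borda count) is an assumption for illustration, not necessarily how the rankings are actually combined, and `shortlist` and the `diff_*` names are made up:

```python
from collections import defaultdict


def shortlist(rankings, k=3):
    """Combine per-verifier rankings (best first) and return the top k candidates.

    Assumption: a simple Borda count, where a candidate earns more points
    the higher a verifier ranks it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # First place earns n - 1 points, last place earns 0.
            scores[candidate] += n - 1 - position
    # Highest score first; ties broken by name so the result is deterministic.
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return ordered[:k]


# Three blinded verifiers each rank the same four hypothetical diffs.
rankings = [
    ["diff_a", "diff_b", "diff_c", "diff_d"],
    ["diff_b", "diff_a", "diff_d", "diff_c"],
    ["diff_a", "diff_c", "diff_b", "diff_d"],
]
print(shortlist(rankings, k=2))
```

The human review then only needs to cover the `k` survivors rather than every diff.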
