gunalx
6mo ago
Well, this is problematic: they use their own verifier as a panel of experts, and their own model was trained specifically to satisfy that verifier. For the benchmark runs, they don't mention using human experts to cross-validate their scores.
cubefox
6mo ago
I assume they use self-verification only during RL training to provide the reward signal, but not for benchmarks.