Skip to content
Better HN
Top
Best
Ask
Show
New
Jobs
Search
⌘K
0 points
cold_harbor
1d ago
0 comments
Save
Share
GRPO skips the value network that makes PPO expensive — it scores candidates relative to each other within a group. that's what makes verifiable-reward training practical at 3B scale
0 comments
1 comments · 1 top-level
top
newest
oldest
mdp2021
7h ago
Why was this downvoted.
j
/
k
navigate · click thread line to collapse