That's already happening, and is in fact part of the R1 training pipeline: an intermediate small reasoning model churns out training data used to train a larger model with RL, rinse and repeat. DeepSeek also showed that distilling models on synthetic reasoning data works quite well.
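Roughly, the loop is: sample reasoning traces from the current model, keep only the ones that pass a verifiable check, and use those as training data for the next model. Here's a toy sketch of that idea in plain Python; the function names and the fake "teacher" are hypothetical placeholders, not the actual DeepSeek pipeline.

```python
# Toy sketch of synthetic-reasoning-data generation via rejection sampling.
# All names here are illustrative placeholders, not real DeepSeek-R1 code.
import random

def teacher_generate(problem, k=4):
    """Stand-in for an intermediate reasoning model: sample k candidate
    chain-of-thought traces plus final answers for a problem."""
    a, b = problem
    traces = []
    for _ in range(k):
        # Occasionally produce a wrong answer to mimic imperfect sampling.
        answer = a + b if random.random() > 0.2 else a + b + 1
        traces.append((f"{a}+{b}: think step by step -> {answer}", answer))
    return traces

def verify(problem, answer):
    """Verifiable reward signal: keep only traces whose answer checks out."""
    a, b = problem
    return answer == a + b

def build_synthetic_dataset(problems):
    """Rejection-sample the teacher; the verified traces become training
    data for the next (larger, or distilled smaller) model."""
    dataset = []
    for p in problems:
        for trace, answer in teacher_generate(p):
            if verify(p, answer):
                dataset.append((p, trace))
    return dataset

random.seed(0)
data = build_synthetic_dataset([(1, 2), (3, 4), (10, 5)])
# Only verified-correct traces survive, so the student never trains on junk.
assert all(verify(p, int(t.rsplit(" ", 1)[-1])) for p, t in data)
```

The key design point is the verifier: because the reward is checkable (math answers, passing tests), you can filter the model's own outputs into clean training data without human labels.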
It's a pretty neat paradigm, and I see an abstract connection to how brains dream: producing their own synthetic training data during sleep to supplement the real data gathered while awake.