I remember that Llama 3 was trained on data curated using Llama 2 (which was used to build the quality classifiers that filtered the pretraining corpus), and it resulted in a significant performance boost, even though the curation was done by a previous-generation model of the same size.
Maybe by using a strong reasoning model such as R1 to curate data for the next generation, even more performance can be extracted from smaller models.
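A minimal sketch of what that kind of model-based curation could look like, assuming you ask the curator model to score each document and keep only the high-scoring ones. `query_model`, the prompt, and the threshold are all hypothetical stand-ins, not anything from the Llama papers:

```python
# Hypothetical LLM-based quality filtering for pretraining data.
# query_model is a placeholder for whatever inference backend you use
# (vLLM, llama.cpp, an HTTP endpoint, ...).

QUALITY_PROMPT = (
    "Rate the educational quality of the following text from 0 to 5. "
    "Reply with a single digit.\n\nText:\n{doc}"
)

def query_model(prompt: str) -> str:
    # Placeholder: wire this to a real curator model.
    raise NotImplementedError("connect to your inference backend")

def curate(docs: list[str], threshold: int = 3) -> list[str]:
    """Keep only documents the curator model scores at or above threshold."""
    kept = []
    for doc in docs:
        # Truncate long documents so they fit in the curator's context.
        reply = query_model(QUALITY_PROMPT.format(doc=doc[:2000]))
        try:
            score = int(reply.strip()[0])
        except (ValueError, IndexError):
            continue  # unparseable reply: drop the document
        if score >= threshold:
            kept.append(doc)
    return kept
```

The appeal of a reasoning model here would be that it could justify its scores or apply more nuanced criteria than a simple classifier, at the cost of much more compute per document.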