
0 points · highfrequency · 1y ago · 0 comments

The abstract doesn’t specify that the 857 training examples were filtered down by R1 from 10 million initial questions. This helps to understand the result better: it is in large part a testament to the remarkable ability of R1 and similar models to sift through and identify/construct perfect training data for other models.


Eisenstein · 1y ago

Isn't every progression in technology a result of the previous advance in technology enabling it?

highfrequency (OP) · 1y ago

Yes, but these three types of progress are worth distinguishing:

1. Mt. Everest is summited for the first time.

2. An easier or more direct route to the Everest summit is discovered.

3. Someone finds that if a more experienced climber is already at the summit and drops down a series of rope ladders and oxygen tanks and cabins at key points, then it is even easier to make the summit because you can now pack lighter.

All three are interesting and worth discussing. But it would be a bit of a stretch to conclude from the third one that “less is more” because you don’t need to bring so much gear when someone else brings it for you.

For example, “Attention Is All You Need” had a similar title. But there the whole point was that they did not use recurrent networks at any stage in the learning process.

My point is not to discredit this result but to frame it properly: reasoning models like R1/O1 are incredibly efficient at distilling knowledge to smaller non-reasoning models.
