Yes, but these three types of progress are worth distinguishing:
1. Mt. Everest is summited for the first time.
2. An easier or more direct route to the Everest summit is discovered.
3. Someone finds that if a more experienced climber is already at the summit and drops down rope ladders, oxygen tanks, and cabins at key points, then reaching the summit becomes even easier because you can now pack lighter.
All three are interesting and worth discussing. But it would be a bit of a stretch to conclude from the third one that “less is more”, because you only need less gear when someone else brings it for you.
By contrast, “Attention Is All You Need” had a similar title, but it earned it: the whole point was that the authors did not use recurrent networks at any stage in the learning process.
My point is not to discredit this result but to frame it properly: reasoning models like R1/o1 are incredibly efficient at distilling knowledge into smaller non-reasoning models.