I remember that Llama 3 was trained on data curated using Llama 2 (which was used to build the quality classifiers that filtered the pretraining corpus), and it resulted in a significant performance boost, even though the curation was done by a previous-generation model of the same size.
Maybe by using a strong reasoning model such as R1 to curate data for the next generation, even more performance can be extracted from smaller models.
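A minimal sketch of what that kind of model-based curation could look like, assuming you ask the curator model to score each document and keep only the high-scoring ones. `query_model`, the prompt, and the threshold are all hypothetical stand-ins, not anything from the Llama papers:

```python
# Hypothetical LLM-based quality filtering for pretraining data.
# query_model is a placeholder for whatever inference backend you use
# (vLLM, llama.cpp, an HTTP endpoint, ...).

QUALITY_PROMPT = (
    "Rate the educational quality of the following text from 0 to 5. "
    "Reply with a single digit.\n\nText:\n{doc}"
)

def query_model(prompt: str) -> str:
    # Placeholder: wire this to a real curator model.
    raise NotImplementedError("connect to your inference backend")

def curate(docs: list[str], threshold: int = 3) -> list[str]:
    """Keep only documents the curator model scores at or above threshold."""
    kept = []
    for doc in docs:
        # Truncate long documents so they fit in the curator's context.
        reply = query_model(QUALITY_PROMPT.format(doc=doc[:2000]))
        try:
            score = int(reply.strip()[0])
        except (ValueError, IndexError):
            continue  # unparseable reply: drop the document
        if score >= threshold:
            kept.append(doc)
    return kept
```

The appeal of a reasoning model here would be that it could justify its scores or apply more nuanced criteria than a simple classifier, at the cost of much more compute per document.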