That's already happening, and is in fact part of the R1 training pipeline: an intermediate small reasoning model churns out training data used to train a larger model with RL, rinse and repeat. DeepSeek also showed that distilling models on synthetic reasoning data works quite well.
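Roughly, the loop is: sample reasoning traces from the current model, keep only the ones that pass a verifiable check, and use those as training data for the next model. Here's a toy sketch of that idea in plain Python; the function names and the fake "teacher" are hypothetical placeholders, not the actual DeepSeek pipeline.

```python
# Toy sketch of synthetic-reasoning-data generation via rejection sampling.
# All names here are illustrative placeholders, not real DeepSeek-R1 code.
import random

def teacher_generate(problem, k=4):
    """Stand-in for an intermediate reasoning model: sample k candidate
    chain-of-thought traces plus final answers for a problem."""
    a, b = problem
    traces = []
    for _ in range(k):
        # Occasionally produce a wrong answer to mimic imperfect sampling.
        answer = a + b if random.random() > 0.2 else a + b + 1
        traces.append((f"{a}+{b}: think step by step -> {answer}", answer))
    return traces

def verify(problem, answer):
    """Verifiable reward signal: keep only traces whose answer checks out."""
    a, b = problem
    return answer == a + b

def build_synthetic_dataset(problems):
    """Rejection-sample the teacher; the verified traces become training
    data for the next (larger, or distilled smaller) model."""
    dataset = []
    for p in problems:
        for trace, answer in teacher_generate(p):
            if verify(p, answer):
                dataset.append((p, trace))
    return dataset

random.seed(0)
data = build_synthetic_dataset([(1, 2), (3, 4), (10, 5)])
# Only verified-correct traces survive, so the student never trains on junk.
assert all(verify(p, int(t.rsplit(" ", 1)[-1])) for p, t in data)
```

The key design point is the verifier: because the reward is checkable (math answers, passing tests), you can filter the model's own outputs into clean training data without human labels.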
It's a pretty neat paradigm, and I see an abstract connection to how brains dream: producing their own synthetic training data during sleep to supplement the real data gathered while awake.