A lot of distillation happens. E.g. OLMo models have a completely open dataset and they are heavily distilled. It only makes sense to try to absorb behaviors from the best models out there. That said, I think the open weight juggernaughts are doing really genuinely great work with RL, training environments, architectural innovations etc.