Yeah, but this doesn't change how the model functions; it just turns reasoning into training data by example. It's not learning how to reason - it's just learning how to pretend to reason, about a wider and wider variety of topics.
If any LLM appears to be reasoning, that is evidence not of the intelligence of the model, but rather of the lack of creativity in the question.