> what is most likely to come next
> not what is most likely to reflect reality.
Shouldn't there be a strong statistical correlation between the two? And isn't that, fundamentally, more about the intent of the training? If I train a model to predict what comes next in reality, the mechanism is still next-word prediction, but what it is predicting is whatever best reflects reality.
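
To make the correlation point concrete, here is a toy sketch, not a claim about how any real LLM works. It assumes a made-up corpus in which true statements outnumber false ones, and a hypothetical counting "model" that just picks the most frequent continuation. The corpus, `next_word_counts`, and `most_likely_next` are all invented for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: mostly true statements, with a few false ones mixed in.
corpus = [
    "the sky is blue",
    "the sky is blue",
    "the sky is blue",
    "the sky is green",
    "grass is green",
    "grass is green",
    "grass is purple",
]


def next_word_counts(corpus, prefix):
    """Count which word follows `prefix` in each sentence of the corpus."""
    counts = Counter()
    prefix_words = prefix.split()
    n = len(prefix_words)
    for sentence in corpus:
        words = sentence.split()
        # Slide over every position where a full prefix plus one more word fits.
        for i in range(len(words) - n):
            if words[i:i + n] == prefix_words:
                counts[words[i + n]] += 1
    return counts


def most_likely_next(corpus, prefix):
    """Return the single most frequent continuation of `prefix`."""
    return next_word_counts(corpus, prefix).most_common(1)[0][0]


print(most_likely_next(corpus, "the sky is"))
print(most_likely_next(corpus, "grass is"))
```

In this setup the most likely next word lines up with the true one, but only because the training text mostly reflects reality. Flip the proportions in `corpus` and the same procedure confidently predicts the false completion, which is roughly the gap the quoted comment is pointing at.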