You raise a good point: I could have created a training set and a test set, and that probably would have been a better validation. But I don't know, I'm not doing science, I'm making a game.
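For reference, a train/test check of that kind might look roughly like the sketch below. It is only an illustration: the `Match` fields, the 80/20 split, and the `predict` callback are all hypothetical, not taken from the actual rating system. The split is chronological (older matches for training, newest for testing) because the question is whether past data predicts future matches.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Match:
    # All fields are hypothetical stand-ins for whatever the game records.
    timestamp: int       # when the match was played
    rating_diff: float   # team A rating minus team B rating
    a_won: bool          # True if team A won


def chronological_split(
    matches: Sequence[Match], test_fraction: float = 0.2
) -> Tuple[List[Match], List[Match]]:
    """Oldest matches go to training, the newest `test_fraction` to testing."""
    ordered = sorted(matches, key=lambda m: m.timestamp)
    n_test = int(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


def brier_score(predict: Callable[[Match], float], test: Sequence[Match]) -> float:
    """Mean squared error between predicted win probability and the outcome.

    Lower is better; always predicting 0.5 scores exactly 0.25.
    """
    return sum((predict(m) - float(m.a_won)) ** 2 for m in test) / len(test)
```

If the model fitted on the training part scores clearly below 0.25 on the test part, the predictions carry real information about matches they have not seen.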
On the topic of whether the future matches the past: the predictions were based on a rolling database of the past 100,000 matches, which is approximately the number of matches played in 7 days. So my theory is that the data is recent enough that upcoming matches should, in general, look like it.
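A rolling window like that can be sketched with a fixed-size queue that drops the oldest match whenever a new one arrives. The class name, the stored fields, and the win-rate lookup are assumptions made for illustration; only the 100,000 figure comes from the description above, and how predictions were actually derived from the window is not stated.

```python
from collections import deque
from typing import Deque, Optional, Tuple

WINDOW_SIZE = 100_000  # roughly one week of matches, per the description above


class RollingMatchWindow:
    """Keeps only the most recent `max_matches` results (hypothetical helper)."""

    def __init__(self, max_matches: int = WINDOW_SIZE) -> None:
        # A deque with maxlen silently discards the oldest entry when full.
        self._matches: Deque[Tuple[float, bool]] = deque(maxlen=max_matches)

    def record(self, rating_diff: float, a_won: bool) -> None:
        self._matches.append((rating_diff, a_won))

    def __len__(self) -> int:
        return len(self._matches)

    def win_rate(self, lo: float, hi: float) -> Optional[float]:
        """Observed win rate for matches with lo <= rating_diff < hi.

        Returns None if no match in the window falls in that range.
        """
        outcomes = [won for diff, won in self._matches if lo <= diff < hi]
        if not outcomes:
            return None
        return sum(outcomes) / len(outcomes)
```

Because old matches fall out automatically, the estimate tracks the current player base without any manual cleanup.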
Of course, I never tested this. In the end, I'm not doing science, I'm making a game. If retention goes up and complaints go down, I can't keep working on the rating system; there are 1000 other things to do.