Is it possible they use the same base pre-trained model and just fine-tuned and RL-ed it better (which, of course, is where all the secret sauce training magic is these days anyhow)? That would be odd, especially for a major version bump, but it's sort of what having the same training cutoff points to?