Why do you think that the 2024 Putnam programs that they used to test were in the training data?
/? "Art of Problem Solving" Putnam https://www.google.com/search?q=%22Art+of+Problem+Solving%22...
From p.3 of the PDF:
> Curating Cold Start RL Data: We constructed our initial training data through the following process:
> 1. We crawled problems from Art of Problem Solving (AoPS) contests , prioritizing math olympiads, team selection tests, and post-2010 problems explicitly requiring proofs, total- ing 17,503 problems.