(Source: I am a two-time IMO silver medalist.)
If verifying a good idea is easy, then the evidence shows that the AI didn't have good ideas for the other 2 problems.
Humans are even better at this, as you mention, but the approach is effectively similar: come up with a lot of ideas and see which one leads to a proof.
E.g., if a problem asked you to compute something that is hard to do by brute force (calculators are not allowed), and you wrote down the answer without a derivation grounded in formal mathematics, you'd get zero points, even if it was easy to verify that the answer was correct.
That's why the IMO and many other math exams won't accept a bare answer, even a correct one.