I actually still don't see the source for them trying several times, but we can take that for granted. Regardless, as I said:
1. It's labeled as "moderately interesting"
2. They said that they expect an expert could solve it in 1-3 months
3. They had already come up with the same solution the AI found, but weren't convinced it would work
So how big was the gap here, do you think?