They could. They would easily be found out as they loose in real world usage or improved new unique benchmarks.
If you were in charge of a large and well funded model, would you rather pay people to find and "cheat" on LLM benchmarks by training on them, or would you pay people to identify benchmarks and make reasonably sure they specifically get excluded from training data?
I would exclude them as well as possible so I get feedback on how "real" any model improvement is. I need to develop real world improvements in the end, and any short term gain in usage by cheating in benchmarks seems very foolish.