Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

(arxiv.org)

6 points · mauriziocalo · 1y ago · 1 comment

1 comment · 1 top-level

> Our results reveal that all tested models struggled significantly, achieving less than 5% on average
