Better HN
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
(arxiv.org)
6 points
mauriziocalo
11mo ago
1 comment
galaxyLogic
11mo ago
> Our results reveal that all tested models struggled significantly, achieving less than 5% on average