Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
(arxiv.org)
6 points
mauriziocalo
1y ago
1 comment
galaxyLogic
1y ago
> Our results reveal that all tested models struggled significantly, achieving less than 5% on average