Measuring What Matters: Construct Validity in Large Language Model Benchmarks

(oxrml.com)

3 points · Cynddl · 7mo ago · 2 comments

ammaox · 7mo ago

A very large review of AI benchmarks that reveals a worrying trend in their effectiveness and scientific rigor.

jruohonen · 7mo ago

The Register also picked it up:

https://www.theregister.com/2025/11/07/measuring_ai_models_h...