1. Open-world evaluations for measuring frontier AI capabilities [pdf] (cruxevals.com) | 2 points | randomwalker | 24d ago | 0 comments
3. When AI Builds AI – Findings from a Workshop on Automation of AI R&D [pdf] (cset.georgetown.edu) | 1 point | randomwalker | 3mo ago | 0 comments
4. The Longitudinal Expert AI Panel: Understanding Expert Views on AI [pdf] (static1.squarespace.com) | 1 point | randomwalker | 6mo ago | 0 comments
5. Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation (arxiv.org) | 1 point | randomwalker | 6mo ago | 0 comments
7. Could AI slow science? Confronting the production-progress paradox (aisnakeoil.com) | 2 points | randomwalker | 9mo ago | 0 comments
9. Why an overreliance on AI-driven modelling is bad for science (nature.com) | 1 point | randomwalker | 1y ago | 0 comments
11. We Looked at 78 Election Deepfakes. Political Misinformation Isn't an AI Problem (knightcolumbia.org) | 5 points | randomwalker | 1y ago | 0 comments
12. Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers (arxiv.org) | 3 points | randomwalker | 1y ago | 0 comments
13. Is the UK's liver transplant matching algorithm biased against younger patients? (aisnakeoil.com) | 93 points | randomwalker | 1y ago | 62 comments
14. Core-Bench: Computational Reproducibility Agent Benchmark (arxiv.org) | 1 point | randomwalker | 1y ago | 0 comments
15. AI companies are pivoting from creating gods to building products (aisnakeoil.com) | 133 points | randomwalker | 1y ago | 195 comments