1. Did Google's AI agents build an operating system for $916? (normaltech.ai) | 4 points | randomwalker | 1mo ago | 0 comments
2. Open-world evaluations for measuring frontier AI capabilities [pdf] (cruxevals.com) | 2 points | randomwalker | 2mo ago | 0 comments
3. Towards a science of AI agent reliability (normaltech.ai) | 1 point | randomwalker | 4mo ago | 0 comments
4. When AI Builds AI – Findings from a Workshop on Automation of AI R&D [pdf] (cset.georgetown.edu) | 1 point | randomwalker | 4mo ago | 0 comments
5. The Longitudinal Expert AI Panel: Understanding Expert Views on AI [pdf] (static1.squarespace.com) | 1 point | randomwalker | 7mo ago | 0 comments
6. Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation (arxiv.org) | 1 point | randomwalker | 8mo ago | 0 comments
8. Could AI slow science? Confronting the production-progress paradox (aisnakeoil.com) | 2 points | randomwalker | 11mo ago | 0 comments
10. Why an overreliance on AI-driven modelling is bad for science (nature.com) | 1 point | randomwalker | 1y ago | 0 comments
12. We Looked at 78 Election Deepfakes. Political Misinformation Isn't an AI Problem (knightcolumbia.org) | 5 points | randomwalker | 1y ago | 0 comments
13. Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers (arxiv.org) | 3 points | randomwalker | 1y ago | 0 comments
14. Is the UK's liver transplant matching algorithm biased against younger patients? (aisnakeoil.com) | 93 points | randomwalker | 1y ago | 62 comments
15. Core-Bench: Computational Reproducibility Agent Benchmark (arxiv.org) | 1 point | randomwalker | 1y ago | 0 comments