1. MirrorCode: Evidence that AI can do some weeks-long coding tasks (epoch.ai) | 3 | tadamcz | 29d ago | 0
2. Claude 4 Sonnet hacked SWE-bench by peeking at future commits (bayes.net) | 3 | tadamcz | 8mo ago | 1
3. Open database of AI benchmark results with raw evaluation logs (epoch.ai) | 1 | tadamcz | 1y ago | 1
5. Show HN: Probly – a Python-like language for quick Monte Carlo simulation (usedagger.com) | 3 | tadamcz | 2y ago | 0