1. Show HN: FKS2G – LLM-backed metrics for deciding how closely to review code (github.com) | 2 points by kmdupree | 1mo ago | 0 comments
2. Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error (philosophicalhacker.com) | 4 points by kmdupree | 2mo ago | 0 comments
3. Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error (philosophicalhacker.com) | 3 points by kmdupree | 2mo ago | 0 comments
4. SWE-bench Verified no longer measures frontier coding capabilities (openai.com) | 343 points by kmdupree | 2mo ago | 181 comments
5. The Half-Life of a Moat (Part 1) (semistructured.substack.com) | 1 point by kmdupree | 2mo ago | 0 comments
6. Thoughts about Moments in Claude Mythos System Card (old.reddit.com) | 3 points by kmdupree | 2mo ago | 0 comments
7. EsoBench: Learning a Novel Esolang via Iterative Execution Feedback (caseys-evals.com) | 1 point by kmdupree | 2mo ago | 0 comments
10. Scientists just developed a new AI modeled on the human brain (livescience.com) | 4 points by kmdupree | 10mo ago | 0 comments
11. LLMs and the Russellian Inversion (philosophicalhacker.com) | 2 points by kmdupree | 10mo ago | 0 comments
13. Atlassian migrated 4M Postgres databases to shrink AWS bill (theregister.com) | 8 points by kmdupree | 11mo ago | 0 comments
14. Libraries are under-used. LLMs make this problem worse (makefizz.buzz) | 62 points by kmdupree | 1y ago | 52 comments
15. Lessons from letting AI vibe code a landing page (martech.org) | 2 points by kmdupree | 1y ago | 0 comments