1. Show HN: New Benchmark from SWE-bench team is 0% solved (programbench.com) | 23 points by lieret 4d ago | 3 comments
2. Show HN: All the LM solutions on SWE-bench are bloated compared to humans (twitter.com) | 1 point by lieret 2mo ago | 0 comments
3. Show HN: New eval from SWE-bench team evaluates LMs based on goals not tickets (codeclash.ai) | 5 points by lieret 6mo ago | 1 comment
4. Show HN: Randomly switching between LMs at every step boosts SWE-bench score (swebench.com) | 5 points by lieret 8mo ago | 1 comment
5. GPT-5 on SWE-bench: Cost and performance deep-dive (mini-swe-agent.com) | 4 points by lieret 9mo ago | 3 comments
6. Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com) | 2 points by lieret 9mo ago | 0 comments
7. Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of Python (github.com) | 7 points by lieret 9mo ago | 4 comments