https://github.com/anthropics/claude-code/issues?q=is%3Aissu...
Apparently whatever SWE-bench is measuring isn't very relevant.
Maybe that's why they haven't released it - to give it a vacation?