SWE-bench Verified is probably benchmaxxed at this stage. Claude isn't even the top performer; that honor goes to Doubao [1].
Also, the confidence interval for such a small dataset (500 tasks) is about 3 percentage points, so these differences could just be down to chance.
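For anyone who wants to check the confidence-interval figure, here is a rough back-of-the-envelope sketch. It assumes a 95% normal-approximation (Wald) interval and a pass rate of 75%. Both are my assumptions for illustration, not numbers taken from the leaderboard, and the function name is made up.

```python
import math

def binomial_ci_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation (Wald) interval for a pass rate p over n tasks."""
    return z * math.sqrt(p * (1.0 - p) / n)

n_tasks = 500             # size of SWE-bench Verified
assumed_pass_rate = 0.75  # hypothetical score, chosen only for illustration

half_width = binomial_ci_half_width(assumed_pass_rate, n_tasks)
print(f"+/- {100 * half_width:.1f} percentage points")
```

Under those assumptions the half-width lands between 3 and 4 points, so two models a couple of points apart are hard to tell apart on a single run. A Wilson interval would be a bit more careful near the extremes, but it gives a similar answer here.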
[1] https://www.swebench.com/