0 points
osti
4mo ago
0 comments
Somehow regresses on SWE-bench?
4 comments · 3 top-level
usaar333
4mo ago
I'd interpret that as rounding error; the score is effectively unchanged.
SWE-bench seems really hard once you are above 80%.
Squarex
4mo ago
It's not a great benchmark anymore, starting with it being primarily Python/Django. The industry should move to something more representative.
lkbm
4mo ago
I don't know how these benchmarks work (do you do a hundred runs? A thousand runs?), but 0.1% seems like noise.
SubiculumCode
4mo ago
That benchmark is pretty saturated, tbh. A "regression" of such small magnitude could mean many different things or nothing at all.
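For a rough sense of why a 0.1% difference reads as noise, here is a minimal back-of-the-envelope sketch. It assumes the benchmark is the 500-task SWE-bench Verified set and a pass rate near 80%; neither figure is confirmed in the thread. It also treats each task as an independent pass/fail trial, which is a simplification. The function and constant names are illustrative only.

```python
import math


def binomial_standard_error(pass_rate: float, n_tasks: int) -> float:
    """Standard error of a pass rate measured on n_tasks independent pass/fail tasks."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / n_tasks)


N_TASKS = 500     # assumption: size of the SWE-bench Verified subset
PASS_RATE = 0.80  # assumption: the "above 80%" regime mentioned in the thread
DELTA = 0.001     # the 0.1% difference mentioned in the thread

se = binomial_standard_error(PASS_RATE, N_TASKS)
one_task = 1.0 / N_TASKS  # score change from a single task flipping

print(f"standard error of one run:   {se * 100:.2f} percentage points")
print(f"one task flipped:            {one_task * 100:.2f} percentage points")
print(f"observed difference:         {DELTA * 100:.2f} percentage points")
print(f"difference / standard error: {DELTA / se:.2f}")
```

Under these assumptions the sampling error of a single run is about 1.8 percentage points, and one task flipping moves the score by 0.2 points. A 0.1-point gap is smaller than either, so it could only come from averaging over multiple runs, and it sits far inside the noise.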