0 points
jkelleyrtp
4mo ago
Claude's SWE-bench score is 80.8 and Codex's is 56.8.
Seems like 4.6 is still all-around better?
2 comments · 2 top-level
gizmodo59
4mo ago
It's SWE-bench Pro, not SWE-bench Verified. The Verified benchmark has stagnated.
Rudybega
4mo ago
You're comparing two different benchmarks: SWE-bench Pro vs. SWE-bench Verified.