0 points
grantpitt
4mo ago
0 comments
do say more
GodelNumbering
4mo ago
Makes it sound like a one-trick pony
jascha_eng
4mo ago
Anthropic is leaning heavily into agentic coding, so it makes sense to use SWE-bench Verified as their main benchmark. It is also the one benchmark where Google did not get the top spot last week. Claude remains king; that's all that matters here.
Mkengin
4mo ago
I am eagerly awaiting swe-rebench results for November with all the new models:
https://swe-rebench.com/
grantpitt
OP
4mo ago
well, it's a big trick