XCSme
17d ago
They mentioned on their release page that the Claude team noticed memorization of the SWE-bench test, so the test is actually in the training data.
Here:
https://www.anthropic.com/news/claude-opus-4-7#:~:text=memor...
sigmoid10
16d ago
Any static benchmark older than 12-18 months is basically worthless, because its content will have spread all over the internet and found its way into the latest models' training sets.
William_BB
16d ago
Good luck arguing with SWE benchmark purists