DeepSWE: Measuring frontier coding agents
(deepswe.datacurve.ai)
2 points
e2e4
25d ago
1 comment
e2e4
OP
25d ago
gpt-5.5xhigh leading the benchmark coincides with my recent experience. I've been an opus 4.7 user, but it burns through tokens so quickly that I gave gpt-5.5xhigh (via codex) a try. Quality was similar (if not better), and the tokens lasted a lot longer.