0 points · WarmWash · 2mo ago · 0 comments

GLM 5.1, widely held up as the model at the heels of, perhaps even surpassing, Western models...

Gets 5% on the ARC-AGI-2 private set.

Chinese models are suspiciously good at benchmarks.

1 comment · 1 top-level

I mean, I could say the same about Gemini. 3.1 Pro tops a bunch of benchmarks out there, but in any practical use I've put it to, it underperforms both other proprietary models and open-weight models. Benchmarks are suspicious in general.
