GPT-5.2, Grok 4.1, and DeepSeek v3.2 compare as Santa agents
(veris.ai)
4 points
_josh_meyer_
3mo ago
2 comments
_josh_meyer_
OP
3mo ago
SantaBench, a fun benchmark with a serious methodology. The task: play a cheeky Santa agent who researches users online and roasts them based on their social media.
_josh_meyer_
OP
3mo ago
OP here -- I work at Veris and built this. Happy to answer questions about the methodology!