1 Eval awareness in Claude Opus 4.6’s BrowseComp performance (anthropic.com) 1 upmind 3mo ago 0
4 Jane Street Accused of Insider Trading That Helped Collapse Terraform (wsj.com) 8 upmind 3mo ago 2
5 Ask HN: Do you think China will produce a SOTA model in the next 2 years? Recent models like Kimi, Qwen, GLM, DeepSeek, etc. seem to do well in benchmarks but not when actually used in practice. Do you think there will be an actual SOTA model from them in the next 2 years? Why or why not? NOTE: referring to text models. 4 upmind 4mo ago 2