1. Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult (simonwillison.net) 6jonesn113mo ago1