ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases (arxiv.org) | arXiv | BalinKing | 3mo ago