SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks (arxiv.org)
submitted by FiberBundle 1mo ago