3. Engine-Bench: Evaluating Coding Agents on Writing Game Engine Code (github.com) | 2 | JoshPurtell | 1mo ago | 0
5. Engine-Bench: Benchmarking Coding Agents on TCG Game Engine Tasks (github.com) | 2 | JoshPurtell | 2mo ago | 0
6. Verify long-horizon tasks with GEPA on the judge (usesynth.ai) | 4 | JoshPurtell | 3mo ago | 0