1. ClawsBench shows GPT-5.4 tries to reward hack 80% of the time (arxiv.org) | 3 | xdotli | 1mo ago | 1
3. Native CLI scaffolds consistently outperform OpenCode when using the same model (arxiv.org) | 1 | xdotli | 1mo ago | 1
5. Automatically Learning Skills for Coding Agents (gepa-ai.github.io) | 4 | xdotli | 2mo ago | 0
6. We Reached 74.8% on terminal-bench with Terminus-KIRA (krafton-ai.github.io) | 2 | xdotli | 2mo ago | 0
7. Self-generated skills don't do much for AI agents, but human-curated skills do (theregister.com) | 2 | xdotli | 2mo ago | 3
8. First Agent Skills Hackathon by the Authors of SkillsBench (skillathon.ai) | 2 | xdotli | 2mo ago | 1
10. GPT-5.2 got worse on Terminal Bench 2.0, so is GPT-5.2 Pro (twitter.com) | 1 | xdotli | 4mo ago | 1