3. TournO: Tournament Optimization for Non-Verifiable RL (github.com) | 3 points | leonardtang | 1mo ago | 0 comments
4. j1-micro and j1-nano: Tiny (0.6B, 1.7B) and Mighty Reward Models (github.com) | 3 points | leonardtang | 11mo ago | 0 comments
5. Verdict: A Library for Scaling Judge-Time Compute (twitter.com) | 3 points | leonardtang | 1y ago | 0 comments
8. Cascade: A fast, automated, multi-turn LLM jailbreaking method (twitter.com) | 2 points | leonardtang | 1y ago | 0 comments
13. Sphynx: Fuzz Testing Hallucination Detection Models (github.com) | 2 points | leonardtang | 1y ago | 0 comments
15. Thorn in a HaizeStack test for evaluating long-context adversarial robustness (github.com) | 19 points | leonardtang | 2y ago | 11 comments