1. The ultimate guide to RL environments: building and scaling them in the LLM era (huggingface.co) · 7 points · kashifr · 5d ago · 0 comments
3. Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries (huggingface.co) · 2 points · kashifr · 1mo ago · 0 comments
5. The Smol Training Playbook: The Secrets to Building World-Class LLMs (huggingface.co) · 265 points · kashifr · 6mo ago · 19 comments
6. Unlocking On-Policy Distillation for Any Model Family (huggingface.co) · 6 points · kashifr · 6mo ago · 1 comment
8. SmolLM3: Smol, multilingual, long-context reasoner LLM (huggingface.co) · 388 points · kashifr · 10mo ago · 79 comments
10. AIMO (AI Math Olympiad) progress prize winning solution (huggingface.co) · 9 points · kashifr · 1y ago · 0 comments
11. MaPO: A reference-free alignment technique for diffusion models (mapo-t2i.github.io) · 2 points · kashifr · 1y ago · 1 comment
12. OpenHermesPreferences: Dataset of ~1M AI preferences from teknium/OpenHermes-2.5 (huggingface.co) · 7 points · kashifr · 2y ago · 1 comment