1. Absolute Zero: Reinforced Self-Play Reasoning with Zero Data (arxiv.org) | 88 | leodriesch | 10mo ago | 19
2. Does RL Incentivize Reasoning in LLMs Beyond the Base Model? (limit-of-rlvr.github.io) | 84 | leodriesch | 11mo ago | 38
4. Grok, an AI Modeled After the Hitchhiker's Guide to the Galaxy (twitter.com) | 5 | leodriesch | 2y ago | 2