1 Absolute Zero: Reinforced Self-Play Reasoning with Zero Data (arxiv.org) 88 leodriesch 0y ago 19
2 Does RL Incentivize Reasoning in LLMs Beyond the Base Model? (limit-of-rlvr.github.io) 84 leodriesch 1y ago 38
4 Grok, an AI Modeled After the Hitchhiker's Guide to the Galaxy (twitter.com) 5 leodriesch 2y ago 2