2. A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE) (github.com) | 1 point by starzmustdie 3mo ago | 0 comments
3. Reasoning Gym: Procedural Dataset Generation for Reinforcement Learning (github.com) | 1 point by starzmustdie 11mo ago | 0 comments
4. Show HN: Word Game Bench – evaluating language models on word puzzles (wordgamebench.github.io) | 1 point by starzmustdie 1y ago | 0 comments
5. Show HN: Answers to Chip Huyen's ML Interview Questions (github.com) | 3 points by starzmustdie 2y ago | 0 comments