1. A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE) (github.com) | 1 point by starzmustdie | 2mo ago | 0 comments
2. Reasoning Gym: Procedural Dataset Generation for Reinforcement Learning (github.com) | 1 point by starzmustdie | 10mo ago | 0 comments
3. Show HN: Word Game Bench – evaluating language models on word puzzles (wordgamebench.github.io) | 1 point by starzmustdie | 1y ago | 0 comments
4. Show HN: Answers to Chip Huyen's ML Interview Questions (github.com) | 3 points by starzmustdie | 2y ago | 0 comments