2. A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE) (github.com) | 1 point by starzmustdie, 5mo ago | 0 comments
3. Reasoning Gym: Procedural Dataset Generation for Reinforcement Learning (github.com) | 1 point by starzmustdie, 1y ago | 0 comments
4. Show HN: Word Game Bench – evaluating language models on word puzzles (wordgamebench.github.io) | 1 point by starzmustdie, 1y ago | 0 comments
5. Show HN: Answers to Chip Huyen's ML Interview Questions (github.com) | 3 points by starzmustdie, 2y ago | 0 comments