kcorbitt on Hacker News

1

Codex, File My Taxes. Make No Mistakes

(corbt.com)

5 points | kcorbitt | 3mo ago | 1 comment

2

A Pocket Guide to Surviving the Robot Apocalypse

(corbt.com)

2 points | kcorbitt | 4mo ago | 0 comments

3

Show HN: RULER – Easily apply RL to any agent

(openpipe.ai)

81 points | kcorbitt | 11mo ago | 11 comments

4

Everything I know about reward hacking

(openpipe.ai)

3 points | kcorbitt | 1y ago | 0 comments

5

Show HN: ART – a new open-source RL framework for training agents

(github.com)

116 points | kcorbitt | 1y ago | 12 comments

6

ART·E: how we built an email research agent that beats o3

(openpipe.ai)

3 points | kcorbitt | 1y ago | 2 comments

7

Using GRPO to Beat o1, o3-mini and R1 at “Temporal Clue”

(openpipe.ai)

199 points | kcorbitt | 1y ago | 55 comments

8

Analyzing OpenAI's Reinforcement Fine-Tuning: Less Data, Better Results

(openpipe.ai)

4 points | kcorbitt | 1y ago | 0 comments

9

Using reinforcement learning and $4.80 of GPU time to find the best HN post

(openpipe.ai)

217 points | kcorbitt | 1y ago | 95 comments

10

Show HN: Agent.exe, a cross-platform app to let 3.5 Sonnet control your machine

(github.com)

406 points | kcorbitt | 1y ago | 232 comments

11

DPO fine-tuning outperforms SFT

(openpipe.ai)

1 point | kcorbitt | 1y ago | 0 comments

12

OpenPipe Mixture of Agents: Outperform GPT-4 at 1/25th the Cost

(openpipe.ai)

13 points | kcorbitt | 2y ago | 2 comments
