kumama on Hacker News

1

Designing dev onboarding for an agent-first world

(castform.com)

2 points | kumama | 3d ago | 0 comments

2

I post-trained a model to reliably roll a die

(castform.com)

2 points | kumama | 10d ago | 0 comments

3

Open-Weight Models Don't Need to Win

(twitter.com)

5 points | kumama | 1mo ago | 8 comments

4

Prompt caching but for RL – 7.5x speedup on long-prompt/short-response workloads

(castform.com)

4 points | kumama | 1mo ago | 0 comments

5

Pokegents: Making multi-agent coding feel like a team

(castform.com)

8 points | kumama | 1mo ago | 1 comment

6

GRPO explained: group relative policy optimization for LLM finetuning

(cgft.io)

1 point | kumama | 2mo ago | 0 comments

7

Do RL on a model with your vector db

(cgft.io)

1 point | kumama | 2mo ago | 0 comments

8

What is reinforcement learning finetuning

(youtube.com)

3 points | kumama | 2mo ago | 0 comments

9

RAG to riches: synthetic data for training RAG agents

(cgft.io)

2 points | kumama | 3mo ago | 0 comments

10

rag not lag: rl for fast agentic retrieval

(cgft.io)

3 points | kumama | 3mo ago | 0 comments

11

Show HN: Benchmax, a new open-source RL environment framework for LLM finetuning

(github.com)

1 point | kumama | 11mo ago | 0 comments

12

Beating o3/o4-mini with Codebase-specific Reinforcement Learning

(cgft.io)

3 points | kumama | 1y ago | 0 comments

13

We might be overestimating coding agent performance on SWE-Bench

(cgft.io)

1 point | kumama | 1y ago | 1 comment

14

How to Improve Code Completion LLMs with Repo-Specific Finetuning

(cgft.io)

3 points | kumama | 1y ago | 1 comment

15

Show HN: Free AI Code Completion for Xcode with model choice/codebase context

(cgft.io)

2 points | kumama | 1y ago | 0 comments
