undefined | Better HN

0 pointslibraryofbabel5mo ago0 comments

Yeah, there is definitely some RLVR training going on for the Claude LLMs to get them good at some of the specific tool calls used in Claude Code, I expect. Having said that, the string replacement tool schema for file edits is not very complicated at all (you can see it in the tool call schema Claude Code sends to the LLM), so you could easily use that in your own 200-300 line agent if you wanted to make sure you're playing to the LLM's strengths.

0 comments

3 comments · 1 top-level

aszen5mo ago· 2 in thread

Yeah that's one example, but I suspect they train the model on entire sequences of tool calls, so unless you prompt the model exactly as them you won't get the same results.

There's a reason they won the agent race, their models are trained to use their own tools.

libraryofbabelOP5mo ago

Agree, the RLVR tasks are probably long series of tool calls at this point doing complex tasks in some simulated dev environment.

That said, I think it's hard to say how much of a difference it really makes in terms of making Claude Code specifically better than other coding agents using the same LLM (versus just making the LLM better for all coding agents using roughly similar tools). There is probably some difference, but you'd need to run a lot of benchmarks to find out.

aszen5mo ago

Agreed it probably contributes to the model improving for all agents but crucially it is verifiably better against their own agent. So they get a good feedback loop to improve both

j / k navigate · click thread line to collapse