Very likely RLHF, based only on how often strongly aligned open models reference a "policy" even when no such policy appears in the system prompt.
I would assume that priming the model with these tokens yields better completions, as mentioned above.