phreeza · 4mo ago · 0 points

But doesn't this miss exactly the gap OP seems to have: going from a next-token predictor (a language model in the classical sense) to an instruction-finetuned, RLHF-ed and "harnessed" tool?

1 comment · 1 top-level

js8 · 4mo ago

The book has a sequel: https://www.manning.com/books/build-a-reasoning-model-from-s...

It will give you an answer to the extent anybody can.
