
0 points · cubefox · 2y ago · 0 comments

I don't think this commitment had any plausibility. Token "probabilities" only have a straightforward probabilistic interpretation for base models. In fine-tuned models, they no longer represent the probability of the next token given the prompt, but rather how well the next token fulfills the ... tendencies induced by supervised (SL) and reinforcement learning (RL) tuning, which is presumably pretty useless information. OpenAI has no intention of providing access to the GPT-4 base model, and they have in fact removed API access to the GPT-3.5 base model.
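To make the base-model vs. tuned-model distinction concrete, here is a toy Python sketch. It assumes a single context with a 3-token vocabulary, made-up corpus frequencies, and made-up reward scores. The helper names (`mle_step`, `reward_step`, `train`) are hypothetical. Minimizing cross-entropy, the maximum-likelihood objective used in pre-training, drives the softmax toward the corpus frequencies. Gradient ascent on expected reward drives probability mass onto the highest-reward token regardless of how often that token appears in the corpus. The reward update here is plain exact policy gradient, not PPO, and it has no KL penalty.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mle_step(logits, corpus_freqs, lr):
    # Gradient of cross-entropy -sum_i q_i log p_i w.r.t. logit z_k is p_k - q_k.
    p = softmax(logits)
    return [z - lr * (pi - qi) for z, pi, qi in zip(logits, p, corpus_freqs)]


def reward_step(logits, rewards, lr):
    # Exact gradient of expected reward J = sum_i p_i r_i w.r.t. logit z_k
    # is p_k * (r_k - J); we ascend it (toy policy gradient, not PPO).
    p = softmax(logits)
    expected = sum(pi * ri for pi, ri in zip(p, rewards))
    return [z + lr * pi * (ri - expected) for z, pi, ri in zip(logits, p, rewards)]


def train(step, target, steps=2000, lr=1.0):
    # Start from uniform logits and apply the chosen update repeatedly.
    logits = [0.0] * len(target)
    for _ in range(steps):
        logits = step(logits, target, lr)
    return softmax(logits)


corpus_freqs = [0.6, 0.3, 0.1]  # hypothetical next-token frequencies in the corpus
rewards = [0.0, 1.0, 0.2]       # hypothetical reward-model scores per token

p_mle = train(mle_step, corpus_freqs)  # ends up close to corpus_freqs
p_rl = train(reward_step, rewards)     # piles mass onto token 1, the reward argmax
print([round(x, 3) for x in p_mle])
print([round(x, 3) for x in p_rl])
```

Real RLHF setups usually add a KL penalty toward the pre-trained model, which limits how far the distribution drifts. The optimum of that regularized objective is a reward-reweighted version of the base distribution rather than the corpus frequencies themselves.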


3 comments · 1 top-level

refulgentis · 2y ago · 2 in thread

Topic laundering. The probabilities are the probabilities; you don't suddenly get wrong probabilities from more training on more data.

goodside · 2y ago

You do, because it’s not just more training; it’s PPO updates instead of MLE. The model is no longer trying to estimate the token distribution of the training corpus; it’s trying to shift logprobs onto tokens that maximize expected reward from the RM (reward model). The GPT-4 technical report has a figure showing that logprobs are less well calibrated as confidence scores in the RLHF model than in the pre-trained model.
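"Less well calibrated" has a concrete meaning: among answers the model gives with confidence around p, roughly a fraction p should be correct. One common summary is binned expected calibration error (ECE). Below is a minimal Python sketch. It assumes each answer's confidence is exp(logprob) of the chosen answer and uses equal-width bins. The data is hypothetical, and this is not necessarily how the GPT-4 report computed its figure.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; conf == 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical data: confidence = exp(logprob) of the model's chosen answer,
# paired with whether that answer was actually correct.
well_calibrated = expected_calibration_error([0.8] * 5, [True, True, True, True, False])
overconfident = expected_calibration_error([0.95] * 4, [True, False, True, False])
print(well_calibrated, overconfident)
```

A perfectly calibrated set of answers scores 0. The overconfident example, with 95% confidence and 50% accuracy, scores about 0.45.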

refulgentis · 2y ago

Fascinating, ty
