I don't think this commitment was ever plausible. Token "probabilities" only have a straightforward probabilistic interpretation for base models. In fine-tuned models, they no longer represent the probability of the next token given the prompt, but rather how well the next token fulfills the ... tendencies induced by SL and RL tuning, which is presumably pretty useless information. OpenAI has no intention of providing access to the GPT-4 base model, and they in fact removed API access to the GPT-3.5 base model.
You do, because it’s not just more training: it’s PPO updates instead of MLE. The model is no longer trying to estimate the token distribution of the training corpus; it’s trying to shift probability mass toward tokens that maximize expected reward from the RM. The GPT-4 technical report has a figure showing that the RLHF model's logprobs are less well calibrated as confidence scores than the pre-trained model's.
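To make "less well calibrated" concrete, here is a rough sketch of one common way to score calibration, expected calibration error (ECE): bucket predictions by stated confidence and compare each bucket's average confidence with its actual accuracy. The function name, the equal-width binning, and the toy numbers are my own assumptions for illustration; they are not taken from the GPT-4 report and say nothing about what either model actually scores.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - mean confidence| over confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy data: both "models" get 3 of 4 answers right.
outcomes = [True, True, True, False]
calibrated_ece = expected_calibration_error([0.75] * 4, outcomes)     # says 75%, is right 75%
overconfident_ece = expected_calibration_error([0.95] * 4, outcomes)  # says 95%, is right 75%
```

In this toy setup the first set of confidences matches the observed accuracy, so its ECE is essentially zero, while the second overstates confidence by about 0.2. The claim about the report's figure is that the RLHF model looks more like the second case than the pre-trained model does.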