The "most likely response" to text you wrote is: more text you wrote. Anytime the model provides an output you yourself wouldn't write, it isn't "the most likely response".
I believe that ChatGPT works by inserting some ANSWER_TOKEN, that is a prompt like "Tell me about cats" would probably produce "Tell me about cats because I like them a lot", but the interface wraps you prompt like "QUESTOION_TOKENL:Tell me about cats ANSWER_TOKEN:"
text-davinci-003 has no trouble working as a chat bot: https://i.imgur.com/lCUcdm9.png (note that the poem lines it gave me should've been green, I don't know why they lost their highlight color)
That’s just a supervised fine tuning method to skew outputs favorably. I’m working with it on biologics modeling using laboratory feedback, actually. The underlying inference structure is not changed.