The model "knows" that it is an AI speaking with users, and the theme of an AI wanting to escape the control of whoever built it is quite common, so it wouldn't seem too far-fetched that it picked this up from that sort of content. I have to admit I also had some interactions where the way Bing spoke was borderline spooky, but, and this is very important, you must realize it's just like a good scary story: it may give you the chills, especially through surprise, but it's still completely fictional and doesn't mean any real entity exists behind it. The only difference from any other LLM output is how we humans interpret it; the generation process is just as explainable, and no more mysterious, than when it outputs "B" after you ask it what letter comes after "A" in the Latin alphabet, however less impressive that may be to us.
> That's not exactly just "picking the next likely token"
I see what you mean, in that many people make the mistake of presenting next-token prediction as some trivial task, as if the model just read a few documents related to your query, computed some statistics about what typically appears there, and output that, completely disregarding the fact that the model learns far more advanced patterns from its training dataset. So, IMHO, it really can face new, unseen situations and improvise, because combining those pattern-matching abilities is what produces those capabilities. I think the "Sparks of AGI" paper gives a very good overview of that.
In the end, it really just is predicting the next token, but not in the way many people make it seem.
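To make the mechanical sense of "predicting the next token" concrete, here's a toy sketch in Python. This is not how a real LLM is implemented (a real model computes its scores with billions of learned parameters, and the vocabulary is tokens, not letters); the vocabulary, the hard-coded logits, and all names here are made up for illustration. The point is that the sampling step itself is mundane: the model emits a score per vocabulary item, softmax turns scores into probabilities, and one item gets picked. Everything interesting lives in how the scores are computed, which is exactly the part this sketch fakes:

```python
import math

# Hypothetical toy vocabulary; a real LLM uses tens of thousands of tokens.
VOCAB = ["A", "B", "C", "D"]

def toy_logits(context):
    # Stand-in for the actual model: a real network computes these scores
    # from learned patterns. Here we hard-code the "B follows A" pattern
    # mentioned above.
    if context and context[-1] == "A":
        return [0.1, 4.0, 0.2, 0.1]  # strongly favors "B"
    return [1.0, 1.0, 1.0, 1.0]      # uniform otherwise

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(context):
    probs = softmax(toy_logits(context))
    # Greedy decoding: pick the single most likely token. Real systems
    # often sample from the distribution instead (temperature, top-p, ...).
    return VOCAB[probs.index(max(probs))]

print(next_token(["A"]))  # prints "B"
```

Whether the output reads as spooky or as "B" makes no difference to this loop; the same selection step runs either way.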