Turns out good usage of "language" requires a model of the world in which that language exists. "The purple, two-eyed, green, five-eyed, invisible frog said moo" is a grammatically fine sentence. But logically it makes no sense: does it have two eyes or five? Is it green, purple, or invisible? And frogs don't typically say moo. To use language coherently, you need a model of the world. Not just the world, either, but the specific domain you're using language in. "The frog brainwashed the crowd with its psychic powers" is nonsense in a biology paper, but perfectly valid inside the cartoon Futurama.
In ChatGPT, the language model and the world model are really just the same model, which makes a lot of sense.