It 'knows' language, in the sense that it has learnt relationships between words. And that's really underselling it: in reality it has learnt very, very subtle relationships between a great many words, and it can process about 2000 of them at a time (strictly speaking the limit is counted in tokens, not words).
BUT as you say, it has no outside reference. It's just a bundle of weights (those weights forming models of a sort).
BUT we provide the outside context by interacting with it. We ask it a question, and it is able to provide an answer.
In any case, it won't be long before someone hooks one of these up to cameras and robot arms and teaches it to make a cup of tea or whatever. A 'relationship to reality' is coming in the next few years, if you think that's a critical ingredient.