I know that it isn't. That's part of the problem. There is no attempt to generate some sort of structure that can be interpreted semantically and reasoned about by the model. The model just operates on the input superficially and statistically. That's why there has been virtually no progress on trivial tasks such as answering:
"I took the water bottle out of the backpack so that it would be [lighter/handy]"
What is lighter and what is handy? No amount of stochastic language manipulation gets you the answer; you need to understand some rudimentary physics to answer the question, and, as a precondition, you need a grammar or ontology.