(A physical concept could be something as simple as how to catch a frisbee. Or imagine a cat trying to predict how best to swipe at a fleeing mouse: if the mouse zigs when it could have zagged, the cat, for all its well-honed instincts, may miss. It predicted wrongly.)
Predicting tokens is quite similar; I'd argue it's the same kind of process.
Getting facts right is a matter of error correction and knowledge-base utilization, which is why "reasoning models" that can check and revise their intermediate steps, combined with retrieval-augmented generation (RAG), perform so well on factual tasks.
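To make the knowledge-base point concrete, here is a minimal sketch of the retrieval half of RAG: look up the stored fact most similar to the query and prepend it to the prompt, so the model doesn't have to rely on its parametric memory alone. Real systems use learned embeddings and a vector index; plain bag-of-words cosine similarity stands in for them here, and the `knowledge_base` contents are invented for illustration.

```python
from collections import Counter
import math

def vectorize(text: str) -> Counter:
    # Crude stand-in for an embedding: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    # Return the single most similar document to the query.
    qv = vectorize(query)
    return max(docs, key=lambda d: cosine(qv, vectorize(d)))

# Hypothetical knowledge base, for illustration only.
knowledge_base = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Cats hunt by predicting the trajectory of fleeing prey.",
    "Tokenization splits text into subword units before modeling.",
]

query = "When was the Eiffel Tower completed?"
context = retrieve(query, knowledge_base)

# The retrieved fact is grounded in the prompt instead of
# trusted to the model's weights.
prompt = f"Context: {context}\nQuestion: {query}"
print(prompt)
```

The point of the sketch is the division of labor: retrieval supplies the facts, and the model's job shrinks to reading and error-checking them, which is exactly where iterative reasoning helps.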