It does not have an understanding, it pattern matches the "idea shape" of words in the "idea space" of training data and calculates the "idea shape" that is likely to follow considering all the "idea shape" patterns in its training data.
It mimics understanding. It feels mysterious to us because we cannot imagine the mapping of a corpus of text to this "idea space".
It is quite similar to how mysterious a computer playing a movie can appear, if you are not aware of mapping of movie to a set of pictures, pictures to pixels, and pixels to co-ordinates and colors codes.