It's a description of a behavior, not a mechanism, which may or may not be appropriate depending on whether you are talking about *what* the model does or *how* it achieves it.
General understanding makes the information in the distribution very wide. Shallow understanding makes it very narrow: say, recognizing only specific combinations of pixels verbatim.
"Generalization" is simply the theoretical measure of how much the latter extends beyond the former, regardless of how that's achieved.
There's no distribution of meaning in the training set that's independent of interpretation and understanding, aside from maybe the literal series of bits (and words and pixels) in it, as encoded.
In statistics that's not as severe a problem, because you can plot the data along one or more clearly defined, commonly agreed-upon dimensions. Then you can look at the chart and talk objectively about this shared interpretation and its distribution.
Although, as a matter of fact, the distribution of answers you get depends just as often on what questions you asked, and how, when and whom you asked. Lying with statistics is easy because it's full of hidden variables. This is why statistics is great when the data is simple and the analysis is simple, mathematical and objective, while social studies tend to yield whatever you want them to yield.
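A quick sketch of how a hidden variable flips a conclusion (this is Simpson's paradox). The counts below are illustrative, chosen to show the effect, and the names `counts` and `rate` are mine, not from any library.

```python
# (group, treatment) -> (successes, trials); illustrative counts only
counts = {
    ("small", "A"): (81, 87),
    ("small", "B"): (234, 270),
    ("large", "A"): (192, 263),
    ("large", "B"): (55, 80),
}

def rate(treatment, group=None):
    # Success rate for a treatment, optionally restricted to one group.
    rows = [
        v for (g, t), v in counts.items()
        if t == treatment and (group is None or g == group)
    ]
    successes = sum(s for s, _ in rows)
    trials = sum(n for _, n in rows)
    return successes / trials

# Ignoring the group, B looks better overall...
print("overall:", rate("A"), rate("B"))
# ...but within each group, A is better.
for g in ("small", "large"):
    print(g, rate("A", g), rate("B", g))
```

Same numbers, opposite answers, depending only on whether you knew to split by the group variable. If nobody told you it existed, the chart would look perfectly objective.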
So. What dimensions are we talking about with a self-evolved model? You have some understanding of what the data is, subjective to you. Maybe your team has some shared understanding of what the data covers, so you have overlap. But the model has its own understanding, evolved independently. How much does it overlap with yours? Not as much as you think.
It's a decades-old problem: people give the model data that contains things they didn't realize it contains. They didn't see those things themselves, and then they get surprised by the results.
Say an apple falls on your head. Did you realize this contains the data required to describe classical mechanics? For centuries, billions of people didn't. To Newton it was there, clear as daylight, in the apple's fall. I know, the example is a myth, but the principle stands.
Another example: a video of the changing light patterns reflected on the floor around the corner of a room where a person, out of frame, is typing on a computer. What does this data contain? You think nothing much. Maybe it contains how a floor looks. To a model, it may well also contain what the out-of-frame person typed on their keyboard.
So given all this... what IS in the distribution? Depends on whose eyes you're looking through. Your eyes are not the most objective eyes, nor the most intelligent eyes. You have no anchor to point to as the ultimate arbiter of what complex data does or does not contain.