
Chabsff · 2y ago

That's a common mechanism to achieve generalization, but the term is a little more general (heh) than that. It specifically refers to correctly handling data that lives outside the distribution presented by the training data.

It's a description of a behavior, not a mechanism. Whether the term is appropriate depends on whether you are talking about *what* the model does or *how* it achieves it.
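
To make the behavior-versus-mechanism point concrete, here is a minimal Python sketch, not anything taken from a real system. The ground-truth function, the sample ranges, and the two toy models (a least-squares line and a nearest-neighbor memorizer) are all assumptions picked for illustration. Both models are scored the same way, on inputs inside and outside the range seen in training, regardless of how each one works internally.

```python
import random

def true_fn(x):
    # Hypothetical ground truth, chosen only for illustration.
    return 3.0 * x + 1.0

def fit_line(xs, ys):
    # Ordinary least squares for a single input feature.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def nearest_neighbor(xs, ys):
    # Memorizer: answers with the label of the closest training input.
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

def mean_abs_error(predict, xs):
    return sum(abs(predict(x) - true_fn(x)) for x in xs) / len(xs)

rng = random.Random(0)
train_x = [rng.uniform(0.0, 1.0) for _ in range(200)]
train_y = [true_fn(x) for x in train_x]

in_dist = [rng.uniform(0.0, 1.0) for _ in range(200)]   # same range as training
out_dist = [rng.uniform(5.0, 6.0) for _ in range(200)]  # never seen in training

slope, intercept = fit_line(train_x, train_y)

def line(x):
    return slope * x + intercept

memo = nearest_neighbor(train_x, train_y)

# Same behavioral measurement for both models; the mechanism is irrelevant here.
results = {
    "line_in": mean_abs_error(line, in_dist),
    "line_out": mean_abs_error(line, out_dist),
    "memo_in": mean_abs_error(memo, in_dist),
    "memo_out": mean_abs_error(memo, out_dist),
}
for name, err in results.items():
    print(name, round(err, 4))
```

Under these assumptions the fitted line keeps its error near zero on the shifted inputs, while the memorizer's error grows large, even though both do well inside the training range.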


3 comments · 1 top-level

3cats-in-a-coat · 2y ago

Kinda fuzzy what's "in the distribution", because it depends on how deeply the model interprets it. If it understands examples outside the distribution... that kinda puts them in the distribution.

General understanding makes the information in the distribution very wide. Shallow understanding makes it very narrow: say, recognizing only specific combinations of pixels verbatim.

Chabsff (OP) · 2y ago

I think you are misinterpreting. The distribution present in the training set in isolation (the one I'm referring to, which is not fuzzy in the slightest) is not the same thing as the distribution understood by the trained model (the one you are referring to, which is definitely more conceptual and hard to characterize in non-trivial cases).

"Generalization" is simply the theoretical measure of how much the latter extends beyond the former, regardless of how that's achieved.

3cats-in-a-coat · 2y ago

I'm saying how you determine the distribution in the training set depends on what the model understands and what the people who selected the dataset understand.

There's no distribution of meaning in the training set that's independent of interpretation and understanding. Aside from maybe the literal series of bits (and words and pixels) in it, as encoded.
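
A toy Python sketch of that dependence on interpretation, under loud assumptions: the dataset, the query value, the two feature maps (`raw` and `parity`), and the crude range check standing in for "in the distribution" are all hypothetical, made up for illustration.

```python
def in_support(train_features, feature):
    # Crude stand-in for "in distribution": does the feature value fall
    # within the range of feature values observed in training?
    return min(train_features) <= feature <= max(train_features)

train = list(range(100))  # hypothetical training inputs: 0..99
query = 1000              # hypothetical new input

def raw(x):
    # Interpretation 1: the input is just its magnitude.
    return x

def parity(x):
    # Interpretation 2: only odd/even matters.
    return x % 2

by_raw = in_support([raw(x) for x in train], raw(query))
by_parity = in_support([parity(x) for x in train], parity(query))

print("in distribution by magnitude:", by_raw)
print("in distribution by parity:", by_parity)
```

The same query is far outside the training data when read as a magnitude and squarely inside it when read as a parity, without a single bit of the dataset changing.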

In statistics that is not as severe a problem, because you can plot the data along one or more clearly defined, commonly agreed-upon dimensions. Then you can look at the chart and talk about this shared interpretation, and its distribution, objectively.

Although, as a matter of fact, just as often it matters what questions you asked, and how, when, and whom you asked, for the distribution of answers you got. Lying with statistics is easy because it's full of hidden variables. This is why statistics is great when the data is simple and the analysis is simple, mathematical, and objective, but social studies tend to yield whatever you want them to yield.

So. What dimensions are we talking about with a self-evolved model? You have some understanding of what the data is, subjective to you. Maybe your team has some shared understanding of what the data covers, so you have overlap. But the model has its own understanding, evolved independently. How much does it overlap with yours? Not as much as you think.

It's a decades-old problem: people give the model data that contains things they didn't realize it contained. They didn't see it themselves. And then they get surprised by the results.

Say an apple falls on your head. Did you realize this contains the data required to describe classical mechanics? For centuries, billions of people didn't realize. To Newton it was there as clear as daylight, in the apple's fall. I know, the example is a myth, but the principle stands.

Another example: a video of the changing light patterns reflected on the floor around the corner of a room where a person, out of frame, is typing on a computer. What does this data contain? You think nothing much. Maybe it contains how a floor looks. To a model, it can easily also contain what the person who is not in frame typed on their keyboard.

So given all this... what IS in the distribution? Depends on whose eyes you're looking through. Your eyes are not the most objective eyes, nor the most intelligent eyes. You have no anchor to point to as the ultimate arbiter of what complex data does or does not contain.
