Training a model on more data improves generalization, not memorization.
Storing more information in the same number of parameters requires encoding what the examples have in common.
In contrast, training on less data, especially if it's repeated, lets the network learn good answers for that limited set without generalizing. I.e., memorizing.
——
It’s the same as with people. The more variations of something people see, the more likely they are to intuit the underlying pattern.
The fewer the examples, the more likely they just pattern match.