In most cases it won't regurgitate its training data verbatim. What happens is that the model essentially fits a continuous curve that best passes between the training points. The number of points on that curve is something like 9,999,999x more than the training data, and that figure is not an exaggeration. It's likely too small a number.
I disagree: the models are a lot smaller than the training data.
Just because I make an algorithm that linearly interpolates between two (copyrighted) values doesn't mean that it is creative or holds the wisdom between them.
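To make that concrete, here is a minimal sketch of such an interpolator. The name `lerp` and the two endpoint values are made up for illustration, and it isn't meant as a model of how any real system works:

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linearly interpolate between a and b for t in [0, 1]."""
    return a + (b - a) * t


# Two hypothetical "training" values (made up for illustration).
a, b = 10.0, 20.0

# Sample many points on the line segment between them.
samples = [lerp(a, b, i / 1000) for i in range(1001)]

# Count the samples that reproduce one of the original values exactly.
# Every other sample is new as a number, yet fully determined by a and b.
exact_copies = [s for s in samples if s in (a, b)]

print(len(samples), len(exact_copies))
```

Almost none of the outputs match the inputs, but nothing in them comes from anywhere other than the two original values.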