Model training works roughly by feeding the model a text excerpt with the last word hidden. The model is asked to "guess" what that final word is, and its weights are then nudged until the guess sufficiently matches the actual token. Then the process repeats with the next excerpt.
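As a rough sketch of that guessing game, here's a toy next-word model: a single weight matrix that, given the previous word, learns to guess the next one. The tiny corpus, the bigram setup, and the plain gradient update are all illustrative assumptions, nothing like a real LLM's architecture, but the loop is the same shape: guess, compare to the real token, nudge the weights, repeat.

```python
import numpy as np

# Toy "guessing game": a bigram model over a six-word corpus.
# This is an illustrative sketch, not how production LLMs train.
corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

rng = np.random.default_rng(0)
# Row = previous word, columns = logits for the next word.
W = rng.normal(0, 0.1, size=(V, V))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.5
for epoch in range(200):
    for prev, nxt in zip(corpus, corpus[1:]):
        p = softmax(W[idx[prev]])   # the model's "guess" distribution
        grad = p.copy()
        grad[idx[nxt]] -= 1.0       # cross-entropy gradient: guess vs. actual token
        W[idx[prev]] -= lr * grad   # nudge the weights toward the right answer

# The corpus was only read during the loop; W holds adjusted numbers,
# not a copy of the text.
guess = vocab[int(np.argmax(W[idx["cat"]]))]
print(guess)
```

After training, asking the model what follows "cat" yields "sat", because that pairing showed up during the game, yet nowhere in `W` is the sentence itself stored.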
The training material is used to play this guessing game and dial in the model's weights. Each piece of training data is picked up, used as reference material for the game, and then discarded. It's hard to place this far from what humans do when reading: both use the information to mold their respective "brains", and both follow an acquire, analyze, discard process.
At no point is training data actually copied into the model itself; it's just run past the "eyes" of the model to play the training game.