There are enough examples out there of it regurgitating longish verbatim code, and not just comments or GPL license text.
If they are comfortable training it on code that isn't licensed for unrestricted copy/paste, I don't personally understand why they can't train it on their own code that's also not licensed for that.
Edit: They even added 'q rsqrt' to their banned word list to squelch one example of a long verbatim code passage.
Basically, it's not that I don't understand your explanation. It's that it does emit long passages of unchanged code in practice, for whatever real-world reason.