Maybe I'm using it wrong, but I've hardly ever seen it pump out a large volume of code.
Copyright violations in the generated code are a genuine concern; GitHub themselves have admitted that it may, on rare occasions, emit raw training data.