Actually the legal system is evidence based. Microsoft has evidence that the code they are producing is licensed under MIT as far as they can reasonably know. There's no definitive way to know that who actually owns the original copyright. I could grant permission to use my repo, but maybe I got that code from someone else, who then got it from someone else and so on and so forth. It's a similar situation with stolen goods, if you unknowingly purchase stolen goods you usually cannot be charged for theft as long as there aren't obvious signs that it's stolen such as the goods being priced far below market value.
Microsoft has evidence that the code they are reproducing is MIT licensed, so are they intentionally violating that license or does this AI thing include the license and attribution in every snippet it generates?
Major aspects of copyright infringement are strict liability, like a lot of civil actions around damages. It doesn't matter if you thought it was OK, there's still a damaged party that needs compensation according to the law. At best you'll simply avoid the criminal and punitive penalties.
No, PornHub doesn't have liability in a lot of cases because of 17 § 512, but has still had to deal with liability in general, which is why they nuked some 80% of their library not backed by verified individuals a while back.
A huge part of 17§512 is the DMCA takedown process mainly in 17§512(c)(3). Does Microsoft even have the ability to truly remove training data from the model? Or do they have to retrain on each DMCA takedown?
If they had a reasonable basis for believing they had a license they're in the clear. "I didn't know" might not be enough but "I had good reasons to think otherwise" is.
I personally don't want to have to upload proof of identity to GitHub and a signed document swearing that I own the copyright to all the code I upload to GitHub, or proof that I coded it. We need to be careful what we wish for.
> THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.