https://www.npr.org/2023/05/18/1176881182/supreme-court-side...
Down at the bottom of the linked PDF are some more interesting allegations:
Count 5 - MS/OpenAI removed NYT copyright notices in violation of the DMCA.
Count 7 - By attributing hallucinated garbage to NYT, MS/OpenAI is diluting NYT trademarks in violation of US Trademark law.
I admit: I laughed. This will be an entertaining lawsuit to follow.
$750 [1] * 66 million records [per the lawsuit] is about $49.5 billion, so basically $50 billion.
[1]: https://www.ce9.uscourts.gov/jury-instructions/node/706
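For anyone who wants to check that figure, here is a rough sketch of the arithmetic. It assumes the $750 per-work amount (the usual statutory minimum described in [1]) is applied uniformly to all 66 million records, which a court would not necessarily do. The function and constant names are just illustrative.

```python
# Back-of-the-envelope statutory damages estimate.
# Assumption: every record counts as a separate work at the minimum award.
MIN_STATUTORY_DAMAGES_USD = 750   # per work, per [1]
RECORDS = 66_000_000              # record count cited from the lawsuit


def estimate_damages(per_work_usd: int, works: int) -> int:
    """Return total damages in dollars at a flat per-work rate."""
    return per_work_usd * works


total = estimate_damages(MIN_STATUTORY_DAMAGES_USD, RECORDS)
print(f"${total:,}")  # $49,500,000,000
```

That is a floor under these assumptions; the per-work amount can be much higher, so the total is very sensitive to how many works a court actually counts.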
The courts are going to rule that LLM training is a transformative use that is protected as fair use under copyright law. They may rule that an LLM-powered service explicitly designed to enable copyright violation is illegal. But there is no way any court is going to look at these examples and see anything other than the NYT fishing to generate a violation by using the LLM in a way very different from how the service is intended to be used, and which, even if abused, doesn't hurt the business model under which the text was produced.
The most likely outcome is that LLM providers will add some sort of filter on output to prevent machines from regurgitating source documents. But this isn't a court case the NYT can win without gutting fair use protections, and that would be a terrible thing.
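To make "some sort of filter on output" concrete, here is a minimal sketch of one way it could work: flag any output that shares a long verbatim run of words with a protected source. This is purely hypothetical and not a description of what any provider actually does; the function names, the 8-word threshold, and the sample text are all made up for illustration.

```python
# Hypothetical output filter: flag text that shares a long verbatim
# word sequence with any protected source document.

def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return every n-word sequence in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(sources: list[str], n: int) -> set[tuple[str, ...]]:
    """Collect the n-word sequences of all source documents into one set."""
    index: set[tuple[str, ...]] = set()
    for doc in sources:
        index |= word_ngrams(doc, n)
    return index


def regurgitates(output: str, index: set[tuple[str, ...]], n: int) -> bool:
    """True if the output contains any n-word sequence found in the index."""
    return not word_ngrams(output, n).isdisjoint(index)


# Made-up example data; n=8 is an arbitrary threshold.
N = 8
sources = ["the quick brown fox jumps over the lazy dog near the river bank"]
index = build_index(sources, N)

copied = "he wrote that the quick brown fox jumps over the lazy dog yesterday"
paraphrase = "a fox jumped over a dog"

print(regurgitates(copied, index, N))      # long verbatim overlap
print(regurgitates(paraphrase, index, N))  # no 8-word overlap
```

A real system would need smarter tokenization and a much more compact index than a Python set, but the idea is the same: block long verbatim runs while leaving short quotes and paraphrases alone.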