
0 points · lesuorac · 2y ago · 0 comments

They're probably saying that because it's what the Supreme Court said, except that case was about a human copying a work created by another human.

https://www.npr.org/2023/05/18/1176881182/supreme-court-side...


5 comments · 1 top-level

Kon-Peki · 2y ago · 4 in thread

That's a good bet.

Down at the bottom of the linked PDF are some more interesting allegations:

Count 5 - MS/OpenAI removed NYT copyright notices in violation of the DMCA.

Count 7 - By attributing hallucinated garbage to NYT, MS/OpenAI is diluting NYT trademarks in violation of US Trademark law.

I admit: I laughed. This will be an entertaining lawsuit to follow.

lesuorac (OP) · 2y ago

Very interested how this turns out as IIUC copyright violations have statutory damages which the NYT won't have to prove.

$750 [1] * 66 million records [the lawsuit] is $49.5 billion, so basically $50 billion.

[1]: https://www.ce9.uscourts.gov/jury-instructions/node/706
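A quick sanity check of that multiplication, as a minimal sketch. It assumes every one of the 66 million records counts as a separately infringed work and gets exactly the $750 figure from [1]; the constant and function names are made up for illustration.

```python
# Figures come from the comment above; the names are illustrative only.
MIN_STATUTORY_DAMAGES_USD = 750   # per-work figure cited in [1]
RECORDS = 66_000_000              # record count the comment attributes to the lawsuit


def total_damages(per_work_usd: int, works: int) -> int:
    """Total award if every record were treated as one infringed work (an assumption)."""
    return per_work_usd * works


total = total_damages(MIN_STATUTORY_DAMAGES_USD, RECORDS)
print(f"${total:,} (about ${total / 1e9:.1f} billion)")
# prints: $49,500,000,000 (about $49.5 billion)
```

Whether a court would actually count each record as a separate work is a different question; this only checks the arithmetic.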

bugglebeetle · 2y ago

What will ultimately happen is that OpenAI and all of big tech will have to pay out some sizable sum to large copyright holders, and in exchange be granted a de facto exclusive right to develop these technologies further because they’re the only ones who can do so “responsibly” with respect to copyright. It will take a long time to wind its way through the courts, but this could be the death knell for open source LLMs in the US.

trevelyan · 2y ago

The prompts shown literally invite the LLM to complete the copyrighted text by providing unedited selections and asking the machine to finish those. Even if this is problematic in a small number of cases it is not a use case that undermines the business model of the newspaper since it requires the reader to have access to the original text. Nor will it be easy to demonstrate economic harm since this is not how readers consume news and is very far from how users interact with LLMs. Nor are the archival materials used for training remotely reflective of the "time-sensitive" articles that newspapers sell. And archival materials are easily available elsewhere so where is the case for economic harm?

The courts are going to rule that LLM training is a transformative use case that is protected as fair use under copyright law. They may rule that if an LLM-powered service is explicitly designed to enable copyright violation that is illegal, but there is no way any court is going to look at these examples and see it as anything other than the NYT fishing to try and generate a violation by using the LLM in a way that is very different than the service is intended to be used and which -- even if abused -- doesn't hurt the business model under which the text has been produced.

The most likely outcome is that LLM providers will add some sort of filter on output to prevent machines from regurgitating source documents. But this isn't a court case the NYT can win without gutting fair use protections, and that would be a terrible thing.

danielbln · 2y ago

Meanwhile, open source LLMs are excluded from stringent regulation in the US, and with Mistral there is some know-how that isn't in SV, which is also nice.
