
0 points · logicchains · 2y ago · 0 comments

> Something like, summarise all articles on US-UK relationships over past 5 years. I charge money for it, and all I pay NYT is a monthly subscription fee.

>Is that fair use? IANAL, but doesn't sound like it.

If you pay someone to do the summarisation for you, then you publish the content and charge a fee for it, you're the one liable, not the person you paid to summarise it for you. Similarly if you ask GPT to do it for you, then publish it, you're liable for what you publish; GPT is just a summarisation tool.


8 comments · 2 top-level

tsimionescu · 2y ago · 3 in thread

That's not true at all. If you pay someone to copy NYT articles for you verbatim, and then they give the copies to you, and then you publish them online, then you've both violated the copyright. You are never allowed to make copies of copyrighted works, even for private deals (making such copies for purely personal use, such as archival, falls under fair use - but you can't build a service out of that).

So, if the summaries are derivative works and not covered by fair use, then both you and the summarizer are separately infringing the NYT's copyright. Otherwise, if this is covered by fair use, then you are both in the clear.

Finally, GPT is not "a summarization tool" in this case. If you provide a copy of a NYT article as a prompt and then ask for summarization, then yes, it is clear that GPT is not doing anything wrong, even if it spits out the exact same text. But if you simply ask for a summary of a specific article by, say, just name and date, and you get a copy of it, it's clear that GPT is storing the original data in some way, and thus it has copied the NYT's protected works without permission.

logicchains (OP) · 2y ago

>But if you simply ask for a summary of a specific article by, say, just name and date, and you get a copy of it, it's clear that GPT is storing the original data in some way, and thus it has copied the NYT's protected works without permission.

In this particular case they were using it via Bing, which actively made an HTTP request to the particular article to extract the content. So GPT hadn't memorised it verbatim; instead it fetched it, much like a human using a search engine would.

tsimionescu · 2y ago

The article states that they used it initially through ChatGPT, but that seems to have been fixed in the meantime, at least for the very simplistic queries that used to work ("the first paragraph of the Carl Zimmer article on old DNA" in ChatGPT used to return the exact data from NYT, and "next paragraph" could then be used to get the following ones). Even if this has been fixed, it still proves that ChatGPT encodes exact copies of NYT articles in its weights, which may be a violation in itself, even if it is prevented from returning them directly. Especially if they ever started distributing the trained model.

Additionally, even the use through Copilot is very debatable. They are not returning the NYT link, which requires a subscription; they are returning its contents even to non-subscribers. And they are doing this in a commercial product, not a nonprofit like the Internet Archive, which has some arguments for fair use.


BlueTemplar · 2y ago

Also, ChatGPT isn't a person with rights and duties. The people that made it are responsible for it.

rich_sasha · 2y ago · 3 in thread

That's not the example. Here I proactively scrape NYT, summarise articles for a fee and sell that as a service. It's not people coming to me with some articles to summarise, and maybe then publishing it online.

At some level it becomes a subversion of NYT's fees. First, say I subscribe and simply host the articles verbatim, for a fee. Clearly, that's not right.

Suppose I change some spelling or word order, or use a synonym or two. That's still not ok.

And if I substantially paraphrase the articles? I guess this is the relevant case. This is kind of what LLMs do. And also feels like not fair use.

logicchains (OP) · 2y ago

>That's not the example. Here I proactively scrape NYT, summarise articles for a fee and sell that as a service. It's not people coming to me with some articles to summarise, and maybe then publishing it online.

That's not what OpenAI is doing; it's not selling summarised articles as a service. Your example is a false equivalence.

>This is kind of what LLMs do. And also feels like not fair use

An LLM doesn't do this unless you ask it to. And if you then take that output and publish it as your own, you're breaching the copyright, not OpenAI.

heavyset_go · 2y ago

> An LLM doesn't do this unless you ask it to. And if you then take that output and publish it as your own, you're breaching the copyright, not OpenAI.

In this case, OpenAI is violating copyright by modifying, reproducing and distributing copyrighted content to its customer.

8note · 2y ago

How far is this from what Reddit does?

I read a NYT article, then summarize it into a link title for Reddit. Reddit then republishes the summary to all of its users.
