0 points · hn_throwaway_99 · 2y ago · 0 comments

> What you described is entirely fair use, actually

Just like during the pandemic how everyone became an epidemiologist, suddenly everyone's a copyright lawyer. I'll just dispute your assertion by saying:

1. Questions of fair use are famously gray, and anyone who declares something as "entirely fair use", with no caveats, is nearly always wrong except for the most obvious cases, which the given example is most definitely not. A judge has wide latitude in determining fair use.

2. People should familiarize themselves with the four factors of fair use determination. In particular, if a work is purely derivative of a source work and substantially negatively impacts the market for the original work, it's very likely to not be considered fair use.

A great overview is https://fairuse.stanford.edu/overview/fair-use/four-factors/

24 comments · 5 top-level

NegativeK · 2y ago · 6 in thread

> suddenly everyone's a copyright lawyer

Roll back 20+ years on Slashdot and you'll see the exact same thing.

Copyright has been a hot button issue on the internet for decades. People end up thinking (rightly or wrongly) that they understand it without being a lawyer.

whoknowsidont · 2y ago

> that they understand it without being a lawyer.

Quite literally, not even the lawyers or courts understand it. This is very much a "learn as you go" exercise for humanity in general at this point in time.

Mattbrown7531 · 2y ago

It seems like everything in tech is in the learn as you go phase. Everything is changing so rapidly that there can’t be experts. Just people that are able to adapt quickly.

I only see this phenomenon speeding up. Strange times.

Teever · 2y ago

One of my biggest gripes is a somewhat adjacent issue where everyone thinks they're an American copyright lawyer and that American copyright law is universal.

It's very possible that the example provided above is an example of fair use in some country, and that the website offering that service could be hosted there.

Vicinity9635 · 2y ago

Legality aside I think copyright of digital things in the digital age is a net negative to humanity.

matheusmoreira · 2y ago

Completely agree. Copyright should be abolished. All intellectual work is information, information is just bits and bits are just numbers. It's quite simply delusional to believe you can own numbers in the 21st century, the age of information and ubiquitous globally networked pocket supercomputers.

This is just a felony contempt of business model issue. Computers invalidated their business models and they're doing everything they possibly can to hang on for dear life. Society needs to move on already.

bomewish · 2y ago

This seems a bit disingenuous. Lawyers DISAGREE on this stuff (as we will see in this case) and a court will decide the reality by fiat.

caesil · 2y ago · 5 in thread

>if a work is purely derivative of a source work

This is the weakest part of the case(s) against OpenAI. "Derivative work" is a legal term of art meaning a direct adaptation, like writing a screenplay of a book or translating a book into another language.

NYT has a stronger case than Sarah Silverman here because they can show actual 'memorized' text rather than just summarization, but given that those memorizations are a) an unintended failure mode of the training process, and b) from an older version of the model that has been updated to no longer regurgitate memorized text, it's not really clear how in current form GPT could possibly be considered a derivative work.

dchichkov · 2y ago

"Transformative" seems to fit a lot more than "Derivative".

On the other hand, it's understandable why NYT is worried. OpenAI itself says that occupations like: Writers and Authors, Web and Digital Interface Designers, News Analysts, Reporters, and Journalists, Proofreaders and Copy Markers are "90-100% exposed" to what OpenAI is building.

paledot · 2y ago

We should all be worried about that. If journalism is replaced with AI, truth is replaced with the AI hallucination du jour.

chucke1992 · 2y ago

I don't buy into all these "dangers". The advent of cars did not decrease the number of drivers, and it introduced various new jobs that had not been available to a lot of people. And the rise of computers did not make the workforce smaller but instead opened many more opportunities for a lot of people.

avidiax · 2y ago

A question is whether the new model still intrinsically embeds the source text, but this is later filtered in the output, or if it no longer embeds the text at all.

The latter is more defensible.

skygazer · 2y ago

I would think an existing model could bootstrap a copyright-free training corpus by completely rewriting/paraphrasing copyrighted material with semantic fidelity, so that the next model is trained without any memorization of copyrighted works. That might pose an interesting obstacle to copyright challenges: bootstrapping your way into a clean room. Although, tweaking the architecture to either eliminate memorization or eliminate high-fidelity reproduction of verbatim training data seems far more expedient and less costly.

paulddraper · 2y ago · 4 in thread

> if a work is purely derivative of a source work

CliffNotes, Wikipedia, etc. have huge quantities of summarized copyrighted work.

caesil · 2y ago

Summarization generally isn't considered a derivative work.

https://en.wikipedia.org/wiki/Derivative_work

btilly · 2y ago

First, you missed the "and". Do CliffNotes, Wikipedia, etc. substantially impact the market for the original work? For example CliffNotes does not - people who buy the CliffNotes version typically already have the original work as well (for example from coursework). And Wikipedia may well do more to interest people in the original work than to replace it.

Second, you ignored the "purely derivative" bit. You have to look at to what extent the use is derivative or transformative. See https://en.wikipedia.org/wiki/Transformative_use for a bit about that. (Note, this is a legal term defined by various precedents. OpenAI can't just argue, "Turning it into an LLM is a transform, so it is transformative!") Since CliffNotes is educational and Wikipedia is nonprofit, it is relatively easy for both to qualify as transformative.

As a result your response underscores the point that was made. There are a lot of shades of grey. You really can't just seize on a couple of phrases and key points, then jump straight to the answer. You have to understand how the courts will decide, and then accept that there is an actual judgment call whose outcome depends on the judge judging.

(I'm not a lawyer, but I have had excessive exposure to them in the past.)

joegahona · 2y ago

> people who buy the CliffNotes version typically already have the original work as well (for example from coursework)

Is there data that supports this? I’d be interested to know what % of people who buy a Cliffs Notes have already _bought_ the original.

paulddraper · 2y ago

> Do CliffNotes, Wikipedia, etc. substantially impact the market for the original work?

Yes.

For example, Wikipedia cites many research journals that otherwise are available only by subscription.

Prior to Wikipedia, gated information centers were the norm.

shkkmo · 2y ago · 4 in thread

> Questions of fair use are famously gray, and anyone who declares something as "entirely fair use", with no caveats, is nearly always wrong except for the most obvious cases, which the given example is most definitely not. A judge has wide latitude in determining fair use.

You're the one presenting unfounded claims with confidence here. There is well established case law about not being able to copyright facts. If you are actually fully paraphrasing a presentation of facts / ideas and not just altering a couple of words here and there, then there is a very strong case for non-infringement.

hn_throwaway_99 (OP) · 2y ago

> You're the one presenting unfounded claims with confidence here.

No, I'm not. On the contrary, I'm really looking forward to this case because I believe it will be a great test of a bunch of concepts that are totally novel in the world of copyright law as it applies to generative AI. The only things I am presenting with confidence are:

1. That anyone who declares that something is unambiguously fair use (or, contrarily, unambiguously infringing) is likely wrong. There is simply too much latitude by judges, and there have certainly been cases where a ruling went one way, only to be overturned on appeal.

2. While I certainly have an opinion on how I think this case will be decided, I'm not presenting that with unwarranted confidence. Instead, I linked that great article on the 4 factors of fair use determination because it's clear to me lots of people are saying "fair use!" on one side or the other with no understanding of the factors judges must actually consider when making a determination.

shkkmo · 2y ago

You seem to be shifting the topic of this thread. The GP comment is about paraphrasing news articles, while I don't see anything in the NYT lawsuit about paraphrasing. Rather, the NYT is concerned with exact reproduction or near-exact reproduction. I too am very curious about the outcome of this case and wouldn't care to bet either way on the outcome. I do have an opinion on what precedent would be better for our society, but that doesn't mean I think that outcome is more likely.

However, none of that matters in this particular thread. There are well-established precedents about paraphrasing news articles, and they do not support the claim you made.

btilly · 2y ago

The "unfounded claims" were backed up by a link to Stanford on fair use and copyright. That's the opposite of being unfounded.

Remember. The NY Times does not have a record of filing frivolous lawsuits. Particularly not against companies with deep pockets. So it is almost certainly true that a lawyer who knows the law better than you thinks that this has a real chance. So you should be looking for flaws in trivial defenses that you can think up, rather than assuming that you know best.

For example take your copyright facts defense. That would be great if the NY Times was a phone book. They aren't, in addition to facts they offer analysis, editorial positions, and so on. For example I just asked ChatGPT, "In 2016, did the New York Times generally support or oppose President Trump?" I got back an answer talking about various kinds of concerns that the New York Times had, including an editorial titled, "Why Donald Trump Should Not Be President". The copy that ChatGPT needed to have to do that has a lot more than just facts in it.

Now if you paraphrased the NY Times like ChatGPT did when it answered me, you'd have a perfect fair use defense. But you aren't doing it for money, you didn't make a copy of all the NY Times, you aren't destroying the market for the NY Times, and you're legally able to own copyright in your transformed work. OpenAI is doing it for money, did copy all of the NY Times, is seriously impacting the market for NY Times articles, and ChatGPT generated text does not get a copyright.

Fair use is filled with shades of grey. Even if ChatGPT appears to do the same thing that you do, it is far less clear that OpenAI will enjoy the same level of fair use defense.

shkkmo · 2y ago

The Stanford link is just generic information about the fair use tests and does nothing to back up the assertion.

> They aren't, in addition to facts they offer analysis, editorial positions, and so on.

Those opinions and ideas are also not copyrightable. Only expressions of them are copyrightable, which is why paraphrasing facts, ideas and opinions is not a violation of copyright.

> Fair use is filled with shades of grey.

Yes, but not all those shades are equal. There is a long history of litigation showing that paraphrasing news articles is fine.

bsenftner · 2y ago

I personally appreciate the semi-truck-sized loophole that is satire. One can include an entire copyrighted work within one's own work as long as the treatment of that other copyrighted work is parody / satire. This is a provision of US copyright law put in place to protect political satire, which can be anything, because politics is everything.
