Just as everyone became an epidemiologist during the pandemic, suddenly everyone's a copyright lawyer. I'll just dispute your assertion by saying:
1. Questions of fair use are famously gray, and anyone who declares something to be "entirely fair use", with no caveats, is nearly always wrong except for the most obvious cases, which the given example is most definitely not. A judge has wide latitude in determining fair use.
2. People should familiarize themselves with the four factors of fair use determination. In particular, if a work is purely derivative of a source work and substantially negatively impacts the market for the original work, it's very likely to not be considered fair use.
A great overview is https://fairuse.stanford.edu/overview/fair-use/four-factors/
Roll back 20+ years ago on Slashdot and you'll see the exact same thing.
Copyright has been a hot button issue on the internet for decades. People end up thinking (rightly or wrongly) that they understand it without being a lawyer.
Quite literally, not even the lawyers or courts understand it. This is very much a "learn as you go" exercise for humanity in general at this point in time.
I only see this phenomenon speeding up. Strange times.
It's very possible that the example provided above is an example of fair use in some country, and that the website offering that service could be hosted there.
This is just a felony contempt of business model issue. Computers invalidated their business models and they're doing everything they possibly can to hang on for dear life. Society needs to move on already.
This is the weakest part of the case(s) against OpenAI. "Derivative work" is a legal term of art meaning a direct adaptation, like writing a screenplay of a book or translating a book into another language.
NYT has a stronger case than Sarah Silverman here because they can show actual 'memorized' text rather than just summarization, but given that those memorizations are a) an unintended failure mode of the training process, and b) from an older version of the model that has been updated to no longer regurgitate memorized text, it's not really clear how in current form GPT could possibly be considered a derivative work.
On the other hand, it's understandable why NYT is worried. OpenAI itself says that occupations like: Writers and Authors, Web and Digital Interface Designers, News Analysts, Reporters, and Journalists, Proofreaders and Copy Markers are "90-100% exposed" to what OpenAI is building.
The latter is more defensible.
CliffsNotes, Wikipedia, etc. have huge quantities of summarized copyrighted work.
Second, you ignored the "purely derivative" bit. You have to look at to what extent the use is derivative or transformative. See https://en.wikipedia.org/wiki/Transformative_use for a bit about that. (Note, this is a legal term defined by various precedents. OpenAI can't just argue, "Turning it into an LLM is a transform, so it is transformative!") Since CliffsNotes is educational and Wikipedia is nonprofit, it is relatively easy for both to qualify as transformative.
As a result your response underscores the point that was made. There are a lot of shades of grey. You really can't just seize on a couple of phrases and key points, then jump straight to the answer. You have to understand how the courts will decide, and then accept that there is an actual judgment call whose outcome depends on the judge judging.
(I'm not a lawyer, but I have had excessive exposure to them in the past.)
Is there data that supports this? I’d be interested to know what % of people who buy a Cliffs Notes have already _bought_ the original.
Yes.
For example, Wikipedia cites many research journals that otherwise are available only by subscription.
Prior to Wikipedia, gated information centers were the norm.
You're the one presenting unfounded claims with confidence here. There is well established case law about not being able to copyright facts. If you are actually fully paraphrasing a presentation of facts / ideas and not just altering a couple of words here and there, then there is a very strong case for non-infringement.
No, I'm not. On the contrary, I'm really looking forward to this case because I believe it will be a great test of a bunch of concepts that are totally novel in the world of copyright law as it applies to generative AI. The only things I am presenting with confidence are:
1. That anyone who declares that something is unambiguously fair use (or, contrarily, unambiguously infringing) is likely wrong. Judges simply have too much latitude, and there have certainly been cases where a ruling went one way, only to be overturned on appeal.
2. While I certainly have an opinion on how I think this case will be decided, I'm not presenting that with unwarranted confidence. Instead, I linked that great article on the 4 factors of fair use determination because it's clear to me lots of people are saying "fair use!" on one side or the other with no understanding of the factors judges must actually consider when making a determination.
However, none of that matters in this particular thread. There are well established precedents about paraphrasing news articles and they do not support the claim you made.
Remember. The NY Times does not have a record of filing frivolous lawsuits. Particularly not against companies with deep pockets. So it is almost certainly true that a lawyer who knows the law better than you thinks that this has a real chance. So you should be looking for flaws in trivial defenses that you can think up, rather than assuming that you know best.
For example take your copyright facts defense. That would be great if the NY Times was a phone book. They aren't, in addition to facts they offer analysis, editorial positions, and so on. For example I just asked ChatGPT, "In 2016, did the New York Times generally support or oppose President Trump?" I got back an answer talking about various kinds of concerns that the New York Times had, including an editorial titled, "Why Donald Trump Should Not Be President". The copy that ChatGPT needed to have to do that has a lot more than just facts in it.
Now if you paraphrased the NY Times like ChatGPT did when it answered me, you'd have a perfect fair use defense. But you aren't doing it for money, you didn't make a copy of all the NY Times, you aren't destroying the market for the NY Times, and you're legally able to own copyright in your transformed work. OpenAI is doing it for money, did copy all of the NY Times, is seriously impacting the market for NY Times articles, and ChatGPT-generated text does not get a copyright.
Fair use is filled with shades of grey. Even if ChatGPT appears to do the same thing that you do, it is far less clear that OpenAI will enjoy the same level of fair use defense.
> They aren't, in addition to facts they offer analysis, editorial positions, and so on.
Those opinions and ideas are also not copyrightable. Only expressions of them are copyrightable, which is why paraphrasing facts, ideas and opinions is not a violation of copyright.
> Fair use is filled with shades of grey.
Yes, but not all those shades are equal. There is a long history of litigation showing that paraphrasing news articles is fine.