A federal judge sides with Anthropic in lawsuit over training AI on books

(techcrunch.com)

183 points by moose44 · 1y ago · 212 comments

3PS · 1y ago

Broadly summarizing.

This is OK and fair use: Training LLMs on copyrighted work, since it's transformative.

This is not OK and not fair use: pirating data, or creating a big repository of pirated data that isn't necessarily for AI training.

Overall seems like a pretty reasonable ruling?

derbOac · 1y ago

But those training the LLMs are still using the works, and not just to discuss them, which I think is the point of fair use doctrine. I guess I fail to see how it's any different from me using it in some other way? If I wanted to write a play very loosely inspired by Blood Meridian, it might be transformative, but that doesn't justify me pirating the book.

I tend to think copyright should be extremely limited compared to what it is now, but to me this ruling makes no sense other than as "it's ok for a corporation to use lots of works without permission but not for an individual to use a single work without permission." Maybe if they suddenly loosened copyright enforcement for everyone I might feel differently.

"Kill one man, and you are a murderer. Kill millions of men, and you are a conqueror." (An admittedly hyperbolic comparison, but similar idea.)

rcxdude · 1y ago

>If I wanted to write a play very loosely inspired by Blood Meridian, it might be transformative, but that doesn't justify me pirating the book.

I think that's the conclusion of the judge. If Anthropic were to buy the books and train on them without extra permission from the authors, it would be fair use, much like if you were to be inspired by it (though in that case, it may not even count as a derivative work at all, if the relationship is sufficiently loose). But that doesn't mean they are free to pirate it either, so they are likely to be liable for that. (Exactly how that interpretation works with copyright law I'm not entirely sure: I know that in some places downloading stuff is less of a problem than distributing it to others, because the latter is the main thing copyright is concerned with. And AFAIK most companies doing large model training maintain that fair use also extends to gathering the data in the first place.)

(Fair use isn't just for discussion. It covers a broad range of potential use cases, and AFAIK they're not precisely enumerated in copyright law; a complicated body of case law forms the guidelines for it.)

dragonwriter · 1y ago

> I tend to think copyright should be extremely limited compared to what it is now, but to me the logic of this ruling is illogical other than "it's ok for a corporation to use lots of works without permission but not for an individual to use a single work without permission."

That's not what the ruling says.

It says that training a generative AI system on one or more works is fair use, as long as the system is not designed primarily as a direct replacement for a work, and that print-to-digital destructive scanning for storage and searchability is fair use.

These are both independent of whether one person or a giant company or something in between is doing it, and independent of the number of works involved (there's maybe a weak practical relationship to the number of works involved, since a gen AI tool that is trained on exactly one work is probably somewhat less likely to have a real use beyond a replacement for that work.)

fallingknife · 1y ago

But if you did pirate the book, and let's say it cost $50, and then you used it to write a play based on that book and made $1 million selling that, only the $50 loss to the publisher would be relevant to the lawsuit. The fact that you wrote a non-infringing play based on it and made $1 million would be irrelevant to the case. The publisher would have no claim to it.

comex · 1y ago

The judge actually agreed with your first paragraph:

> This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use. There is no decision holding or requiring that pirating a book that could have been bought at a bookstore was reasonably necessary to writing a book review, conducting research on facts in the book, or creating an LLM. Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.

(But the judge continued that "this order need not decide this case on that rule": instead he made a more targeted ruling that Anthropic's specific conduct with respect to pirated copies wasn't fair use.)

tantalor · 1y ago

The analogy to training is not writing a play based on the work. It's more like reading (experiencing) the work and forming memories in your brain, which you can access later.

I'm allowed to hear a copyrighted tune, and even whistle it later for my own enjoyment, but I can't perform it for others without license.