Better HN

0 points · tpmoney · 2mo ago · 0 comments

> I think there's no meaningful case by the letter of the law that use of training data that include GPL-licensed software in models that comprise the core component of modern LLMs doesn't obligate every producer of such models to make both the models and the software stack supporting them available under the same terms.

Why do you think "fair use" doesn't apply in this case? The prior Bartz vs Anthropic ruling laid out pretty clearly how training an AI model falls within the realm of fair use. Authors Guild vs Google and Authors Guild vs HathiTrust were both decided much earlier and both found that digitizing copyrighted works for the sake of making them searchable is sufficiently transformative to meet the standards of fair use. So what is it about GPL licensed software that you feel would make AI training on it not subject to the same copyright and fair use considerations that apply to books?

13 comments · 4 top-level

ronsor · 2mo ago · 3 in thread

> So what is it about GPL licensed software that you feel would make AI training on it not subject to the same copyright and fair use considerations that apply to books?

The poster doesn't like it, so it's different. Most of the "legal analysis" and "foregone conclusions" in these types of discussions are vibes dressed up as objective declarations.

input_sh · 2mo ago

You seem like the type of person that will believe anything as long as someone cites a case without looking into it. Bartz v Anthropic only looked at books, and there was still a $1.5 billion settlement that Anthropic paid out because it got those books from LibGen / Anna's Archive, and the ruling also said that the data has to be acquired "legitimately".

Whether data acquired from a licence that specifically forbids building a derivative work without also releasing that derivative under the same licence counts as a legitimate data gathering operation is anyone's guess, as those specific circumstances are about as far from that prior case as they can be.

eru · 2mo ago

As long as they don't distribute the model's weights, even a strict interpretation of the GPL should be fine. Same reason Google doesn't have to upstream changes to the Linux kernel they only deploy in-house.

2 more replies

ronsor · 2mo ago

Have you actually read the text of the GPL?

> This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.

It is legitimate to acquire GPL software. The requirements of the license only apply if you're distributing the work AND fair use does not apply.

Training certainly doesn't count as distribution, so the buck passes to inference, which leaves us dealing with the substantial similarity test, and still, fair use.

2 more replies

shakna · 2mo ago · 2 in thread

Bartz v Anthropic explicitly withheld ruling on fair use. It is not precedent here.

derektank · 2mo ago

I’m not a lawyer, but I read the decision, and how is this section not a ruling on fair use?

“To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library. Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.”

Or in the final judgement, “This order grants summary judgment for Anthropic that the training use was a fair use. And, it grants that the print-to-digital format change was a fair use for a different reason.”

shakna · 2mo ago

There's two parts here.

The first:

> it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library

It is only fair use where Anthropic had already purchased a license to the work. Which has zero to do with scraping - a purchase was made, an exchange of value, and that comes with rights.

The second, which involves a section of the judgement a little before your quote:

> And, as for any copies made from central library copies but not used for training, this order does not grant summary judgment for Anthropic.

This is where the court refused to make any ruling. There was no exchange of value here, such as would happen with scraping. The court made no ruling.

1 more reply

jerf · 2mo ago · 2 in thread

You sound like you're citing the general Internet understanding of "fair use", which seems to amount to "I can do whatever I like to any copyrighted content as long as maybe I mutilate it enough and shout 'FAIR USE!' loudly enough."

On the real measures of "fair use", at least in the US: https://fairuse.stanford.edu/overview/fair-use/four-factors/ I would contend that it absolutely face plants on all four measures. The purpose is absolutely in the form of a "replacement" for the original, the nature is something that has been abundantly proved many times over in court as being something copyrightable as a creative expression (with limited exceptions for particular bits of code that are informational), the "amount and substantiality" of the portions used is "all of it", and the effect of use is devastating to the market value of the original.

You may disagree. A long comment thread may ensue. However, all I really need for my point here is simply that it is far, far from obvious that waving the term "FAIR USE!" around is a sufficient defense. It would be a lengthy court case, not a slam-dunk "well duh it's obvious this is fair use". The real "fair use" and not the internet's "FAIR USE!" bear little resemblance to each other.

A sibling comment mentions Bartz v. Anthropic. Looking more at the details of the case I don't think it's obvious how to apply it, other than as a proof that just because an AI company acquired some material in "some manner" doesn't mean they can just do whatever with it. The case ruled they still had to buy a copy. I can easily make a case that "buying a copy" in the case of a GPL-2 codebase is "agreeing to the license" and that such an agreement could easily say "anything trained on this must also be released as GPL-2". It's a somewhat lengthy road to travel, where each step could result in a failure, but the same can be said for the road to "just because I can lay my hands on it means I can feed it to my AI and 100% own the result" and that has already had a step fail.

jrm4 · 2mo ago

"Real" fair use is perhaps one of the most nebulous legal concepts possible. I haven't dived deep into software, but a cursory look at how it "works (I use that term as loosely as possible)" in music with sampling and interpolation etc immediately reveals that there's just about nothing one can rely on in any logical sense.

tpmoney (OP) · 2mo ago

I'm not really sure why you think my comment specifically citing the recent rulings by Judge Alsup and also the prior history with respect to the Google Books project is somehow declaring "I can do whatever I like to any copyrighted content", but I assure you I'm not. I'm very specifically talking about the various cases that have come about in the digital age dealing with fair use as it has been interpreted by US courts to apply to the use of computers to create copies of works for the purposes of creating other works.

I'm referring to the long history of carefully threaded fair use rulings and settlements, many of which we as an industry have benefitted greatly from. These include determinations that cloning a BIOS can be fair use (see IBM PC BIOS cloning, but also Sony v. Connectix), that cloning an entire API for the purposes of creating a parallel competitive product can be (Google v. Oracle), that digitizing books for the purposes of making those books searchable and even displaying portions of those books to users can be (Authors Guild v. Google), and even that your cable company can offer you "remote DVR" copying of broadcast TV (20th Century Fox v. Cablevision). Time and again the courts have found that copyright, and especially copyright with respect to digital transformations, is far more limited than large corporations would prefer. Further, they have found in plenty of cases that even a direct 1:1 copy of source can be fair use, let alone copies which are "transformative" as LLM training was found to be in Bartz.

Realistically, I don't see how anyone can have watched the various copyright cases that have been decided in the digital age, and seen the battles that the EFF (and a good part of the tech industry) have waged to reduce the strength of copyright and not also see how AI training can very easily fit within that same framework.

Not to cast aspersions on my fellow geeks and nerds, but it has been very interesting to me to watch the "hacker" world move from "information wants to be free" to "copyright maximalists" once it was their works that were being copied in ways they didn't like. For an industry that has brought about (and heavily promoted and supported) things like DeCSS, BitTorrent, Handbrake, Jellyfin/Plex, numerous emulators, WINE, BIOS and hardware cloning, ad blockers, web scrapers and many other things that copyright owners have been very unhappy about, it's very strange to see this newfound respect for the sanctity of copyright.

> I can easily make a case that "buying a copy" in the case of a GPL-2 codebase is "agreeing to the license" and that such an agreement could easily say "anything trained on this must also be released as GPL-2".

And I would argue that obtaining a legal copy of the GPL source to a program requires no such agreement. By downloading a copy of a GPLed program I am entitled by the terms under which that software was distributed to obtain a copy of the source code. I do not have to agree to any other terms in order to obtain that source code; downloading from someone authorized to distribute that code is in and of itself sufficient to entitle me to it. You cannot, by the very terms of the GPL itself, deny me a copy of the source code for GPL software you have distributed to me, even if you believe I intend to make distributions that are not GPL compliant. You can decline to distribute the software to me in the first place, but once you have distributed it to me, I am legally entitled to a copy of the source code. From there, now that I have a legal copy, the question becomes: is making additional copies for the purposes of training an AI model fair use? So far, the most definitive case we have on the matter (Bartz) says yes, it is.

So either we have to make the case that the original copy was somehow acquired from a source not authorized to make that copy, or we have to argue that the output of the AI model or the AI model is itself infringing. Given the ruling that the use of copies to train an AI model "was exceedingly transformative and was a fair use under Section 107 of the Copyright Act"[1], it seems unlikely that the AI model itself is going to be found to be infringing. That leaves the output of the model itself, which Bartz does not rule on, as the authors never alleged the output of the model was infringing. GPL software authors might be able to prevail on that point, but I think they would have a pretty uphill battle in demonstrating that the model generated infringing output and not simply functionally necessary code that isn't covered by copyright. The ability of code to be subject to copyright has long been a sort of careful balance between protecting a larger creative idea, and also not simply walling off whole avenues of purely functional decisions from all competitors.

[1]: https://admin.bakerlaw.com/wp-content/uploads/2025/07/ECF-23...

advael · 2mo ago · 2 in thread

Broadly speaking, GPL is a license that has specific provisions about creating derivative software from the licensed work, and just saying "fair use" doesn't exempt you from those provisions. More specifically, an advertised use case (in fact, arguably the main one at this stage) of the most popular closed models as they're currently being used is to produce code, some of which is going to be GPL licensed. As such, the code used is part of the functionality of the program. The fact that this program was produced from the source code used by a machine learning algorithm rather than some other method doesn't change this fundamental fact.

The current Supreme Court may think that machine learning is some sort of magic exception, but they also seem to believe whatever oligarchs will bribe them to believe. Again, I doubt the law will be enforced as written, but that has more to do with corruption than any meaningful legal theory. Arguments against this claim seem to ignore that courts have already ruled these systems to not have intellectual property rights of their own, and the argument for fair use seems to rely pretty heavily on some handwavey anthropomorphization of the models.

mr_toad · 2mo ago

> Broadly speaking, GPL is a license that has specific provisions about creating derivative software from the licensed work, and just saying "fair use" doesn't exempt you from those provisions.

Broadly speaking, yes it does. The whole point of fair use is that you don’t need a license.

davemp · 2mo ago

Claiming LLMs are fair use is ridiculous bordering on ignorant or disingenuous.

Here’s the 4 part test from 17 U.S.C. § 107:

1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

Fail. The use is to make trillions of dollars and be maximally disruptive.

2. the nature of the copyrighted work;

Fail. In many cases at least, the copyrighted code is commercial or otherwise supports livelihoods, and is the result of much high-skill labor with the express stipulation for reciprocity.

3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

Fail. They use all of it.

4. the effect of the use upon the potential market for or value of the copyrighted work.

Fail to the extreme. There is already measurable decline in these markets. The leaders explicitly state that they want to put knowledge workers out of business.

- - -

Hell, LLMs don’t even pass the sniff test.

The only reason this stuff is being entertained is some combination of the prisoner’s dilemma and more classic greed.

4 more replies
