Microsoft want court to toss lawsuit accusing them of abusing open-source code

(reuters.com)

126 points · ThaDood · 3y ago · 63 comments

User23 · 3y ago · 8 in thread

Breaking news: litigant wants to win lawsuit.

They probably didn't rigorously track the licensing issue, but I'm pretty sure training an LLM is a completely acceptable use of freely licensed source code. It would be somewhat amusing, though, if Copilot were forced to spit out the license for every piece of code used to develop the derivative work, along with the copyright notices and whatever else the licenses may require.

candiddevmike · 3y ago

That's the point, though: if you recreate the code, you need to follow its license, which typically involves some kind of attribution. Copilot should be forced to spit out a list of all the licenses it referenced. That would actually be pretty valuable.

kosievdmerwe · 3y ago

Furthermore, the language model itself is clearly a for-profit derivative work, and so would be subject to the terms set by the original copyright owners. It is clearly a derivative work because, without the copyrighted code among its training inputs, it would be different and likely less effective.

There's a more interesting question about the copyright status of the code it outputs, since the language model is sort of like a compiler, but also not like a compiler since the output is based on other people's copyrighted code.

I feel a lot of people get caught up on the output code and completely ignore the fact that copilot itself is likely a massive copyright violation.

randombits0 · 3y ago

Hold on, there is a difference between “recreate” and “copy”. Copyright only applies to creative expressions. If the code is trivially “recreated”, it’s not particularly creative.

Copyrighted content can be used without the holder’s permission under “Fair Use”.

Don’t assume all code can be copyrighted. Purely functional expressions are not copyrightable. Code is math.

There’s a lot here to unpack.

wonks · 3y ago

Wait, are you saying that there is legal precedent for training an LLM with open source code to generate proprietary code?

User23 · 3y ago

That depends. Odds are good some GPL code slipped in somewhere, so licensing the whole thing under the GPL is an option in that case. And sure, you can derive proprietary code from GPL code, so long as you don't publish binaries.

spookie · 3y ago

I'm not so sure whether it's completely acceptable to train an LLM on GPL code, for example. To bring the point home: reverse-engineering efforts follow the clean-room design technique precisely to avoid infringing copyrights.

I would love to see this done with decompiled proprietary code: train a model on it and release it into the wild.

But the amount of data and computing power needed to do it might not be available to the average person.

jen20 · 3y ago

As you describe, a perfectly acceptable outcome is that licenses are respected with respect to attribution and where necessary propagation to derived works.

jimmaswell · 3y ago

Reading a source then writing your own code with the same ideas in mind isn't a derived work and shouldn't need attribution. It will be a crying shame if Copilot output is mired in unwarranted legal trouble with how much of a productivity booster it is.

croes · 3y ago · 6 in thread

Is it still Fair Use if you make money with it?

henryfjordan · 3y ago

Making money from the work is one of the factors for Fair Use but it is not an automatic-fail kind of situation. A judge/jury would need to hear the facts and consider all the factors.

Here's a good link explaining the Fair Use test: https://copyright.columbia.edu/basics/fair-use.html

croes · 3y ago

But it's pretty clear from the start that Copilot is neither criticism, comment, news reporting, teaching, scholarship, nor research which are covered by fair use.

nerdponx · 3y ago

In the USA, yes. For example, the unanimous ruling in Campbell v. Acuff-Rose Music, Inc. determined that parody is fair use, even if the parody is of a commercial nature:

> Held: 2 Live Crew's commercial parody may be a fair use within the meaning of § 107. Pp. 574-594.

The ruling states explicitly that the commercial nature of a use is a factor in determining whether it is fair, but that it does not in and of itself make the use "unfair".

https://supreme.justia.com/cases/federal/us/510/569/

pdonis · 3y ago

Not necessarily. For example, you can make money by writing a review of a book that includes quotes from the book; that is considered fair use. But if you make money by publishing a book that consists solely of quotes from other books that others have copyrighted, on the grounds that this assembly of quotes from other books might be useful to future authors, that would not be fair use.

To me, the latter scenario is much closer to what GitHub is doing with Copilot, which is one of the things the plaintiffs allege violates open-source licenses.

klyrs · 3y ago

I don't think that question is particularly relevant to the case. A newspaper can publish, for profit, a book review which quotes excerpts. As far as I understand it, the case hinges on the distribution of major portions of copyrighted works and derivatives thereof, in violation of their licenses. Likewise, see Aaron Swartz, sci-hub, etc -- distribution of copyrighted works need not be for profit to be a violation.

shagie · 3y ago

Yes. Perfect 10 v. Google: https://cyber.harvard.edu/people/tfisher/IP/2007%20Perfect%2...

> Additionally, the district court determined that the commercial nature of Google's use weighed against its transformative nature. Although Kelly held that the commercial use of the photographer's images by Arriba's search engine was less exploitative than typical commercial use, and thus weighed only slightly against a finding of fair use, the district court here distinguished Kelly on the ground that some website owners in the AdSense program had infringing Perfect 10 images on their websites. The district court held that because Google's thumbnails "lead users to sites that directly benefit Google's bottom line," the AdSense program increased the commercial nature of Google's use of Perfect 10's images.

> In conducting our case-specific analysis of fair use in light of the purposes of copyright, we must weigh Google's superseding and commercial uses of thumbnail images against Google's significant transformative use, as well as the extent to which Google's search engine promotes the purposes of copyright and serves the interests of the public. Although the district court acknowledged the "truism that search engines such as Google Image Search provide great value to the public," the district court did not expressly consider whether this value outweighed the significance of Google's superseding use or the commercial nature of Google's use. The Supreme Court, however, has directed us to be mindful of the extent to which a use promotes the purposes of copyright and serves the interests of the public.

---

I will also draw attention to:

> The fact that Google incorporates the entire Perfect 10 image into the search engine results does not diminish the transformative nature of Google's use. As the district court correctly noted, we determined in Kelly that even making an exact copy of a work may be transformative so long as the copy serves a different function than the original work.

jsnell · 3y ago · 5 in thread

Is there anything out of the ordinary here? Doesn't basically every lawsuit have the defendant file a motion to dismiss, based on any halfway plausible reason?

nimbius · 3y ago

It was a successful strategy for VMware when a German developer confronted it about improper licensing of his open-source code. VMware managed to get the original case tossed on a technicality, as well as the appeal, which bought it enough time to drop the Linux code entirely and avoid discovery, where it would almost certainly have been found in violation.

https://www.zdnet.com/article/linux-developer-abandons-vmwar...

https://www.zdnet.com/article/vmware-sued-for-failure-to-com...

https://en.wikipedia.org/wiki/Vmlinux

AlbertCory · 3y ago

Not only is the answer "no, the defendant always files a motion to dismiss," it's a good strategy because it forces the plaintiff to say something on the record.

TheRealPomax · 3y ago

There is not. This is standard operating procedure. Getting a case thrown out saves so much money that it is entirely worth having your legal team try to make it happen before the real work starts.

klyrs · 3y ago

To those interested in watching the particulars of this case, this is not a surprising development. But the play-by-play is interesting. Sports announcers manage to talk continuously during a game, and don't sit there silently and say "team A won with 20 points to team B's 5 points, what a game" at the very end. Personally, I don't care for the sportsguy blathering about a game nor the end results, and prefer to read about legal shenanigans.

lostmsu · 3y ago

Perhaps a meta discussion is needed here about the ability to dismiss a lawsuit in a scenario like this, where everyone understands that there is a legal problem in need of future guidance.

MagicMoonlight · 3y ago · 3 in thread

If open source devs aren’t allowed to use the source code of windows to improve react, why the fuck should microsoft be able to copy and paste other people’s code for profit

EMIRELADERO · 3y ago

Open-source devs are allowed to learn from proprietary source code, including Windows' decompiled binaries, if they so choose. The fact that ReactOS and Wine have chosen to essentially self-sabotage by adopting a "clean-room" black-box policy does not mean other projects must do so. Those policies are self-inflicted wounds, not mandated by any legal cases or standards.

ipaddr · 3y ago

How would an open source windows improve react? The tooling?

danzk · 3y ago

I think they mean ReactOS.

mapme · 3y ago · 2 in thread

Is there an OSS license that specifically precludes its use in LLMs, or effectively does so?

colejohnson66 · 3y ago

The thing about fair use is that there’s nothing a license can do to prevent it. After all, that’s the whole point of fair use: to say that there’s valid reasons to use pieces of IP without regards to their licenses.

So, if the courts find in Microsoft and OpenAI’s favor (which remains to be seen despite the many armchair lawyers here), your license would mean jack squat.

Brian_K_White · 3y ago

They don't aim to. The problem is really just attribution. If you copy a chunk of code yourself, chances are the original author was perfectly happy for you to do that, and you put their name somewhere in your credits. Copilot copies the same code but scrubs the original author. It may also be copying code that was not OK to copy, but that's a separate, even worse issue.

simonblack · 3y ago · 1 in thread

"It violates the licenses that open-source programmers chose and monetizes their code despite GitHub's pledge never to do so."

Microsoft never changes. Always looking for a dishonest buck. Does 'Embrace, Extend, and Extinguish' ring a bell for younger players? Thought not.

thunkshift1 · 3y ago

It's not just Microsoft, it's the developer freeloading culture. Once we start paying with our instead of freeloading, things will change.

atomicUpdate · 3y ago · 1 in thread

Why does the post title omit OpenAI, so it no longer matches the article’s title?

> OpenAI, Microsoft want court to toss lawsuit accusing them of abusing open-source code

Dylan16807 · 3y ago

Character limit.

silverwasthere · 3y ago · 1 in thread

https://en.m.wikipedia.org/wiki/Licence_laundering

Seems pretty obvious to me but we'll see how it goes in the court.

ThaDood (OP) · 3y ago

Huh, I always had this concept in my mind but never knew it actually had a name with some legal precedent behind it.

phendrenad2 · 3y ago

I have a feeling that no matter what the outcome is here, it's not going to be satisfying.

One extreme is that AI is allowed to spit out copyrighted code verbatim as long as it technically goes through an AI first. Of course, that defeats all open-source licenses by adding a backdoor around them.

The other extreme is that AI is not allowed to spit out a single line of copyrighted code, in which case we'll have endless lawsuits to figure out if CodeGPT used a GPL-licensed fast inverse square root or if it used the public-domain fast inverse square root.

I think we'll land somewhere in the middle: if an AI regurgitates a "substantial" number of lines of code, then its creators can be held liable (a.k.a. the "we'll know it when we see it" standard).
