Is legal the same as legitimate: AI reimplementation and the erosion of copyleft (opens in new tab)

(writings.hongminhee.org)

569 pointsdahlia17d ago599 comments

599 comments

> If source code can now be generated from a specification, the specification is where the essential intellectual content of a GPL project resides.

Our foreparents fought for the right to implement works-a-like to corporate software packages, even if the so-called owners did not like it. We're ready to throw it all away, and let intellectual property owners get so much more control.

The implications will not end up being anti-large-corporation or pro-sharing. If you can prevent someone from re-implementing a spec or building a client that speaks your API or building a work-a-like, it will be the large corporations that exersize this power as usual.

dogcomplex17d ago

We should be removing IP law entirely, not strengthening it to cover entire classes of problem even when implemented entirely differently. Same for anyone trying to claim "colorful monster creatures" as innately Pokemon IP. Just because someone climbed a mountain first doesn't mean they own it forever. Nobody should be honouring any of these claims.

Nor should we be treating AI models themselves as respected IP. They're built on everyone else's data. Throw away this whole class of law, it's irrelevant in this new world.

6 more replies

thayne17d ago

Yeah, I really don't think we want APIs to be protected by IP. But in this case it isn't just the API, there were also tests involved. I think you could make a pretty strong argument that if you used a test suite to get an agent to implement some code, the code is a derivative product of the test code.

2 more replies

RobRivera17d ago

Sounds very similar to that whole API lawsuit with oracle.

1 more reply

glhaynes16d ago

It'd be interesting (earnestly!) to see someone make a solid case for AI reimplementation being bad but that the original (afaik) "clean room" project, Compaq's reimplementation of IBM's PC BIOS (something most people seem to see as a righteous move toward openness and freedom), was good.

thayne17d ago

What if there was a special exemption for using a specification if you open source (or open hardware) the result for some definition roughly (or exacactly) equivalent to the OSI definition of open source, or FSFs definition of free software?

Although I think the chance of that happening is effectively zero.

noemit16d ago

Copyright has always benefited those with power, down to the very first instance: Albrecht Durer bullying little children who wanted to make inferior copies of his prints so that their familities could enjoy the art. Durer insisted the art was only for nobles. Ab initio

zelphirkalt16d ago

It is not aout throwing the right to implement things away. As long as it is done according to the license of the works modified or copied, one can do that. What this is against is, that people wash away a license, that is meant to keep things open, transparent and free. It enables businesses to go back to completely proprietary systems, which will impact your rights.

I am for keeping the licenses in place, as long as there is any copyright at all on software. If we get rid of that, then we can get rid of copyleft licenses and all others too. But of course businesses and greedy people want to have their cake and eat it too. They want copyleft to disappear, but _their_ software, oh no, no one may copy that! Double standards at their best.

1 more reply

Ferret744616d ago

Our foreparents of FOSS (e.g. RMS) fought to destroy copyright. Copyleft was a subversive mechanism to "neutralize" copyright using the laws of copyright against itself.

Hypothetically, I think this trail of suggestion of treating specs as intellectual property would simply destroy copyright for software, which is what we (the people who believe in FOSS) want. There is already case law protecting specs (e.g. Java)

1 more reply

red_admiral16d ago

The specification of chardet, which started this all off, is essentially forced by the unicode statndard though.

az22616d ago

SCOTUS ruled on this already when Google copied Sun’s Java wholesale.

1 more reply

alterom17d ago

>Our foreparents fought for the right to implement works-a-like to corporate software packages, even if the so-called owners did not like it

Our "foreparents" weren't competing with corporations with unlimited access to generative AI trained on their work. The times, they're-a-changin'.

You're rehashing the argument made in one of the articles which this piece criticizes and directly addresses, while ignoring the entirety of what was written before the conclusion that you quoted.

If anyone finds themselves agreeing with the comment I'm responding to, please, do yourself a favor and read the linked article.

I would do no justice to it by reiterating its points here.

5 more replies

zmmmmm17d ago

The really interesting question to me is if this transcends copyright and unravels the whole concept of intellectual property. Because all of it is premised on an assumption that creativity is "hard". But LLMs are not just writing software, they are rapidly being engineered to operate completely generally as knowledge creation engines: solving math proofs, designing drugs, etc.

So: once it's not "hard" any more, does IP even make sense at all? Why grant monopoly rights to something that required little to no investment in the first place? Even with vestigial IP law - let's say, patents: it just becomes and input parameter that the AI needs to work around the patents like any other constraints.

palmotea17d ago

> So: once it's not "hard" any more, does IP even make sense at all? Why grant monopoly rights to something that required little to no investment in the first place? Even with vestigial IP law - let's say, patents: it just becomes and input parameter that the AI needs to work around the patents like any other constraints.

I think it still does: IIRC, the current legal situation is AI-output does not qualify for IP protections (at least not without substantial later human modification). IP protections are solely reserved for human work.

And I'm fine with that: if a person put in the work, they should have protections so their stuff can't be ripped off for free by all the wealthy major corporations that find some use for it. Otherwise: who cares about the LLMs.

4 more replies

utopiah16d ago

> operate completely generally as knowledge creation engines: solving math proofs, designing drugs, etc.

Any example of that? So far I haven't seen any but maybe I'm looking at the wrong places.

I've see a lot of :

- "solving" math proofs that were properly formalized, with often numerous documented past attempts, re-verified by proper mathematicians, without necessarily any interesting results

- haven't seen any designed trust, most I've seen was (again with entire teams of experts behind) finding slight optimizations

Basically all outputs I've seen so far have been both following existing trends (basically low hanging fruits without any paradigm shift) and never ever alone but rather as search supports for teams of World class experts. None of these that would quality IMHO as knowledge creation. Whenever such results were published the publication seemed mostly to be promotion about the workflow itself more than the actual results. DeepMind seems to be the prime example for that.

PS: for the epistemological distinction you can see a few past comments of mine (e.g. https://news.ycombinator.com/item?id=47011884 )

satvikpendem17d ago

Good. Intellectual property is now a twisted concept by the elite, whatever its benefits were previously. As soon as Disney made Mickey popular, it was all downhill.

godd217d ago

1 more reply

rfw30017d ago

More likely: this is a transitional phase where our previously hard problems become easy, and we will soon set our sights on new and much harder problems. The pinnacle of creative achievement in the universe is probably not 2010s B2B SaaS.

It is entirely possible, however, that human beings will not be the primary drivers of progress on those problems.

1 more reply

treyd17d ago

> if this transcends copyright and unravels the whole concept of intellectual property.

I have been saying this for years. Intellectual property is based on the concept that ideas can be owned, which is fundamentally a contradiction with how reality operates. We've been able to write laws that paper over that contradiction by introducing concepts like "fair use", but it doesn't resolve it.

AI is just making the conflict arising out of that contradiction more intense in new ways and forcing us to reckon with it in this new technological landscape. You can follow two perfectly reasonable lines of logic and end up with contradictory solutions. So how are we going to get out of this mess? I don't know, not without rolling back (at least parts of) what intellectual property is in the first place.

nradov17d ago

Nothing changes for drug patents regardless of whether an LLM was used in the discovery process.

2 more replies

kindkang202417d ago

At some level, IP makes sense — creators should be rewarded. But IP only benefits those who claim it. The benefits rarely flow back to humanity who made it all possible. Every LLM was trained on humanity's collective knowledge. The value was created collectively, then captured privately.

That's the reason I like the idea of DUKI/dju:ki/ — Decentralized Universal Kindness Income, similar to UBI but driven by voluntary kindness and sincere marketing rather than taxation. If AI makes creation trivially easy and IP loses its justification, the question becomes: how do we ensure a tiny part of the wealth generated flows back to everyone?

LelouBil17d ago

This is similar :

https://www.vice.com/en/article/musicians-algorithmically-ge...

Two musicians generated every possible melody within an octave, and published them as creative Commons Zero.

I never heard about this again though.

Eridrus17d ago

The point of IP is to encourage the creation of new things.

Not all protections have to be ones that give total control like copyright.

I think it's a mistaken assumption that costs will fall to zero. The low hanging fruit will get picked, and then we'll be doing expensive combined AI/wetlab search for new drugs.

If there is any meaningful headroom we will keep doing expensive things to make progress.

1 more reply

prohobo16d ago

From what I understand, LLMs can't really generate anything meaningful that doesn't implicitly rely on the operator's choices. It's hard to make the right novel choices as soon as you leave well-defined problem spaces.

In terms of math and biochemistry the cost of generating candidates has collapsed, but the cost of validating them hasn't.

js817d ago

It might unravel intellectual property, just not in a fair way. When capitalism started, public land was enclosed to create private property. Despite this being in many cases a quite unfair process, we still respect this arrangement.

With AI, a similar process is happening - publicly available information becomes enclosed by the model owners. We will probably get a "vestigial" intellectual property in the form of model ownership, and everyone will pay a rent to use it. In fact, companies might start to gatekeep all the information to only their own LLM flavor, which you will be required to use to get to the information. For example, product documentation and datasheets will be only available by talking to their AI.

gnopgnip17d ago

Also copyright can protect something normally not eligible when the author chooses what information to include and exclude

AlienRobot17d ago

The basis of your argument is that AI-generated work isn't hard, but your conclusion is that ALL work, AI-generated or not, should lose IP rights?

eru17d ago

There's different kinds of intellectual property.

1 more reply

newyankee17d ago

If you think about creative outcomes as n dimensional 'volumes', AI expressions can cover more than humans in many domains. These are precisely artistic styles, music styles etc. and tbh not everyone can be a Mozart but may be a lot more with AI can be Mozart lite. This begs the question how much of creativity is appreciated as a shared experience

matheusmoreira17d ago

Intellectual property never made any sense to begin with. It is logically reducible to ownership of numbers. It is that absurd. Computers made the entire concept irrelevant the second they were invented but they kept holding on via lobbying power. Maybe AI will finally put the final nail on the coffin of intellectual property.

Sure, it's disgusting and hypocritical how these corporations enshrined all this nonsense into law only to then ignore it all the second LLMs were invented. It's ultimately a good thing though. The model weights are all that matters. All we need to do is wait for the models to hit diminishing returns, then somehow find a way to leak them so that everyone has access. If they refuse, then just force them. By law or by revolution.

paxys17d ago

"Hard" or "easy" has never been part of the premise.

A company spends a decade and billions of dollars to develop a groundbreaking drug and patents it.

I think of a cool new character called "Mr Poop" and publish a short story about him with an hour of work.

Both of us get the exact same protection under the law (yes yes I know copyright vs patent etc., but ultimately they are all about IP protection).

keeda17d ago

Creativity is still hard. AI-generated content is called "slop" for a reason ;-)

hyperman117d ago

I've always thought the opposite: IP law was created to make sure creativity stays hard, and hence controllable by the elites.

Patents came along when farmers started making city goods, threatening guilds secrets. Copyright came when the printing press made copying and translating the bible easy and accessible to all. (Trademark admittedly does not fit this view, but doesn't seem all that damaging either)

To Protect The Arts, and To Time Limit Trade Secrets were just the Protect The Children of old times, a way to confuse people who didn't look too hard at actual consequences.

This means that the future of IP depends on what lets the powers that be pull up the ladder behind them. Long term I'd expect e.g. copyright expansion and harder enforcement, just because cloning by AI gets easy enough to threaten the status quo.

2 more replies

spwa417d ago

Don't worry. The courts have consistently sided with huge companies on copyright. In the US. In Europe. Doesn't matter.

Company incorporates GPL code in their product? Never once have courts decided to uphold copyright. HP did that many times. Microsoft got caught doing it. And yet the GPL was never applied to their products. Every time there was an excuse. An inconsistent excuse.

Schoolkid downloads a movie? 30,000 USD per infraction PLUS armed police officer goes in and enforces removal of any movies.

Or take the very subject here. AI training WAS NOT considered fair use when OpenAI violated copyright to train. Same with Anthropic, Google, Microsoft, ... They incorporated harry potter and the linux kernel in ChatGPT, in the model itself. Undeniable. Literally. So even if you accept that it's changed now, OpenAI should still be forced to redistribute the training set, code, and everything needed to run the model for everything they did up to 2020. Needless to say ... courts refused to apply that.

So just apply "the law", right. Courts' judgement of using AI to "remove GPL"? Approved. Using AI to "make the next Disney-style movie"? SEND IN THE ARMY! Whether one or the other violates the law according to rational people? Whatever excuse to avoid that discussion is good enough.

ordu17d ago

I believe it is a narrow view of the situation. If we take a look into the history, into the reasons for inventing GPL, we'll see that it was an attempt to fight copyrights with copyrights. The very name 'copyleft' is trying to convey the idea.

What AI are eroding is copyright. You can re-implement not just a GPL program, but to reverse engineer and re-implement a closed source program too, people have demonstrated it already, there were stories here on HN about it.

AI is eroding copyright, so there may no longer be a need for the GPL. GNU should stop and rethink its stance, chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.

davidw17d ago

> LLM as the main weapon

LLM's - to date - seem to require massive capital expenditures to have the highest quality ones, which is a monumental shift in power towards mega corporations and away from the world of open source where you could do innovative work on your own computer running Linux or FreeBSD or some other open OS.

I don't think that's an exciting idea for the Free Software Foundation.

Perhaps with time we'll be able to run local ones that are 'good enough', but we're not there yet.

There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.

Edit: I guess the conclusion I come to is that LLM's are good for 'getting things done', but the context in which they are operating is one where the balance of power is heavily tilted towards capital, and open source is perhaps less interesting to participate in if the machines are just going to slurp it up and people don't have to respect the license or even acknowledge your work.

8 more replies

stebalien17d ago

Copyleft is a mirror of copyright, not a way to fight copyright. It grants rights to the consumer where copyright grants rights to the creator. Importantly, it gives the end-user the right to modify the software running on their devices.

Unfortunately, there are cases where you simply can't just "re-implement" something. E.g., because doing so requires access to restricted tools, keys, or proprietary specifications.

2 more replies

cubefox17d ago

That's naive. Copyright doesn't just apply to software. There already have been countless lawsuits about copying music long before the term "open source" was invented. No, changing the lyrics a bit doesn't circumvent copyright. Nor does translating a Stephen King novel to German and switching the names of the places and characters.

A court ordered the first Nosferatu movie to be destroyed because it had too many similarities to Dracula. Despite the fact that the movie makes rather large deviations from the original.

If Claude was indeed asked to reimplement the existing codebase, just in Rust and a bit optimized, that could well be a copyright violation. Just like rephrasing A Song ot Ice and Fire a bit, and switching to a different language, doesn't remove its copyright.

2 more replies

johnofthesea17d ago

> AI is eroding copyright, so there may no longer be a need for the GPL. GNU should stop and rethink its stance, chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.

Is this LLM thing freely available or is it owned and controlled by these companies? Are we going to rent the tools to fight "evil software corporations"?

3 more replies

Peritract17d ago

> chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.

LLMs are one of the primary manifestations of 'evil software corporations' currently.

dathinab17d ago

> we'll see that it was an attempt to fight copyrights with copyrights

it's not that simple

yes, GPLs origins have the idea of "everyone should be able to use"

but it also is about attribution the original author

and making sure people can't just de-facto "size public goods"

the kind of AI usage is removing attribution and is often sizing public goods in a way far worse then most companies which just ignored the license did

so today there is more need then ever in the last few decades for GPL like licenses

1 more reply

webstrand17d ago

Its purpose "if you run the software you should be able to inspect and modify that software, and to share those modifications with your peers" not explicitly resist copyright. Yes copyright is bad in that it often prevents one from doing that, but it is not the purpose of the GPL to dismantle copyright.

Reducing it to "well you can clone the proprietary software you're forced to use by LLM" is really missing the soul of the GPL.

1 more reply

paxys17d ago

Until there is a capable open source open weight AI that is easily hostable by an average person - no, we still have a long way to go. You aren't going to have software freedom when the tool that enables it is controlled by a handful of powerful tech companies.

thomastjeffery17d ago

While I personally agree with you, Richard Stallman (the creator of the GPL) does not. He has always advocated in favor of strong copyright protection, because the foundation of the GPL is the monopoly power granted by copyright. The problem that the GPL is intended to solve is proprietary software.

Generative models (AI) are not really eroding copyright. They are calling its bluff. The very notion of intellectual property depends on a property line: some arbitrary boundary where the property begins and ends. Generative models blur that line, making it impractical to distinguish which property belongs to whom.

Ironically, these models are made by giant monopolistic corporations whose wealth is quite literally a market valuation (stock price) of their copyrights! If generative models ever become good enough to reimplement CUDA, what value will NVIDIA have left?

The reality is that generative models are nowhere near good enough to actually call the bluff. Copyright is still the winning hand, and that is likely to continue, particularly while IP holders are the primary authors of law.

---

This whole situation is missing the forest for the trees. Intellectual Property is bullshit. A system predicated on monopoly power can only result in consolidated wealth driving the consolidation of power; which is precisely what has happened. The words "starving artist" ring every bit as familiar today as any time in history. Copyright has utterly failed the very goals it was explicitly written with.

It isn't the GPL that needs changing. So long as a system of copyright rules the land, copyleft is the best way to participate. What we really need is a cohesive political movement against monopoly power; one that isn't conveniently ignorant of copyright as its most significant source.

1 more reply

mikkupikku17d ago

I agree with almost all of that, except the part about GNU changing their stance. I think GNU should stay true and consistent, if for no other reason than to not make many of their supporters who aren't on board with AI feel betrayed and have GNUs legacy soured. If the cause of LLMs conquering proprietary software needs an organization to champion it, let that be a new organization, not GNU.

re-thc17d ago

> What AI are eroding is copyright.

At the moment it's people that are eroding copyright. E.g. in this case someone did something.

"AI" didn't have a brain, woke up and suddenly decided to do it.

Realistically nothing to do with AI. Having a gun doesn't mean you randomly shoot.

xantronix17d ago

So not only are we moving goalposts here, but we've decided the GNU team should join the other team? I don't understand how GNU would see mass model LLM training as anything but the most flagrant violations of their ethos. LLM labs, in their view, would be among the most evil software corporations to have ever existed.

wolvesechoes17d ago

> AI is eroding copyright

Unless it is IP of the same big corpos that consumed all content available. Good luck with eroding them.

martin-t17d ago

This is naive. Advertisement and network effects win. Individuals cannot compete with corporations on equal ground here.

sharkjacobs17d ago

> Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch

This feels sort of like saying "I just blindly threw paint at that canvas on the wall and it came out in the shape of Mickey Mouse, and so it can't be copyright infringement because it was created without the use of my knowledge of Micky Mouse"

Blanchard is, of course, familiar with the source code, he's been its maintainer for years. The premise is that he prompted Claude to reimplement it, without using his own knowledge of it to direct or steer.

dathinab17d ago

> Blanchard is, of course, familiar with the source code, he's been its maintainer for years.

I would argue it's irrelevant if they looked or didn't look at the code. As well as weather he was or wasn't familiar with it.

What matters is, that they feed to original code into a tool which they setup to make a copy of it. How that tool works doesn't really matter. Neither does it make a difference if you obfuscate that it's an copy.

If I blindfold myself when making copies of books with a book scanner + printer I'm still engaging in copyright infringement.

If AI is a tool, that should hold.

If it isn't "just" a tool, then it did engage in copyright infringement (as it created the new output side by side with the original) in the same way an employee might do so on command of their boss. Which still makes the boss/company liable for copyright infringement and in general just because you weren't the one who created an infringing product doesn't mean you aren't more or less as liable of distributing it, as if you had done so.

3 more replies

logicprog17d ago

I just don't see how it's relevant whether he did look or didn't. In my opinion, it's not just legally valid to make a re-implementation of something if you've seen the code as long as it doesn't copy expressive elements. I think it's also ethically fine as well to use source code as a reference for re-implementing something as long as it doesn't turn into an exact translation.

3 more replies

axus17d ago

Oracle had it's day in court with Google over the Java APIs. Reimplementing APIs can be done without copyright infringement, but Oracle must have tried to find real infringement during discovery.

In this case, we could theoretically prove that the new chardet is a clean reimplementation. Blanchard can provide all of the prompts necessary to re-implement again, and for the cost of the tokens anyone can reproduce the results.

Aurornis17d ago

Can anyone find the actual quote where Blanchard said this?

My understanding was that his claim was that Claude was not looking at the existing source code while writing it.

3 more replies

SpicyLemonZest17d ago

Isn't this a red herring? An API definition is fair use under Google v. Oracle, but the test suite is definitely copyrightable code!

NewsaHackO17d ago

>This feels sort of like saying "I just blindly threw paint at that canvas on the wall and it came out in the shape of Mickey Mouse, and so it can't be copyright infringement because it was created without the use of my knowledge of Micky Mouse"

IANAL, but that analogy wouldn't work because Mickey Mouse is a trademark, so it doesn't matter how it is created.

esafak17d ago

If you only stick to the API and ignore the implementation, it is not Mickey Mouse any more but a rodent. If it was just a clone it wouldn't be 50x as fast. Nevertheless, APIs apparently can be copyrightable. I generally disagree with this; it's how PC compatibles took off, giving consumers better options.

2 more replies

re-thc17d ago

> This feels sort of like saying "I just blindly threw paint at that canvas on the wall and

> He fed only the API and the test suite to Claude and asked it

Difference being Claude looked; so not blind. The equivalent is more like I blindly took a photo of it and then used that to...

Technically did look.

1 more reply

babypuncher17d ago

What if we said that generative AI output is simply not copyrightable. Anything an AI spits out would automatically be public domain, except in cases where the output directly infringes the rights of an existing work.

This would make it so relicensing with AI rewrites is essentially impossible unless your goal is to transition the work to be truly public domain.

I think this also helps somewhat with the ethical quandary of these models being trained on public data while contributing nothing of value back to the public, and disincentivize the production of slop for profit.

2 more replies

Gigachad17d ago

Someone should put this to the test. Take the recently leaked Minecraft source code and have Copilot build an exact replica in another programming language and then publish it as open source. See if Microsoft believes AI is copyright infringement or not.

robmccoll17d ago

As described, this would not be the same thing. If the AI is looking at the source and effectively porting it, that is likely infringement. The idea instead should be "implement Minecraft from scratch" but with behavior, graphics, etc. identical. Note that you'll need to have an AI generate assets or something since you can't just reuse textures and models.

4 more replies

u1hcw9nx17d ago

This was not about legality.

> That question is this: does legal mean legitimate?

Just because something is legal does not mean it's moral thing to do.

1 more reply

Aboutplants17d ago

I’ve often thought that the key to fighting this is through this exact method. Turn the tool against them

martin-t17d ago

They might not care. Products win not by quality or features but by advertisement, hype and network effects.

The original implementation would still have the upper hand here. OTOH if I as a nobody create something cool, there's nothing stopping a huge corporation from "reimplementing" (=stealing) it and and using their huge advertising budget to completely overshadow me.

And that's how they like it.

1 more reply

peacebeard17d ago

The big question is: if copyrighted material was used in the training material, is the LLM's output copyright infringement when it resembles the training material? In your example, you are taking the copyrighted material and giving it to the LLM as input and instructing the LLM to process it. Regardless of where the legal cards fall, this is a much less ambiguous scenario.

2 more replies

GuB-4217d ago

I think it will become interesting when AI will be able to decompile binaries.

1 more reply

amelius17d ago

You will probably run into design patents.

1 more reply

fruitworks17d ago

this is the question of the hour. Imagine using this LLM proxy to license-strip major parts of leaked Windows source code to produce code for WINE.

On top of all of this, there are the attempts at binary decompilation using LLMs and other new tools that have been discussed on this site recently.

kelseyfrog17d ago

In the corporate world, we've started using reimplementation as a way to access tooling that security won't authorize.

Sec has a deny by default policy. Eng has a use-more-AI policy. Any code written in-house is accepted by default. You can see where this is going.

We've been using AI to reimplement tooling that security won't approve. The incentives conspired in the worst outcome, yet here we are. If you want a different outcome, you need to create different incentives.

kemitchell17d ago

Not Invented Here's long, slow mutagenic march toward full antibiotic resistance continues apace.

There is a fundamental corpo-cognitive dissonance, to boot. If "AI" is cheap enough and good enough to implement security-relevant software from `git init` repeatedly, why isn't it also cheap enough and good enough to assess and approve the security of third-party software at pace with internal adoption? Is there some basis to believe LLMs' leverage on production differs from its leverage on analysis of existing code?

munk-a17d ago

I think the missing thing here is that the license violation already happened. Most of the big models trained on data in a manner that violated terms of service. We'll need a court case but I think it's extremely reasonable to consider any model trained on GPL code to be infected with open licensing requirements.

crazygringo17d ago

You might wish that were true, but there are very strong arguments it's not. Training on copyleft licensed code is not a license violation. Any more than a person reading it is. In copyright terms, it's such an extreme transformative use that copyright no longer applies. It's fair use.

But agreed that we're waiting for a court case to confirm that. Although really, the main questions for any court cases are not going to be around the principle of fair use itself or whether training is transformative enough (it obviously is), but rather on the specifics:

1) Was any copyrighted material acquired legally (not applicable here), and

2) Is the LLM always providing a unique expression (e.g. not regurgitating books or libraries verbatim)

And in this particular case, they confirmed that the new implementation is 98.7% unique.

5 more replies

NewsaHackO17d ago

I agree there has to be a court case about it. I think the current argument, however, is that it is transformative, and therefore falls under fair use.

2 more replies

paxys17d ago

The act of training by itself has been ruled to be fair use over and over again, including for LLMs, and there isn't much debate left there.

The test for infringement is if the output is transformative enough, and that is what NYT vs OpenAI etc. are arguing.

steve_gh16d ago

Is the LLM acting as my agent? If the LLM has been exposed to the source code then have I been exposed to the source code? So in that case is a "clean room" implementation possible?

PaulDavisThe1st17d ago

If Blanchard is claiming not to have been substantively involved in the creation of the new implementation of chardet (i.e. "Claude did it"), then the new implementation is machine generated, and in the USA cannot be copyright and thus cannot be licensed.

If he is claiming to have been somehow substantively "enough" involved to make the code copyrightable, then his own familiarity with the previous LGPL implementation makes the new one almost certainly a derivative of the original.

sigmar17d ago

>then his own familiarity with the previous LGPL implementation makes the new one almost certainly a derivative of the original.

The "clean room rewrite" is just an extreme way to have a bulletproof shield against litigation. Not doing it that way doesn't automatically make all new code he writes derivative solely because he saw how the code worked previously.

1 more reply

largbae17d ago

This is only worth arguing about because software has value. Putting this in context of a world where the cost of writing code is trending to 0, there are two obvious futures:

1. The cost continues to trend to 0, and _all_ software loses value and becomes immediately replaceable. In this world, proprietary, copyleft and permissive licenses do not matter, as I can simply have my AI reimplement whatever I want and not distribute it at all.

2. The coding cost reduction is all some temporary mirage, to be ended soon by drying VC money/rising inference costs, regulatory barriers, etc. In that world we should be reimplementing everything we can as copyleft while the inferencing is good.

sarchertech17d ago

There’s an other option. The cost of copying existing software trends to 0, but the cost of writing new software stays far enough above 0 that it is still relatively expensive.

beepbooptheory17d ago

There will always be cost though. Even if perfect code is getting one-shotted out, that is constantly maintained and adapted to changing conditions and technology, it simply can't stay at 0 forever because one day the power is surely going to go out!

More and more I am drawn to these kinds of ideas lately, perhaps as a kind of ethical sidestep, but still:

- https://wiki.xxiivv.com/site/permacomputing.html

- https://permacomputing.net/

It's not going to solve any general issue here, but the one thing these freaks need that can't be generated by their models is energy, tons of it. So, the one thing I can do as an individual and in my (digital) community is work to be, in a word, self-sustainable. And depending on my company I guess, if I was a CEO I would hope I was wise enough to be thinking on the same lines.

Everyone is making beautiful mountains from paper and wire. I will just be happy to make a small dollhouse of stone, I think it will be worth it. How can we see not just at least some small-level of hubris otherwise?

anonymous_sorry17d ago

There was a recent ruling that LLM output is inherently public domain (presumably unless it infringes some existing copyright). In which case it's not possible to use them to "reimplement everything we can as copyleft".

2 more replies

casey217d ago

The value of software has never been tied to the cost of writing it, even if you don't distribute it your still breaking the law.

1 more reply

rstuart413316d ago

I came here to say a similar thing.

There would be no GPL if anybody could have cheaply and trivially reproduced the software for printers and Lisp machines Stallman was denied access to. There is no reason to force someone to give you the source code if takes no effort to reproduce.

Mind you, that isn't what happened here. The effort involved in getting a LLM to write software comes from three things: writing a clear unambiguous spec that also gives you a clean exported API, more clean unambiguous specs for the APIs you use, and a test suite the LLM can use to verify it has implemented the exported API correctly. Dan got them all for free, from the previous implementation which I'm sure included good documentation. That means his contribution to this new code consisted of little more than pressing the button.

Sadly, if you wrote some GPL software with excellent documentation, a thorough test suite, clean API, and implemented using well understood library the cost of creating a cleanroom reproduction has indeed gone to near zero over the past 24 months. The GPL licence is irrelevant.

Welcome to the brave new world.

PS: Sqlite keeping their test suite proprietary is looking like a prescient masterstroke.

PPS: The recent ruling that an API isn't copyrightable just took on a whole new dimension.

foresto17d ago

From the article:

> He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch.

From GPL2:

> The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable.

Is a project's test suite not considered part of its source code? When I make modifications to a project, its test cases are very much a part of that process.

If the test suite is part of this library's source code, and Claude was fed the test suite or interface definition files, is the output not considered a work based on the library under the terms of LGPL 2.1?

crazygringo17d ago

It's transformative, so no.

Legally, using the tests to help create the reimplementation is fine.

However, it seems possible you can't redistribute the same tests under the MIT license. So the reimplementation MIT distribution could need to be source code only, not source code plus tests. Or, the tests can be distributed in parallel but still under LGPL, not MIT. It doesn't really matter since compiled software won't be including the tests anyways.

1 more reply

tty45617d ago

Google v. Oracle ruled that use of APIs are fair game and could be argued that test cases are strictly a use of APIs and not implementation.

1 more reply

justinclift17d ago

> The dispute drew responses from two prominent figures in the open source world.

Sure, but neither of those is an IP Lawyer.

The actual IP Lawyer who turned up and tried to engage, Richard Fontana, had his issue closed:

https://github.com/chardet/chardet/issues/334

Richard's point was this (quoted below):

---

FWIW, that case is not really relevant to what we are/were talking about here.

The question is whether you are truly an "author", or whether there was no (human) author.

The general legal consensus has been that generative AI output is not copyrightable (without some special facts of some sort, perhaps).

> If all of this code was somehow not copyrightable because someone wrote a prompt instead of directly editing the code, that would have pretty huge implications.

That's exactly it. Your act of applying the MIT license with your copyright notice to code that you did not "directly edit" has enormous implications.

RcouF1uZ4gsC17d ago

I don't think Fontana's reasoning holds up.

I think it is more like photography.

The case law is that a camera can't own a copyright, but a human can, even though all the pixels were produced by the camera with very little involvement at the pixel level by the human.

2 more replies

ticulatedspline17d ago

Surprised they don't mention Google LLC v. Oracle America, Inc. Seems a bit myopic to condone the general legality while arguing "you can only use it how I like it".

It also doesn't talk about the far more interesting philosophical queston. Does what Blanchard did cover ALL implementations from Claude? What if anyone did exactly what he did, feed it the test cases and say "re-implement from scratch", ostensibly one would expect the results to be largely similar (technically under the right conditions deterministically similar)

could you then fork the project under your own name and a commercial license? when you use an LLM like this, to basically do what anyone else could ask it to do how do you attach any license to it? Is it first come first serve?

If an agent is acting mostly on its own it feels like if you found a copy of Harry Potter in the fictional library of Babel, you didn't write it, just found it amongst the infinite library, but if you found it first could you block everyone else that stumbles on a near-identical copy elsewhere in the library? or does each found copy represent a "Re-implementation" that could be individually copyrighted?

drnick117d ago

It should be noted that the Rust community is also guilty of something similar. That is, porting old GPL programs, typically written in C, to Rust and relicensing them as MIT.

wolvesechoes17d ago

> porting old GPL programs, typically written in C, to Rust and relicensing them as MIT

Everything for memory safety.

phendrenad217d ago

And BSD drivers are "clean-room reimplementations" of the GPL drivers from Linux (but we all know they aren't).

bananamogul17d ago

...and the main distros are enthusiastically adopting them.

Within a relatively short time frame, expect everything in your Linux distro other than the kernel to be MIT-licensed because everything that is FSF-maintained will be rewritten in Rust with the MIT license.

The kernel will then be next, though it'll take a longer timeframe.

The GPL just didn't win in the marketplace of ideas.

1 more reply

lukev17d ago

I agree with the thrust of this article, that norms and what we perceive as good or desirable extend considerably beyond the minimum established by law.

But a point that was not made strongly, which highlights this even more, is that this goes in every direction.

If this kind of reimplementation is legal, then I can take any permissive OSS and rebuild it as proprietary. I can take any proprietary software and rebuild it as permissive. I can take any proprietary software and rebuild it as my own proprietary software.

Either the law needs to catch up and prevent this kind of behavior, or we're going to enter an effectively post-copyright world with respect to software. Which ISN'T GOOD, because that will disincentivize any sort of open license at all, and companies will start protecting/obfuscating their APIs like trade secrets.

integralid17d ago

It goes in one direction only.

Companies can take open-source software and make a proprietary reimplementation. You can't take a proprietary software and make an open source GPL version.

I am absolutely certain that if you tried you would be sued to oblivion. But big company screwing up open source is not even news anymore. In fact I (still) believe that the fact that even though LLMs were trained on tons of GPL and AGPL or even unlicensed software it's considered ok to use LLM code in proprietary projects is example of just that.

1 more reply

martin-t17d ago

I've been thinking this for over two years, that's why I stopped contributing to open source at that time - my work was only gonna be exploited to make rich people richer regardless of the license.

Crazy that only now we're seeing a bunch of articles coming to the same conclusion now.

I think copyright should still apply, but if it doesn't, we need new laws - ones which protect all human work, creative or not. Laws should serve and protect people, not algorithms and not corporations "owning" those algorithms.

I put owning in quotes because ownership should go to the people who did the work.

Buying/selling ownership of both companies and people's work should be illegal just like buying/selling whole humans is. Even if it took thousands of years to get here.

Money should not buy certain things because this is the root cause of inequality. Rich people are not getting richer at a faster rate by being more productive than everyone else but by "owning" other people's work and using it as leverage to extract even more from others.

Maybe LLM and mass unemployment of white collar workers will be the wakeup call needed for a reform. Or revolution.

Last time this happened was during the second industrial revolution and that's how communism got popular. We should do better this time because this is the last revolution which might be possible.

wccrawford17d ago

"Antirez closes his careful legal analysis as though it settles the matter. Ronacher acknowledges that “there is an obvious moral question here, but that isn't necessarily what I'm interested in.” Both pieces treat legal permissibility as a proxy for social legitimacy. "

This whole article is just complaining that other people didn't have the discussion he wanted.

Ronacher even acknowledged that it's a different discussion, and not one they were trying to have at the moment.

If you want to have it, have it. Don't blast others for not having it for you.

wizzwizz417d ago

Having this discussion involves blasting others for not considering it. Consider the rest of the paragraph you quoted:

> But law only says what conduct it will not prevent—it does not certify that conduct as right. Aggressive tax minimization that never crosses into illegality may still be widely regarded as antisocial. A pharmaceutical company that legally acquires a patent on a long-generic drug and raises the price a hundredfold has not done something legal and therefore fine. Legality is a necessary condition; it is not a sufficient one.

1 more reply

rob7416d ago

> Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch. The resulting code shares less than 1.3% similarity with any prior version, as measured by JPlag. His conclusion: this is an independent new work, and he is under no obligation to carry forward the LGPL. Mark Pilgrim, the library's original author, opened a GitHub issue to object. The LGPL requires that modifications be distributed under the same license, and a reimplementation produced with ample exposure to the original codebase cannot, in Pilgrim's view, pass as a clean-room effort.

Another question which as far as I can see isn't addressed in the article: even if you accept that the AI-driven reimplementation is an independent new work, can you (even as a maintainer) simply "hijack" the old LGPL-licensed project and overwrite it (if the new code is 98,7% different from the existing code, it's essentially overwriting) with your MIT-licensed code? You're free to start a new MIT-licensed project with your reimplementation, but putting the new code into the old project like some kind of cuckoo's egg seems wrong to me...

miggol17d ago

Wow, it feels like this argument rewired my brain.

When I first read about the chardet situation, I was conflicted but largely sided on the legal permissibility side of things. Uncomfortably I couldn't really fault the vibers; I guess I'm just liberal at heart.

The argument from the commons has really invoked my belief in the inherent morality of a public good. Something being "impermissible" sounds bad until you realize that otherwise the arrow of public knowledge suddenly points backwards.

Seeing this example play out in real life has had retroactive effects on my previously BSD-aligned brain. Even though the argument itself may have been presented before, I now understand the morals that a GPL license text underpins better.

crdrost17d ago

FWIW I like to explain it to folks like this: ignore all of your moral baggage around licensing and just focus on the fact that licensing is a legal tool of art that pretty much only becomes relevant in the context of threatening lawsuits.

BSD-type stuff is very simple because it says "here is this stuff. you can use it as long as you promise not to sue me. I promise not to sue you too."

Very simple.

GPL-type stuff is intrinsically more complex because it's trying to use the threatening power of lawsuits, to reduce overall IP lawsuits. So it has to say "Here is this stuff. You can use it as long as you promise not to sue me. I am only going to sue you, if you start pretending like you have the right to sue other folks over this stuff or anything you derive from it. You don't have the right to sue others for it, I made it, so please stop pretending and let's stop suing each other over this sort of thing."

Getting the entire legal nuance around that sort of counterfactual "I will only sue you if you try to pretend that you can sue others" is why they're more complex. And the simplest copyleft licenses like the Mozilla Public License have a very rigid notion of what "the software" is, so like for MPL it's "this file is gonna never be used in a lawsuit, you can edit it ONLY as long as you agree that this file must never be used by you to sue someone else, if you try to mutate it in a way that lets you sue someone else then that's against our agreement and we reserve the right to sue you."

Whereas for GPL it's actually kind of nebulous what "the software" is -- everything that feeds into the eventual compiled binary, basically -- and so the license itself needs to be a little bit airy-fairy, "let's first talk about what conveying the software means...", in various ways.

The interesting thing here is that as far as the courts are initially ruling, these from-scratch reimplementations are not human works and therefore are not copyrightable, which makes them all kind of public domain. Slapping the MIT license on it was an overstep. If that's how things go then Free Software has actually won its greatest sweep with LLM ubiquity.

1 more reply

ineedasername17d ago

This article is setting up a bit of a moving target. Legal vs legitimate is at least only a single vague question to be defined but then the target changes to “socially legitimate” defined only indirectly by way of example, like aggressive tax avoidance as “antisocial”— and while I tend to agree with that characterization my agreement is predicated on a layering of other principals.

The fundamental problem is that once you take something outside the realm of law and rule of law in its many facets as the legitimizing principal, you have to go a whole lot further to be coherent and consistent.

You can’t just leave things floating in a few ambiguous things you don’t like and feel “off” to you in some way- not if you’re trying to bring some clarity to your own thoughts, much less others. You don’t have to land on a conclusion either. By all means chew over things, but once you try to settle, things fall apart if you haven’t done the harder work of replacing the framework of law with that of another conceptual structure.

You need to at least be asking “to what ends? What purpose is served by the rule?” Otherwise you’re stuck in things where half the time you end up arguing backwards in ways that put purpose serving rules, the maintenance of the rule with justifications ever further afield pulled in when the rule is questioned and edge cases reached. If you’re asking, essentially, “is the spirit of the rule still there?” You’ve got to stop and fill in what that spirit is or you or people that want to control you or have an agenda will sweep in with their own language and fill the void to their own ends.

skybrian17d ago

Broadly speaking, the “freedom of users” is often protected by competition from competing alternatives. The GNU command line tools were replacements for system utilities. Linux was was a replacement for other Unix kernels. People chose to install them instead of proprietary alternatives. Was it due to ideology or lower cost or more features? All of the above. Different users have different motivations.

Copyleft could be seen as an attempt to give Free Software an edge in this competition for users, to counter the increased resources that proprietary systems can often draw on. I think success has been mixed. Sure, Linux won on the server. Open source won for libraries downloaded by language-specific package managers. But there’s a long tail of GPL apps that are not really all that appealing, compared to all the proprietary apps available from app stores.

But if reimplementing software is easy, there’s just going to be a lot more competition from both proprietary and open source software. Software that you can download for free that has better features and is more user-friendly is going to have an advantage.

With coding agents, it’s likely that you’ll be able to modify apps to your own needs more easily, too. Perhaps plugin systems and an AI that can write plugins for you will become the norm?

jacquesm17d ago

> Was it due to ideology or lower cost or more features?

It was due to access.

AndriyKunitsyn17d ago

There's a Japanese version of that page, written in classical text writing direction, in columns. Which is cool. Makes me wonder, though - how readable is it with so many English loanwords which should be rotated sideways to fit into columns?

ddellacosta17d ago

Total digression but yeah, that layout is stupid and the way those words are dropped in using Romaji makes no sense. That's not how Japanese people lay out pages on the web. In fact I don't think I've ever seen a Japanese web page laid out like a book like this, and in general I'd expect the English proper nouns and words that don't have obvious translations to get transliterated into Katakana. Smells like automatic conversion added by someone not really familiar with common practices for presenting Japanese on the web.

1 more reply

kazinator17d ago

You can't put a copyright and MIT license on something you generated with AI. It is derived from the work of many unknown, uncredited authors.

Think about it; the license says that copies of the work must be reproduced with the copyright notice and licensing clauses intact. Why would anyone obey that, knowing it came from AI?

Countless instances of such licenses were ignored in the training data.

harshreality17d ago

When learning is sufficiently atomized and recombined, creations cease to be "derived from" in a legal sense.

A lego sculpture is copyrighted. Lego blocks are not. The threshold between blocks and sculpture is not well-defined, but if an AI isn't prompted specifically to attempt to mimic an existing work, its output will be safely on the non-copyrighted side of things.

A derivative work is separately copyrightable, but redistribution needs permission from the original author too. Since that usually won't be granted or would be uneconomical, the derivative work can't usually be redistributed.

AI-produced material is inherently not copyrightable, but not because it's a derivative work.

1 more reply

moralestapia17d ago

Courts have already ruled that AI-generated work belongs to the public domain. So, even the MIT license does not apply.

effank17d ago

My view is that the current discourse surrounding AI reimplementation is trapped in an antiquated, atomistic model of authorship. What is fundamentally lacking in this debate is a systemic framework for trust, transparency, and the effective traceability of value creation.

Our legal and ethical frameworks including both copyleft and permissive licenses operate under the illusion of discrete, bounded attribution. They assume we can draw a clean perimeter around 'the code' and its 'author.' In reality, software production is a highly complex socio-technical network characterized by deep epistemic opacity. We are arguing over who holds the title to the final output while completely ignoring the vast, distributed network of inputs that made it possible.

Furthermore, because end-users face massive transaction costs and a general lack of incentive to evaluate the granular utility of their consumption, we have no reliable market mechanism to signal value back up the supply chain. Consequently, we fail to effectively compensate the true chain of biological and artificial contributors that facilitate downstream consumption.

In a rigorously mapped value-system, attribution would not stop at the keyboard; it would extend to all nodes of enablement. This includes what sociologists and economists term 'reproductive labor' or 'invisible labor' such as the developer’s partner who cooked them breakfast, thereby sustaining the biological and cognitive infrastructure necessary for the developer to contribute to the repository in the first place. The AI model is merely another node of aggregated external labor in this exact same web - both by its upward 'training' and downward utilization.

Until we develop an economic and technological ontology capable of tracing and rewarding this entire ecosystem of adjacent contributions, our debates over LGPL versus MIT will remain myopic. We are trying to govern a distributed, interconnected web of collective labor using property tools designed for solitary craftsmen.

orthoxerox16d ago

One thing I don't understand is what was so bad about having an LGPL license. You are allowed to do `import chardet` in a MIT-licensed or a proprietary program.

spiffyk16d ago

That is also my understanding. My personal theory is that many corporate compliance departments (or whoever is in charge of this at a particular place) just disallow any *GPL use in their company, regardless of whether it would actually cause problems, so this is an attempt to "unblock" the library for those. Instead of, you know, educating people about the nuances of different copyleft licenses.

1 more reply

waterproof16d ago

You ask Gemini to make an Elsa and Anna Frozen-themed coloring book page. It says no, that would be copyright infringement. So you ask it to make something as close as possible but without infringing. It happily obliges.

joshjob4217d ago

"If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms. This is not a restriction on sharing. It is a condition placed on sharing: if you share, you must share in kind." -- This is, on any plain reading, a restriction on sharing. "You can share only under these conditions" is plainly more restrictive than "sure do whatever you want". You can argue that it's a restriction that ultimately leads to more sharing overall. But it is a restriction on sharing in any given case of sharing nevertheless.

duskdozer17d ago

It's just considering "sharing" or "freedom" out an extra step. "Total freedom" results in freedom for those who can protect it and no freedom for those who can't.

dataflow16d ago

Yeah, he lost me there. It's like saying "if you share this you must give me a million dollars -- that's not a restriction, it's a condition!"

zakki17d ago

I guess because our freedom is not unlimited.

bjt17d ago

> If source code can now be generated from a specification, the specification is where the essential intellectual content of a GPL project resides. Blanchard's own claim—that he worked only from the test suite and API without reading the source—is, paradoxically, an argument for protecting that test suite and API specification under copyleft terms.

This is an interesting reversal in itself. If you make the specification protected under copyright, then the whole practice of clean room implementations is invalid.

jillesvangurp16d ago

There's a long history of people creating clean room implementations of other people's software based on specifications, reverse engineering, etc. A lot of that software is even distributed under GPL. Most drivers in the Linux kernel are good examples. There are things like Dosbox. Databases, video encoders, etc.

So, you could argue that people are using double standards here a bit. It's fine when people take proprietary software and create GPL versions of it. But it's not OK when people take GPL software and create permissively licensed or proprietary versions of it. That's of course not how copyright actually works. The reason all of this is OK is that copyright allows you to do this thing. This isn't some kind of loophole that needs closing but an essential feature of copyright.

The friction here, and common misunderstanding about how copyright works is that you don't copyright ideas but the form or expression of something. Making a painting of a photograph is not a copyright violation. Same idea, different expression. Patents are for protecting ideas. Trademarks are for protecting brands. Some companies have managed to trademark certain color codes even, which is controversial.

There's a lot of legal history for interpretation of what is and isn't "fair use" under copyright of course. It gets much more complicated if you also consider international law and how copyright works in different countries. But people being able to make reasonable use of copyrighted material always was essential to the notion of having it to begin with.

The reason we can have music that uses samples from other people's music without that being a copyright violation is exactly this fair use. In the same way, you can quote from books and create funny memes based on movie fragments. Or create new theater plays, movies, etc. reinterpreting works of others. All legal, up to a point. If you copy too much it stops being fair use and starts being plagiarism.

With software copyright violations, you have to prove that substantial parts of the software were lifted verbatim. Lawyers and judges look at this in terms of how they would apply it to a plagiarism case. Literally - software doesn't get special treatment under copyright. Copyright long predates the existence of software and computers and did not change in any material way after that was invented.

jongjong16d ago

I've been thinking about the erosion of copyright as well. It's basically making software worthless.

Already, the IP protections which exist for software suck. Patents are expensive and you can't even use them for software most of the time anyway. Copyright doesn't protect innovative ideas or architectures; if someone can just copy your code, mix it with a bunch of other code (no functionality changes) and then use it as their own; then copyright provides no protection at all...

If this is the case, then why should anyone bother to write any quality software at all? It has no value since anyone can just appropriate any essential functionality that they didn't create for themselves. What's to prevent an employee from taking their employer's source code, rewriting it with an LLM (same functionally) and generate a clone of their company's software to use as their own to compete against their employer?

Without any IP protections, anyone who writes software becomes a complete loser. There's 0 benefit. One software developer would be doing all the work and then some marketing expert or someone with good social connections could just steal their work and sell it for billions... The software developer gets NOTHING.

kccqzy17d ago

> When GNU reimplemented the UNIX userspace, the vector ran from proprietary to free. Stallman was using the limits of copyright law to turn proprietary software into free software. […] The vector in the chardet case runs the other way.

That’s just your subjective opinion which many other people would disagree. I bet Armin Ronacher would agree that an MIT licensed library is even freer than an LGPL licensed library. To them, the vector is running from free to freer.

t4356217d ago

It seems that this chap didn't go and implement a new library, he reimplemented an existing one and became sole-controller of it. i.e. he seems to have taken its reputation, brand whatever you call it away from the contributors and entirely to himself. Their work of establishing it as a well known solution is no longer recognised.

So of course we feel that something wrong has happened even if it's not easy to put one's finger on it.

derangedHorse17d ago

> Blanchard's own claim—that he worked only from the test suite and API without reading the source—is, paradoxically, an argument for protecting that test suite and API specification under copyleft terms.

Ridiculous. I don't want specifications for proprietary APIs to be protected, and I don't want the free ones to be either. The software community seemed pretty certain as a whole that this would be very bad for competition [1].

Morally, I don't think there's anything wrong with re-implementing a technology with the same API as another, or running a test suite from a GPL licensed codebase. The code wasn't stolen, it was capitalized on. Like a business using a GPL code editor to write a new one.

> This is not a restriction on sharing. It is a condition placed on sharing

Also this doesn't make any logical sense. A condition on sharing cannot exist without corresponding restrictions.

[1] https://www.reddit.com/r/Android/comments/mklieg/supreme_cou...

jongjong17d ago

There is a definite issue in terms of legitimacy and I also think there are some issues in the wording of certain open source licenses like MIT which give rights to 'Any person obtaining a copy of this software'.

Firstly, an AI agent is not a person. Secondly, the MIT license doesn't offer any rights to the code itself; it says a 'copy of the software' - That's what people are given the right to. It says nothing about the code and in terms of the software, it still requires attribution. Attribution of use and distribution of the software (or parts) is required regardless of the copyright aspect. AI agents are redistributing the software, not the code.

The MIT license makes a clear distinction between code and software. It doesn't cede any rights to the code.

And then, in the spirit of copyright; it was designed to protect the financial interests of the authors. The 'fair use' carve-out was meant for cases which do not have an adverse market impact on the author which it clearly does; at least in the cases highlighted in this article.

Sleaker17d ago

Isn't the whole thing sidestepping another issue? If the code was rewritten with an AI, then it becomes a non-copyrightable work? Hasn't this already gone through the courts? So isn't the resulting library de facto public domain, even if the maintainer wants to try and attach a license to it?

Edit: looks like an IP lawyer had this exact issue on the GitHub and it was closed.

casey217d ago

If the model wasn't trained on copyleft, if he didn't use a copyleft test suite and if he wasn't the maintainer for years. Clearly the intent here is copyright infringement.

If you have software your testsuite should be your testsuite, you do dev with a testsuite and then mit without releasing one. Depending on the test-suite it may break clean room rules, especially for ttd codebases.

smsm4217d ago

> Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch.

I don't see how it matters what he looked at. If I took a copyrighted code and run it through a script that replaces all variable names, and then claimed copyright on the result because it's an entirely new work and I did not look on the original work, I'd be ridiculed and sued, and would lose that lawsuit. AI is a more complex machine, but still a machine. If you feed somebody'd work into a machine, what comes out is a derivative work.

Test suite is a part of copyrighted code, is it not? If he used just the API description, preferably from a copyright-clean source, then we could claim new work (regardless of how it was produced, by using Claude or trained pigeons or by consuming magic mushrooms). But once parts of the copyrighted code had been used, it becomes derivative work.

metalcrow17d ago

> AI is a more complex machine, but still a machine. If you feed somebody'd work into a machine, what comes out is a derivative work.

I'm not sure that's true, legally speaking. If you fed it into a PRNG, the output seems to me like it would not be an obviously derivative work (i doubt you could copyright it but that's a separate question). So we have 1 machine that can transform something into non-derivative work, and another that leaves the result derivative. The line isn't likely going to be drawn as "did a machine do it or not", but on a fuzzy human line of how close the output seems to be to the original (IANAL).

1 more reply

makerofthings17d ago

If an AI can license-wash open source software like this then the licenses become meaningless. Which is fascinating. Commercial software cloning that is simple enough for an average person to drive is next and the ultimate form of piracy, see an app for $10? Don’t fancy paying? Just ask ChatGPT for a clone. Future is going to be wild.

paxys17d ago

You've just described why every SaaS stock has taken a beating in the last 6 months.

1 more reply

j-bos17d ago

> ultimate form of piracy

Nothing was stolen, not even copied, lamest piracy I've heard of.

1 more reply

grahamlee17d ago

It's clear that we're entering a new era of copyright _expectations_ (whether we get new _legislation_ is different), but for now realise this: the people like me who like copyleft can do this too. We can take software we like, point an agent at it, and tell it to make a new version with the AGPL3.0-or-later badge on the front.

anonymous_sorry17d ago

But the LLM contributions would likely be ruled public domain, so AGPL may not be enforceable on these.

armchairhacker17d ago

The point of GPL is to restrict distribution. If there’s already an MIT version, it’s useless.

2 more replies

pu_pe16d ago

If anyone can go in, take a GPL project like chardet and reimplement it using LLMs, then the current maintainer just saved everyone time by making their implementation publicly available.

Our legal framework wasn't built for a situation where reimplementing complex software is trivial, much less almost completely automated.

tzs17d ago

The GPL's conditions are triggered only by distribution. If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms.

Offering as a networked service is not distribution. That was why they had to make AGPL to put conditions on use in networked services.

tzs17d ago

Oops…the first paragraph is a quote from the article but I somehow forgot the “>”.

righthand17d ago

I think what is happening is the collapse of the “greater good”. Open source is dependent upon providing information for the greater good and general benefit of its readers. However now that no one is reading anything, its purpose is for the great good of the most clever or most convincing or richest harvester.

martin-t17d ago

1) Legality and morality are obviously different and unrelated concepts. More people should understand that.

2) Copyright was the wrong mechanism to use for code from the start, LLMs just exposed the issue. The thing to protect shouldn't be creativity, it should be human work - any kind of work.

The hard part of programming isn't creativity, it's making correct decisions. It's getting the information you need to make them. Figuring out and understanding the problem you're trying to solve, whether it's a complex mathematical problem or a customer's need. And then evaluating solutions until you find the right one. (One constrains being how much time you can spend on it.)

All that work is incredibly valuable but once the solution exists, it's each easier to copy without replicating or even understanding the thought process which led to it. But that thought process took time and effort.

The person who did the work deserved credit and compensation.

And he deserves it transitively, if his work is used to build other works - proportional to his contribution. The hard part is quantifying it, of course. But a lot of people these days benefit from throwing their hands up and saying we can't quantify it exactly so let's make it finders keepers. That's exploitation.

3) Both LLM training and inference are derivative works by any reasonable meaning of those words. If LLMs are not derivative works of the training data then why is so much training data needed? Why don't they just build AI from scratch? Because they can't. They just claim they found a legal loophole to exploit other people's work without consent.

I am still hoping the legal people take time to understand how LLMs work, how other algorithms, such as synonym replacement or c2rust work, decide that calling it "AI" doesn't magically remove copyright and the huge AI companies will be forced to destroy their existing models and train new ones which respect the licenses.

flavionm16d ago

> The person who did the work deserved credit and compensation.

That's the part of the argument in favor of copyright that is inherently flawed.

Doing some amount of work doesn't entitle you to anything besides whatever you've agreed to get for that work, or possession of the output, in case you did it for yourself. But that's all you're entitled to get.

Work itself doesn't have any intrinsic value, only output does. The scarcity of output is what dictates what is actually valuable.

Creative work has the characteristic of its marginal cost being very high for the first copy, but nearly zero for additional copies. That's true simply because of the nature of such work, it isn't something that is unfairly imposed upon creative workers. Whenever you choose to engage in creative work, you know that, or at least you should. And if you choose to give away the first copy for free, or very cheap, that's your prerrogative, but it doesn't inherently entitle you to anything else besides the value of that first copy.

Yes, there are laws such as copyright laws that exist to artificially inflate the value of additional copies, but they go against how things work naturally, so you shouldn't rely on them, and you certainly shouldn't base your moral compass on them.

Now, I do still prefer copyleft licenses over permissive ones for the work I choose to give away for free, but only to stop corporations from taking that work and then using copyright laws to keep it exclusive to them. Once copyright is no longer an issue, they won't be necessary anymore.

1 more reply

wvenable17d ago

> If LLMs are not derivative works of the training data then why is so much training data needed?

If you went to school for 12-16 years, that's a lot of training. Does that mean anything you produce is a derivative work?

1 more reply

motbus316d ago

To me, there is a confusion of what "copying" and "using" means.

You can copy the idea and not use the source code. This has been ruled ok many times already and would be quite dangerous if that was not the case.

But this is not what this is. To generate the new program, another program, the AI, must have an input which then becomes part of the program itself. It does not really matter much if the generation does not contain the source code itself or a similar reimplementation. One could rewrite a full version of the Lord of the Rings changing all the words but having the same elements, it would still be plagiarism. No reason to think this is not the case here. It is evident that the source code was the base, hence, this is a derived work.

dleslie17d ago

IMHO, the API and Test Suite, particularly the latter, define the contract of the functional definition of the software. It almost doesn't matter what that definition looks like so long as it conforms to the contract.

There was an issue where Google did something similar with the JVM, and ultimately it came down to whether or not Oracle owned the copyright to the header files containing the API. It went all the way to the US supreme court, and they ruled in Google's favour; finding that the API wasn't the implementation, and that the amount of shared code was so minimal as to be irrelevant.

They didn't anticipate that in less than half a decade we'd have technology that could _rapidly_ reimplement software given a strong functional definition and contract enforcing test suite.

blurbleblurble16d ago

If specifications become IP? Reboot the pirate parties. Authoritarianism is what it is. The exploitation isn't coming from the tools, it's coming from the economic structures and forces of exploitation being brought to their natural limits. We should learn from the luddites, the actual luddites. They weren't anti-technology, they were against the insane consolidation of power. This proposal might seem radical but all it would really do is reify intellectual property at exactly the wrong moment, a game over moment. Feed local LLMs. Feed peoples' movements around these technologies, so that people bring agency into how we use the tech, so that we don't get dominated by the laws that form around it.

sombragris16d ago

I'm firmly in favor of copyleft. But I get what chardet's maintainer has done: reimplement a piece of software. This has been done a lot of times. Musl reimplementing glibc, llvm reimplementing gcc, etc., all of them with non-copyleft licenses.

However, the purported reimplementations did not usurp the names of the reimplemented product. Reimplement chardet using AI and insisting in calling the product the same as old chardet with a new version number and a new license is, I think, not exactly honest. At least he should have used something like "chardet-ng", "chardet-fresh", or whatever, and a completely different source tree.

nicole_express17d ago

Not a lawyer, but my understanding is: In theory, copyright only protects the creative expression of source code; this is the point of the "clean room" dance, that you're keeping only the functional behavior (not protected by copyright). Patents are, of course, an entirely different can of worms. So using an LLM to strip all of the "creative expression" out of source code but create the same functionality feels like it could be equivalent enough.

I like the article's point of legal vs. legitimate here, though; copyright is actually something of a strange animal to use to protect source code, it was just the most convenient pre-existing framework to shove it in.

dathinab17d ago

> this is the point of the "clean room" dance

which is the actual relevant part: they didn't do that dance AFIK

AI is a tool, they set it up to make a non-verbatim copy of a program.

Then they feed it the original software (AFIK).

Which makes it a side by side copy, as in the original source was used as reference to create the new program. Which tend to be seen as derived work even if very different.

IMHO They would have to:

1. create a specification of the software _without looking at the source code_, i.e. by behavior observation (and an interface description). I.e. you give the AI access to running the program, but not to looking into the insides of it. I really don't think they did it as even with AI it's a huge pain as you normally can't just brute force all combinations of inputs and instead need to have a scientific model=>test=>refine loop (which AI can do, but can take long and get stuck, so you want it human assisted, and the human can't have inside knowledge about the program).

2. then generate a new program from specification, And only from it. No git history, no original source code access, no program access, no shared AI state or anything like that.

Also for the extra mile of legal risk avoidance do both human assisted and use unrelated 3rd parties without inside knowledge for both steps.

While this does majorly cut cost of a clean room approach, it still isn't cost free. And still is a legal mine field if done by a single person, especially if they have enough familiarity to potentially remember specific peaces of code verbatim.

3 more replies

svilen_dobrev17d ago

i've been following this for a while.. and the trend for copyright (of any form - books code pictures music whatever) being laundered by reinventing the "same" thing in-some-way.. is kind-of clear.

But what happens with the new things? Has the era of software-making (or creating things at large) finished, and from now on everything will be re-(gurgitated|implemented|polished) old stuff?

Or all goes back to proprietary everything.. Babylon-tower style, noone talks to noone?

edit: another view - is open-source from now on only for resume-building? "see-what-i've-built" style

dwroberts17d ago

One of the things that irks me about this whole thing is, if it’s so clean room and distinct, why make the changes to the existing project? Why not make an entirely new library?

The answer to that, I think, is that the authors wanted to squat an existing successful project and gain a platform from it. Hence we have news cycle discussing it.

Nobody cares about a new library using AI, but squash an existing one with this stuff, and you get attention. It’s the reputation, the GitHub stars, whatever

nicole_express17d ago

I mean, Blanchard was the longtime maintainer of chardet already, and had wanted to relicense it for years. So I think that complicates your picture of "squatting an existing successful project".

Honestly it's a weird test case for this sort of thing. I don't think you'd see an equivalent in most open source projects.

intrasight17d ago

I agree. But you can't copyright goodwill and reputation. Trademark does provide some protection there, right?

danbruc17d ago

Why are people even having problems with sharing their changes to begin with? Just publishing it somewhere does not seem too expensive. The risk of accidentally including stuff that is not supposed to become public? Or are people regularly completely changing codebases and do not want to make the effort freely available, maybe especially to competitors? I would have assumed that the common case is adding a missing feature here, tweaking something there, if you turn the entire thing on its head, why not have your own alternative solution from scratch?

panny17d ago

Both sides are wrong on this actually. Computer generated code has no copyright protection.

>The U.S. Copyright Office (USCO) and federal courts have consistently ruled that AI-generated works—where the expressive elements are determined by the machine, even in response to a human prompt—lack the necessary human creative input and therefore cannot be copyrighted.

All this code is public domain. Your employees can publish "your" AI generated code freely and it won't matter how many tokens you spent generating it. It is not covered by copyright.

wiz21c16d ago

within 20 years, everyone will be developing software which will be copyrighted partly by AI and be behind walled gardens. Sure you'll be able to do things locally but everything (security clearance, walled garden, government's control etc) but it will forever remain at the level of "tinkering".

If you are 50 years old or more, the computing you were born with (you own the computer, you own the programs) will be gone. Copyleft only makes sense if you own the computer.

That makes me sad.

ball_of_lint17d ago

Would software be more or less free in a world without copyright?

I argue more free. EULAs and restrictions on how+for what software can be used, like DRM, typically use copyright as their legal backing. GPL licenses turn that on it's head but that doesn't redeem the original, flawed, law.

This seems to follow the letter but not the spirit of the license. If this does pass legal muster, we can do the same to whatever proprietary software we wish, which makes a dramatically different but IMO better ecosystem in the end.

duskdozer17d ago

It will be disproportionately hard to do it to proprietary software though. The imbalance of power is sort of what the GPL was there for

ajross17d ago

This take, which I've seen in a few different places now, seems 100% bonkers. A world where anyone can cheaply reimplement anyone else's software and use it on hardware of their own choosing in their own designs and for their own purposes is a free software utopia.

This isn't a problem, this is the goal. GNU was born when RMS couldn't use a printer the way he wanted because of an unmodifiable proprietary driver. That kind of thing just won't happen in the vibe coded future.

duskdozer17d ago

It's not going to be like that for proprietary software. All this future ends with is "totally free" software that companies will leech off of in their "totally locked down" software. I guarantee you that people wouldn't have had this reaction if someone had instead replicated Windows from leaked source. Well, other than Microsoft owners/employees.

primenum16d ago

I'm a bit shocked by how people are equating recreating from copyright with recreating from copyleft. It's not the same thing... copyleft code is out in the open on purpose, with the condition that you share back in kind. People are latching onto the intellectual property angle of this article, but the point is way simpler than that: "The terms of that compact were: if you take this and build on it, you share back under the same terms."

stagger8717d ago

I'm probably spitting in the wind, but stuff like this is why I removed all my hosted open source projects. I manage several niche projects that I have now converted to binary only releases (to almost no push back). It's niche enough that it's not very hard to get LLMs to output chunks of code that it managed to scrape before I took it offline. I don't see many people talking about this angle, but LLMs ripping off my work killed my open source efforts.

alterom16d ago

>I don't see many people talking about this angle, but LLMs ripping off my work killed my open source efforts.

This is exactly what the article is talking about.

t4356217d ago

Why does anyone need his new library? They can do what he did and make their own.

I'm glad we can fork things at a point and thumb our noses at those who wish to cash in on other's work.

warkdarrior17d ago

Why would I make my own? The new library is released under MIT license and faster than the old one.

1 more reply

wvenable17d ago

Consider it an LLM cache. The result has already been cached so you don't have to generate it again.

josalhor17d ago

I think the direction we are going, the GPL is going to fade away. I think people will look at this like writing a book and claiming the ideas in the book cannot be copied. This debate is not that different from the ones going on in the music industry. I open sourced my latest software as Apache 2.0 after debating a lot about this. Unless the FSF wins in court in the next <=2-3 years, there is no coming back from this.

niemandhier16d ago

Look at sqllite. They have a good example going how code can be open and you still have a product.

For SQLite the actual product is the test-suite and the audits.

Sure you can use the code all you like, but you only ever get past quality gates if you use the audited and provably tested version.

This becomes just more relevant in the age of ai coding, where an agent might be able to reimplement your specs.

Keep your code open, but consider moving your tests.

arjie17d ago

Well, the license change sounds pretty strange, but to be honest if I were to use this software I would use it without adhering to the MIT. It's machine-created content which is not, in general, copyrightable. You can assert whatever license you want on such content, but I am not going to adhere to it. For example, I declare you may use the following under the Elastic License

The

wvenable17d ago

I wonder how one proves that the software is machine created.

jpauline16d ago

I'm specifically worried about gating trained data where publicly accessible information is blocked/opted-out. If we're opening up Pandora's Box with genAI and training data, we may as well give it what is accessible to the average user. Its going to end up having the same issues a user with implicit knowledge or memories would have anyway.

0x45717d ago

> Antirez does not address this directional difference. He invokes the GNU precedent, but that precedent is a counterexample to his conclusion, not a supporting one.

Morally - yes, technically - no. I think it's odd to be mad at someone doing the exact thing you praise in another case just because license isn't copyleft within license allowance. Make a better copyleft license?

nickcoffee16d ago

The practical question I keep coming back to: if the output is meaningfully different and faster, at what point does the reimplementation argument become less about the code and more about the reputation and distribution the original project built? That seems like the harder thing to replicate, and the harder thing to protect.

strongpigeon17d ago

I feel like the licenses that suffer the most isn't the GPL, but the ones like SSPL. If your code can be re-implemented easily and legally by AWS using an LLM, why risk publishing it?

It does feel like open source is about to change. My hunch is that commercial open source (beyond the consultation model) risks disappearing. Though I'd be happy to be proven wrong.

colmmacc17d ago

The four essential freedoms of the Free Software movement are ...

1. The freedom to run the program as you wish 2. The freedom to study how it works and modify it (which requires access to source code) 3. The freedom to redistribute copies to help others 4. The freedom to distribute modified versions, so the whole community benefits from your improvements

To my mind ... GenAI coding make all of these far more realizable, especially for "normal people", than CopyLeft ever has. Let's go through them ...

Want to run a program as you wish? Great! It's easier than ever to build a replacement. Proprietary or non-free software is just as vulnerable to reimplementation as Copyleft is.

Want to study a how a program works and to modify it? This is now much more achievable.

Want the freedom to redistribute copies to help others? Build your own version! It may not even be copyrightable if it's 100% generated (IANAL).

Want to distribute modified versions? yes! see previous.

I dunno; seems like generative coding can be as much a liberator as any kind of problem.

skydhash17d ago

Unless your idea of software is reduced to the set of todo app, I don’t see how your points hold. AI won’t give you Blender, Inkscape, Kicad, Emacs,… And the algorithms behind those are not secrets, it’s the cohesive vision behind the whole system that is hard.

People will still pay for Matlab, SolidWorks, and Maya because no one who need those will vibe-code a solution. And there’s plenty of good OSS versions for the others.

alterom17d ago

Sorry, but this seems to be so off-base (as well as naively optimistic) that I am having difficulty responding to this.

But I'll try nevertheless.

- >Want to run a program as you wish? Great! It's easier than ever to build a replacement.

Non-sequitur. Building a replacement does nothing for being able to run a program as you wish.

Nobody else is able to run your program as they wish unless you release it with a Copyleft license.

- >Want to study a how a program works and to modify it? This is now much more achievable.

Reverse engineering is more achievable.

Modifying a program, without having its source code, documentation, and a legal right to do so guaranteed by the license is (and always be) easier compared to not having those things.

- >Want the freedom to redistribute copies to help others? Build your own version! It may not even be copyrightable if it's 100% generated (IANAL).

So, that's not about redistributing copies. That's about building an alternative option.

I can download an Ubuntu image and get Libre Office on it with a click.

Go vibe-code me a Microsoft Excel running on Windows 11, please, and tell me it's easier.

- >Want to distribute modified versions? yes! see previous.

You're not even trying here.

One can't legally modify and redistribute copyrighted works without explicit permission to do so.

You keep saying "...but vibe coding allows anyone to create something else entirely instead and do whatever with it!" as if that is a substitute for checking out a repo, or simply downloading FOSS software to use as you wish.

- >I dunno; seems like generative coding can be as much a liberator as any kind of problem.

Now, that statement I fully agree with.

Generative coding is a liberator as much as any kind of problem is.

Headache, for example, is generally a problem. It's not a great liberator.

Neither is generative coding.

Now, you probably didn't intend to say what you wrote. And that's exactly why generative coding is not a panacea: the only way to say things that you mean to say is to write precisely what you mean to say.

Vibe-coding (like any vibe-writing) simply can't accomplish that, by design.

api17d ago

It also erodes copyright. A decent amount of commercial software can be AI cloned with no copyright violation.

A lot of SaaS too, especially if AI can run a simple deploy.

We might be approaching a huge deflationary catastrophe in the cost of a lot of software. It’s not a catastrophe for the consumer but it is for the industry.

krater2316d ago

In Germany we call this 'Beißreflex'. It's all good, when someone reimplements something in Rust, no one asks for the license, but as soon someone uses AI to reimplement something, the search for something to complain about begins.

waffletower16d ago

I am appreciating the return of the meme "information wants to be free".

ori_b17d ago

If you can prove the LLM was not trained on the code it is reproducing, and has never seen the code as part of a spec to follow, I don't see a problem.

Proving this is going to be hard with current "open source" models.

eschaton17d ago

Indeed, you have to prove that the LLM is generating code from a specification. Right now they don’t do that; what they do is regurgitate portions of their training data based on correlations with input tokens.

Put the programmer’s reference for the Digital Equipment DEQNA QBus Ethernet adapter in your favorite slop tool and tell it to make a C or C++ implementation for an emulator, and you know what you get? Code from SIMH. That’s not “generating,” that’s “copying.”

winstonwinston17d ago

> Blanchard's account is that he never looked at the existing source code directly.

That’s a weird statement while releasing the new version of the same project. Maybe just release it as a new project, chardet-ai v1.0 or whatever.

daemin17d ago

One main thing that this brings to mind is if an LLM can ever actually create a clean room implementation of a piece of open source software, given that there is a near certainty that the software was used in its training data. Therefore it has seen it and remembered it, and could if appropriately prompted recreate the code verbatim.

This can also apply to people, either if they have seen the code previously and therefore are ineligible to write the code for a clean-room implementation, or it gets murky when the same person writes the same code twice from their own knoeldge, as in the Oracle Java case.

Coming from a professional programming perspective I can totally see the desire to have more libraries written in permissive licences like BSD or MIT, as they allow one like myself to include them in commercial closed-source products without needing to open source the entire codebase.

However I find myself agreeing with the article in so far as this LLM generated implementation is breaking the social contract for a GPL/LGPL based library. The author could have easily implemented the new version as a separate project and there would not have been an outcry, but because they are replacing the GPL version with this new one it feels scummy to say the least.

sayrer17d ago

I don't think this part is correct: "If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms."

That's what something like AGPL does.

xbar16d ago

Oracle v Google concluded that APIs could not be protected by either copyright or copyleft. It seemed to me at the time that most here supported that decision. Has anything changed?

vbarrielle16d ago

No, APIs fall under copyright, but the Supreme Court found that Google's reimplementation of Java's API was falling under fair use. Fair use is decided case by case, one cannot use that decision as a precedent.

moralestapia17d ago

That's a non-sequitur. chardet v7 is GPL-derived work (currently in clear violation of the GPL). If xe wanted it to be a different thing xe should've published as such. Simple as.

tw198417d ago

Claude must be trained on chardet already, it worked on chardet's code to optimize or rewrite it to be much better. This is the textbook definition of derivative works.

krater2316d ago

There is fewer then 2% of code a copy of chardet. When the developer of chardet had done it without AI, whats then? He is trained on the same code too.

hungryhobbit17d ago

I largely agree with the author that AI can't just magically remove license agreements by rewriting code.

However, I take issue with his version of history:

>The history of the GPL is the history of licensing tools evolving in response to new forms of exploitation: GPLv2 to GPLv3, then AGPL.

GPLv3 set open source backwards: it wasn't an evolution to protect anything, it was a an overly paranoid failure. Don't believe me? Just count how many GPL3 vs. how many GPL2 projects have been started since GPL3 dropped.

Again, I'm very pro-OSS, but let's not pretend the community has always had a straight line of progress forward; some stuff is crazy Stallman stuff that set us back.

randyrand17d ago

It doesn't matter if it's legitimate. The people that use it don't care. They just find it online and click download. This is the reality.

kanemcgrath17d ago

without discussing copyright, I don't believe any of this is copied. Which I think should be the argument that actually matters.

I downloaded both 6.0 and 7.0 and based on only a light comparison of a few key files, nothing would suggest to me that 7.0 was copied from 6.0, especially for a 41x faster implementation. It is a lot more organized and readable in my armature opinion, and the code is about 1/10th the size.

iberator17d ago

Easy solution for now:

Add something like this to NEW gpl /bsd/mit licenses:

'you are forbidden from reimplementing it with AI'

or just:

'all clones, reimpletetions with ai etc must still be GPL'

keeda17d ago

A large part of our industry is experiencing significant cognitive dissonance and articles like this are a symptom of that. AI is not really changing things, it's simply forcing us to question a lot of things we took for granted.

One of those things is that we assumed that the code embodied most of the value it offered. That it was the code that contained the creativity and expressiveness and usefulness. And we thought only we could write code. And so we thought we only needed to protect the code to protect our efforts and investments. Which is also why we accepted copyright as an appropriate legal protection for software, or of enforcing an ethos of sharing, as with copyleft.

But the code itself was never the valuable aspect; it was the functionality it provided.

And now AI is making that starkly apparent, while undermining a lot of other presumptions. Including about copyright.

Copyright protection for software is a historical hack because people didn’t want to figure out an appropriate legal framework from scratch. You “wrote” books, you "wrote" code, let’s shoehorn software into copyright and go get lunch! Completely overlooking the fact that copyright explicitly does not cover functional aspects (that is the realm of patents) which is the entire raison d'etre of code.

Sure, copyright covers “expressive elements”, but again those are properties of the source code, not the functionality. In fact, expressiveness is BAD for code (cf “code should be boring”)! Copyright will protect whether you used a streams API or a for-loop for iteration, which is absolutely irrelevant to the technical functionality that actually solves user problems, which has always been the only thing users really cared about.

In fact, if you look at significant copyright-related cases for software now (e.g. Oracle vs Google), you'll realize they have twisted themselves into knots trying to apply laws intended for expressive creativity to issues that were essentially about technical creativity.

I have no hopes that we will figure out an appropriate IP framework for software, so I expect people will move towards other things like patents, trade secrets and trademarks. Which have their own problems, but at least they already exist and are more suitable than copyright, especially in the age of AI.

mh226617d ago

Buried in here: Mark Pilgrim suddenly reappearing after his sudden disappearance years ago! Has he been up to anything since then?

mwkaufma17d ago

A lot of untagged IANAL takes here today.

jFriedensreich16d ago

I wonder what value gpl even has in a world where i can trivially reimplement whatever a company builds on a permissive license and does not share. I see still a place for things that are low level, algo heavy, real world test heavy and critical eg. kernel, cryptography, storage engine, filesystems all the rest of userland and web not so much

hexyl_C_gut17d ago

I'm less concerned about AI eroding copyleft and more exited about AI eroding copy right.

animitronix17d ago

LPGL is dead, long live the AI rewrites of your barely open source code

throwaway202717d ago

Perhaps software patents may play an even bigger role in the future.

1 more reply

Khaine17d ago

Someone be brave, and do this to ZFS. Poke the Oracle bear!

1 more reply

palata17d ago

> an argument for protecting that test suite and API specification under copyleft terms.

If we protect API under copyright, it makes it easier to prevent interoperability. We obviously do NOT want that. It would give big companies even more power.

Now in the US, the Supreme Court that the output of an LLM is not copyrightable. So even a permissive licence doesn't work for that reimplementation: it should be public domain.

Disclaimer: I am all for copyleft for the code I write, but already without LLMs, one could rewrite a similar project and use the licence they please. LLMs make them faster at that, it's just a fact.

Now I wonder: say I vibe-code a library (so it's public domain in the US), I don't publish that code but I sell it to a customer. Can I prevent them from reselling it? I guess not, since it's public domain?

And as an employee writing code for a company. If I produce public domain code because it is written by an LLM, can I publish it, or can the company prevent me from doing it?

internet200017d ago

I stopped reading here:

> The ethical force of that project did not come from its legal permissibility—it came from the direction it was moving, from the fact that it was expanding the commons. That is why people cheered.

How is this not just relitigating GPL vs MIT? By now you know which side of that argument you are in. The AI component is orthogonal.

eduction16d ago

Despite the tech layoffs and rise of AI, programmer hubris is alive and well, that is heartening.

Here we see three engineers writing — at length! — about a hugely complicated matter of law.

No one outside your bubble cares what you think. You are unqualified and your opinions irrelevant. You might as well be debating open heart surgery techniques.

mbgerring17d ago

See also "A Declaration of the Independence of Cyberspace" (https://www.eff.org/cyberspace-independence), and what a goofy, naive, misguided disaster that early internet optimism turned into.

No, AI does not mean the end of either copyright or copyleft, it means that the laws need to catch up. And they should, and they will.

pie_flavor16d ago

While no fan of AI slop, is there any difference between this and musl/busybox/etc, minus the addition of AI? Did anyone get mad at busybox before AI?

mfabbri7717d ago

What if someone doesn't declare that it has been reimplemented using an LLM? Isn't it enough to simply declare that you have reimplemented the software without using an LLM? Good luck proving that in court...

One thing is certain, however: copyleft licenses will disappear: If I can't control the redistribution of my code (through a GPL or similar license), I choose to develop it in closed source.

1 more reply

youknownothing16d ago

Theseus ship, anyone?

delichon17d ago

Imagine if the author has his way, and when we have AI write software, it becomes legally under the license of some other sufficiently similar piece of software. Which may or may not be proprietary. "I see you have generated a todo app very similar to Todoist. So they now own it." That does not seem like a good path either for open source software or for opening up the benefits of AI generated software.

throawayonthe17d ago

shall we now have to think about the tradeoffs in adopting

- proprietary

- free

- slop-licensed

software?

1 more reply

logicprog17d ago

> Ronacher notes this as an irony and moves on. But the irony cuts deeper than he lets on. Next.js is MIT licensed. Cloudflare's vinext did not violate any license—it did exactly what Ronacher calls a contribution to the culture of openness, applied to a permissively licensed codebase. Vercel's reaction had nothing to do with license infringement; it was purely competitive and territorial. The implicit position is: reimplementing GPL software as MIT is a victory for sharing, but having our own MIT software reimplemented by a competitor is cause for outrage. This is what the claim that permissive licensing is “more share-friendly” than copyleft looks like in practice. The spirit of sharing, it turns out, runs in one direction only: outward from oneself.

This argument makes no sense. Are they arguing that because Vercel, specifically, had this attitude, this is an attitude necessitated by AI, reimplementation, and those who are in favor of it towards more permissive licenses? That certainly doesn't seem to be an accurate way to summarize what antirez or Ronacher believe. In fact, under the legal and ethical frameworks (respectively) that those two put forward, Vercel has no right to claim that position and no way to enforce it, so it seems very strange to me to even assert that this sort of thing would be the practical result of AI reimplementations. This seems to just be pointing towards the hypocrisy of one particular company, and assuming that this would be the inevitable universal, attitude, and result when there's no evidence to think so.

It's ironic, because antirez actually literally addresses this specific argument. They completely miss the fact that a lot of his blog post is not actually just about legal but also about ethical matters. Specifically, the idea he puts forward is that yes, corporations can do these kinds of rewrites now, but they always had the resources and manpower to do so anyway. What's different now is that individuals can do this kind of rewrites when they never have the ability to do so before, and the vector of such a rewrite can be from a permissive to copyleft or even from decompile the proprietary to permissive or copyleft. The fact that it hasn't been so far is a more a factor of the fact that most people really hate copyleft and find an annoying and it's been losing traction and developer mind share for decades, not that this tactic can't be used that way. I think that's actually one of the big points he's trying to make with his GNU comparison — not just that if it was legal for GNU to do it, then it's legal for you to do with AI, and not even just the fundamental libertarian ethical axiom (that I agree with for the most part) that it should remain legal to do such a rewrite in either direction because in terms of the fundamental axioms that we enforce with violence in our society, there should be a level playing field where we look at the action itself and not just whether we like or dislike the consequences, but specifically the fact that if GNU did it once with the ability to rewrite things, it can be done again, even in the same direction, it now even more easily using AI.

2 more replies

antonio-mello17d ago

The practical tension I see: I build open source tools and use AI heavily in the process (Claude as a coding assistant). Every commit has "Co-Authored-By: Claude" in it. The code is MIT-licensed and genuinely mine in terms of architecture and intent, but the line-by-line generation is clearly AI-assisted.

This creates an odd situation where the "reimplementation via AI" concern cuts both ways. If someone feeds my MIT repo to an LLM and gets a copyleft-violating derivative, that's one problem. But if I use an LLM trained on copyleft code to write my MIT-licensed tool, am I the one laundering licenses without knowing it?

I think the article's core point holds: legitimacy and legality are diverging fast. The open source community built norms around intent and reciprocity, and those norms are now being stress-tested by tools that can reimplement anything from a spec. No license text can fully encode "don't be a free rider."

_zagj16d ago

> creator of Redis, published a broader defense of AI reimplementation, grounding it in copyright law and the history of the GNU project

Has anyone else lost almost all respect for Antirez because of stuff like this?

moi238817d ago

Perhaps we should finally admit that copyright has always been nonsense, and abolish this ridiculous measure once and for all

4 more replies

throwaway202717d ago

I think we're going one step too far even, AI itself is a gray area and how can they guarantee it was trained legally or if it's even legal what they're doing and how can they assert that the input training data didn't contain any copyrighted data.

1 more reply

j / k navigate · click thread line to collapse

599 comments

jrochkind117d ago

> If source code can now be generated from a specification, the specification is where the essential intellectual content of a GPL project resides.

dogcomplex17d ago

Nor should we be treating AI models themselves as respected IP. They're built on everyone else's data. Throw away this whole class of law, it's irrelevant in this new world.

6 more replies

thayne17d ago

2 more replies

RobRivera17d ago

Sounds very similar to that whole API lawsuit with oracle.

1 more reply

glhaynes16d ago

thayne17d ago

Although I think the chance of that happening is effectively zero.

noemit16d ago

zelphirkalt16d ago

1 more reply

Ferret744616d ago

Our foreparents of FOSS (e.g. RMS) fought to destroy copyright. Copyleft was a subversive mechanism to "neutralize" copyright using the laws of copyright against itself.

1 more reply

red_admiral16d ago

The specification of chardet, which started this all off, is essentially forced by the unicode statndard though.

az22616d ago

SCOTUS ruled on this already when Google copied Sun’s Java wholesale.

1 more reply

alterom17d ago

>Our foreparents fought for the right to implement works-a-like to corporate software packages, even if the so-called owners did not like it

Our "foreparents" weren't competing with corporations with unlimited access to generative AI trained on their work. The times, they're-a-changin'.

You're rehashing the argument made in one of the articles which this piece criticizes and directly addresses, while ignoring the entirety of what was written before the conclusion that you quoted.

If anyone finds themselves agreeing with the comment I'm responding to, please, do yourself a favor and read the linked article.

I would do no justice to it by reiterating its points here.

5 more replies

zmmmmm17d ago

palmotea17d ago

4 more replies

utopiah16d ago

> operate completely generally as knowledge creation engines: solving math proofs, designing drugs, etc.

Any example of that? So far I haven't seen any but maybe I'm looking at the wrong places.

I've see a lot of :

- "solving" math proofs that were properly formalized, with often numerous documented past attempts, re-verified by proper mathematicians, without necessarily any interesting results

- haven't seen any designed trust, most I've seen was (again with entire teams of experts behind) finding slight optimizations

PS: for the epistemological distinction you can see a few past comments of mine (e.g. https://news.ycombinator.com/item?id=47011884 )

satvikpendem17d ago

Good. Intellectual property is now a twisted concept by the elite, whatever its benefits were previously. As soon as Disney made Mickey popular, it was all downhill.

godd217d ago

1 more reply

rfw30017d ago

It is entirely possible, however, that human beings will not be the primary drivers of progress on those problems.

1 more reply

treyd17d ago

> if this transcends copyright and unravels the whole concept of intellectual property.

nradov17d ago

Nothing changes for drug patents regardless of whether an LLM was used in the discovery process.

2 more replies

kindkang202417d ago

LelouBil17d ago

This is similar :

https://www.vice.com/en/article/musicians-algorithmically-ge...

Two musicians generated every possible melody within an octave, and published them as creative Commons Zero.

I never heard about this again though.

Eridrus17d ago

The point of IP is to encourage the creation of new things.

Not all protections have to be ones that give total control like copyright.

I think it's a mistaken assumption that costs will fall to zero. The low hanging fruit will get picked, and then we'll be doing expensive combined AI/wetlab search for new drugs.

If there is any meaningful headroom we will keep doing expensive things to make progress.

1 more reply

prohobo16d ago

In terms of math and biochemistry the cost of generating candidates has collapsed, but the cost of validating them hasn't.

js817d ago

gnopgnip17d ago

Also copyright can protect something normally not eligible when the author chooses what information to include and exclude

AlienRobot17d ago

The basis of your argument is that AI-generated work isn't hard, but your conclusion is that ALL work, AI-generated or not, should lose IP rights?

eru17d ago

There's different kinds of intellectual property.

1 more reply

newyankee17d ago

matheusmoreira17d ago

paxys17d ago

"Hard" or "easy" has never been part of the premise.

A company spends a decade and billions of dollars to develop a groundbreaking drug and patents it.

I think of a cool new character called "Mr Poop" and publish a short story about him with an hour of work.

Both of us get the exact same protection under the law (yes yes I know copyright vs patent etc., but ultimately they are all about IP protection).

keeda17d ago

Creativity is still hard. AI-generated content is called "slop" for a reason ;-)

hyperman117d ago

I've always thought the opposite: IP law was created to make sure creativity stays hard, and hence controllable by the elites.

To Protect The Arts, and To Time Limit Trade Secrets were just the Protect The Children of old times, a way to confuse people who didn't look too hard at actual consequences.

2 more replies

spwa417d ago

Don't worry. The courts have consistently sided with huge companies on copyright. In the US. In Europe. Doesn't matter.

Schoolkid downloads a movie? 30,000 USD per infraction PLUS armed police officer goes in and enforces removal of any movies.

ordu17d ago

davidw17d ago

> LLM as the main weapon

I don't think that's an exciting idea for the Free Software Foundation.

Perhaps with time we'll be able to run local ones that are 'good enough', but we're not there yet.

There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.

8 more replies

stebalien17d ago

Unfortunately, there are cases where you simply can't just "re-implement" something. E.g., because doing so requires access to restricted tools, keys, or proprietary specifications.

2 more replies

cubefox17d ago

A court ordered the first Nosferatu movie to be destroyed because it had too many similarities to Dracula. Despite the fact that the movie makes rather large deviations from the original.

2 more replies

johnofthesea17d ago

Is this LLM thing freely available or is it owned and controlled by these companies? Are we going to rent the tools to fight "evil software corporations"?

3 more replies

Peritract17d ago

> chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.

LLMs are one of the primary manifestations of 'evil software corporations' currently.

dathinab17d ago

> we'll see that it was an attempt to fight copyrights with copyrights

it's not that simple

yes, GPLs origins have the idea of "everyone should be able to use"

but it also is about attribution the original author

and making sure people can't just de-facto "size public goods"

the kind of AI usage is removing attribution and is often sizing public goods in a way far worse then most companies which just ignored the license did

so today there is more need then ever in the last few decades for GPL like licenses

1 more reply

webstrand17d ago

Reducing it to "well you can clone the proprietary software you're forced to use by LLM" is really missing the soul of the GPL.

1 more reply

paxys17d ago

thomastjeffery17d ago

---

1 more reply

mikkupikku17d ago

re-thc17d ago

> What AI are eroding is copyright.

At the moment it's people that are eroding copyright. E.g. in this case someone did something.

"AI" didn't have a brain, woke up and suddenly decided to do it.

Realistically nothing to do with AI. Having a gun doesn't mean you randomly shoot.

xantronix17d ago

wolvesechoes17d ago

> AI is eroding copyright

Unless it is IP of the same big corpos that consumed all content available. Good luck with eroding them.

martin-t17d ago

This is naive. Advertisement and network effects win. Individuals cannot compete with corporations on equal ground here.

sharkjacobs17d ago

> Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch

dathinab17d ago

> Blanchard is, of course, familiar with the source code, he's been its maintainer for years.

I would argue it's irrelevant if they looked or didn't look at the code. As well as weather he was or wasn't familiar with it.

If I blindfold myself when making copies of books with a book scanner + printer I'm still engaging in copyright infringement.

If AI is a tool, that should hold.

3 more replies

logicprog17d ago

3 more replies

axus17d ago

Oracle had it's day in court with Google over the Java APIs. Reimplementing APIs can be done without copyright infringement, but Oracle must have tried to find real infringement during discovery.

Aurornis17d ago

Can anyone find the actual quote where Blanchard said this?

My understanding was that his claim was that Claude was not looking at the existing source code while writing it.

3 more replies

SpicyLemonZest17d ago

Isn't this a red herring? An API definition is fair use under Google v. Oracle, but the test suite is definitely copyrightable code!

NewsaHackO17d ago

IANAL, but that analogy wouldn't work because Mickey Mouse is a trademark, so it doesn't matter how it is created.

esafak17d ago

2 more replies

re-thc17d ago

> This feels sort of like saying "I just blindly threw paint at that canvas on the wall and

> He fed only the API and the test suite to Claude and asked it

Difference being Claude looked; so not blind. The equivalent is more like I blindly took a photo of it and then used that to...

Technically did look.

1 more reply

babypuncher17d ago

This would make it so relicensing with AI rewrites is essentially impossible unless your goal is to transition the work to be truly public domain.

2 more replies

Gigachad17d ago

robmccoll17d ago

4 more replies

u1hcw9nx17d ago

This was not about legality.

> That question is this: does legal mean legitimate?

Just because something is legal does not mean it's moral thing to do.

1 more reply

Aboutplants17d ago

I’ve often thought that the key to fighting this is through this exact method. Turn the tool against them

martin-t17d ago

They might not care. Products win not by quality or features but by advertisement, hype and network effects.

And that's how they like it.

1 more reply

peacebeard17d ago

2 more replies

GuB-4217d ago

I think it will become interesting when AI will be able to decompile binaries.

1 more reply

amelius17d ago

You will probably run into design patents.

1 more reply

fruitworks17d ago

this is the question of the hour. Imagine using this LLM proxy to license-strip major parts of leaked Windows source code to produce code for WINE.

On top of all of this, there are the attempts at binary decompilation using LLMs and other new tools that have been discussed on this site recently.

kelseyfrog17d ago

In the corporate world, we've started using reimplementation as a way to access tooling that security won't authorize.

Sec has a deny by default policy. Eng has a use-more-AI policy. Any code written in-house is accepted by default. You can see where this is going.

kemitchell17d ago

Not Invented Here's long, slow mutagenic march toward full antibiotic resistance continues apace.

munk-a17d ago

crazygringo17d ago

1) Was any copyrighted material acquired legally (not applicable here), and

2) Is the LLM always providing a unique expression (e.g. not regurgitating books or libraries verbatim)

And in this particular case, they confirmed that the new implementation is 98.7% unique.

5 more replies

NewsaHackO17d ago

I agree there has to be a court case about it. I think the current argument, however, is that it is transformative, and therefore falls under fair use.

2 more replies

paxys17d ago

The act of training by itself has been ruled to be fair use over and over again, including for LLMs, and there isn't much debate left there.

The test for infringement is if the output is transformative enough, and that is what NYT vs OpenAI etc. are arguing.

steve_gh16d ago

Is the LLM acting as my agent? If the LLM has been exposed to the source code then have I been exposed to the source code? So in that case is a "clean room" implementation possible?

PaulDavisThe1st17d ago

sigmar17d ago

>then his own familiarity with the previous LGPL implementation makes the new one almost certainly a derivative of the original.

1 more reply

largbae17d ago

This is only worth arguing about because software has value. Putting this in context of a world where the cost of writing code is trending to 0, there are two obvious futures:

sarchertech17d ago

There’s an other option. The cost of copying existing software trends to 0, but the cost of writing new software stays far enough above 0 that it is still relatively expensive.

beepbooptheory17d ago

More and more I am drawn to these kinds of ideas lately, perhaps as a kind of ethical sidestep, but still:

- https://wiki.xxiivv.com/site/permacomputing.html

- https://permacomputing.net/

anonymous_sorry17d ago

2 more replies

casey217d ago

The value of software has never been tied to the cost of writing it, even if you don't distribute it your still breaking the law.

1 more reply

rstuart413316d ago

I came here to say a similar thing.

Welcome to the brave new world.

PS: Sqlite keeping their test suite proprietary is looking like a prescient masterstroke.

PPS: The recent ruling that an API isn't copyrightable just took on a whole new dimension.

foresto17d ago

From the article:

> He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch.

From GPL2:

Is a project's test suite not considered part of its source code? When I make modifications to a project, its test cases are very much a part of that process.

crazygringo17d ago

It's transformative, so no.

Legally, using the tests to help create the reimplementation is fine.

1 more reply

tty45617d ago

Google v. Oracle ruled that use of APIs are fair game and could be argued that test cases are strictly a use of APIs and not implementation.

1 more reply

justinclift17d ago

> The dispute drew responses from two prominent figures in the open source world.

Sure, but neither of those is an IP Lawyer.

The actual IP Lawyer who turned up and tried to engage, Richard Fontana, had his issue closed:

https://github.com/chardet/chardet/issues/334

Richard's point was this (quoted below):

---

FWIW, that case is not really relevant to what we are/were talking about here.

The question is whether you are truly an "author", or whether there was no (human) author.

The general legal consensus has been that generative AI output is not copyrightable (without some special facts of some sort, perhaps).

> If all of this code was somehow not copyrightable because someone wrote a prompt instead of directly editing the code, that would have pretty huge implications.

That's exactly it. Your act of applying the MIT license with your copyright notice to code that you did not "directly edit" has enormous implications.

RcouF1uZ4gsC17d ago

I don't think Fontana's reasoning holds up.

I think it is more like photography.

The case law is that a camera can't own a copyright, but a human can, even though all the pixels were produced by the camera with very little involvement at the pixel level by the human.

2 more replies

ticulatedspline17d ago

Surprised they don't mention Google LLC v. Oracle America, Inc. Seems a bit myopic to condone the general legality while arguing "you can only use it how I like it".

drnick117d ago

It should be noted that the Rust community is also guilty of something similar. That is, porting old GPL programs, typically written in C, to Rust and relicensing them as MIT.

wolvesechoes17d ago

> porting old GPL programs, typically written in C, to Rust and relicensing them as MIT

Everything for memory safety.

phendrenad217d ago

And BSD drivers are "clean-room reimplementations" of the GPL drivers from Linux (but we all know they aren't).

bananamogul17d ago

...and the main distros are enthusiastically adopting them.

The kernel will then be next, though it'll take a longer timeframe.

The GPL just didn't win in the marketplace of ideas.

1 more reply

lukev17d ago

I agree with the thrust of this article, that norms and what we perceive as good or desirable extend considerably beyond the minimum established by law.

But a point that was not made strongly, which highlights this even more, is that this goes in every direction.

integralid17d ago

It goes in one direction only.

Companies can take open-source software and make a proprietary reimplementation. You can't take a proprietary software and make an open source GPL version.

1 more reply

martin-t17d ago

I've been thinking this for over two years, that's why I stopped contributing to open source at that time - my work was only gonna be exploited to make rich people richer regardless of the license.

Crazy that only now we're seeing a bunch of articles coming to the same conclusion now.

I put owning in quotes because ownership should go to the people who did the work.

Buying/selling ownership of both companies and people's work should be illegal just like buying/selling whole humans is. Even if it took thousands of years to get here.

Maybe LLM and mass unemployment of white collar workers will be the wakeup call needed for a reform. Or revolution.

Last time this happened was during the second industrial revolution and that's how communism got popular. We should do better this time because this is the last revolution which might be possible.

wccrawford17d ago

This whole article is just complaining that other people didn't have the discussion he wanted.

Ronacher even acknowledged that it's a different discussion, and not one they were trying to have at the moment.

If you want to have it, have it. Don't blast others for not having it for you.

wizzwizz417d ago

Having this discussion involves blasting others for not considering it. Consider the rest of the paragraph you quoted:

1 more reply

rob7416d ago

miggol17d ago

Wow, it feels like this argument rewired my brain.

crdrost17d ago

BSD-type stuff is very simple because it says "here is this stuff. you can use it as long as you promise not to sue me. I promise not to sue you too."

Very simple.

1 more reply

ineedasername17d ago

skybrian17d ago

With coding agents, it’s likely that you’ll be able to modify apps to your own needs more easily, too. Perhaps plugin systems and an AI that can write plugins for you will become the norm?

jacquesm17d ago

> Was it due to ideology or lower cost or more features?

It was due to access.

AndriyKunitsyn17d ago

ddellacosta17d ago

1 more reply

kazinator17d ago

You can't put a copyright and MIT license on something you generated with AI. It is derived from the work of many unknown, uncredited authors.

Think about it; the license says that copies of the work must be reproduced with the copyright notice and licensing clauses intact. Why would anyone obey that, knowing it came from AI?

Countless instances of such licenses were ignored in the training data.

harshreality17d ago

When learning is sufficiently atomized and recombined, creations cease to be "derived from" in a legal sense.

AI-produced material is inherently not copyrightable, but not because it's a derivative work.

1 more reply

moralestapia17d ago

Courts have already ruled that AI-generated work belongs to the public domain. So, even the MIT license does not apply.

effank17d ago

orthoxerox16d ago

One thing I don't understand is what was so bad about having an LGPL license. You are allowed to do `import chardet` in a MIT-licensed or a proprietary program.

spiffyk16d ago

1 more reply

waterproof16d ago

joshjob4217d ago

duskdozer17d ago

It's just considering "sharing" or "freedom" out an extra step. "Total freedom" results in freedom for those who can protect it and no freedom for those who can't.

dataflow16d ago

Yeah, he lost me there. It's like saying "if you share this you must give me a million dollars -- that's not a restriction, it's a condition!"

zakki17d ago

I guess because our freedom is not unlimited.

bjt17d ago

This is an interesting reversal in itself. If you make the specification protected under copyright, then the whole practice of clean room implementations is invalid.

jillesvangurp16d ago

jongjong16d ago

I've been thinking about the erosion of copyright as well. It's basically making software worthless.

kccqzy17d ago

t4356217d ago

So of course we feel that something wrong has happened even if it's not easy to put one's finger on it.

derangedHorse17d ago

> This is not a restriction on sharing. It is a condition placed on sharing

Also this doesn't make any logical sense. A condition on sharing cannot exist without corresponding restrictions.

[1] https://www.reddit.com/r/Android/comments/mklieg/supreme_cou...

jongjong17d ago

The MIT license makes a clear distinction between code and software. It doesn't cede any rights to the code.

Sleaker17d ago

Edit: looks like an IP lawyer had this exact issue on the GitHub and it was closed.

casey217d ago

If the model wasn't trained on copyleft, if he didn't use a copyleft test suite and if he wasn't the maintainer for years. Clearly the intent here is copyright infringement.

smsm4217d ago

> Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch.

metalcrow17d ago

> AI is a more complex machine, but still a machine. If you feed somebody'd work into a machine, what comes out is a derivative work.

1 more reply

makerofthings17d ago

paxys17d ago

You've just described why every SaaS stock has taken a beating in the last 6 months.

1 more reply

j-bos17d ago

> ultimate form of piracy

Nothing was stolen, not even copied, lamest piracy I've heard of.

1 more reply

grahamlee17d ago

anonymous_sorry17d ago

But the LLM contributions would likely be ruled public domain, so AGPL may not be enforceable on these.

armchairhacker17d ago

The point of GPL is to restrict distribution. If there’s already an MIT version, it’s useless.

2 more replies

pu_pe16d ago

If anyone can go in, take a GPL project like chardet and reimplement it using LLMs, then the current maintainer just saved everyone time by making their implementation publicly available.

Our legal framework wasn't built for a situation where reimplementing complex software is trivial, much less almost completely automated.

tzs17d ago

The GPL's conditions are triggered only by distribution. If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms.

Offering as a networked service is not distribution. That was why they had to make AGPL to put conditions on use in networked services.

tzs17d ago

Oops…the first paragraph is a quote from the article but I somehow forgot the “>”.

righthand17d ago

martin-t17d ago

1) Legality and morality are obviously different and unrelated concepts. More people should understand that.

2) Copyright was the wrong mechanism to use for code from the start, LLMs just exposed the issue. The thing to protect shouldn't be creativity, it should be human work - any kind of work.

The person who did the work deserved credit and compensation.

flavionm16d ago

> The person who did the work deserved credit and compensation.

That's the part of the argument in favor of copyright that is inherently flawed.

Work itself doesn't have any intrinsic value, only output does. The scarcity of output is what dictates what is actually valuable.

1 more reply

wvenable17d ago

> If LLMs are not derivative works of the training data then why is so much training data needed?

If you went to school for 12-16 years, that's a lot of training. Does that mean anything you produce is a derivative work?

1 more reply

motbus316d ago

To me, there is a confusion of what "copying" and "using" means.

You can copy the idea and not use the source code. This has been ruled ok many times already and would be quite dangerous if that was not the case.

dleslie17d ago

They didn't anticipate that in less than half a decade we'd have technology that could _rapidly_ reimplement software given a strong functional definition and contract enforcing test suite.

blurbleblurble16d ago

sombragris16d ago

nicole_express17d ago

dathinab17d ago

> this is the point of the "clean room" dance

which is the actual relevant part: they didn't do that dance AFIK

AI is a tool, they set it up to make a non-verbatim copy of a program.

Then they feed it the original software (AFIK).

Which makes it a side by side copy, as in the original source was used as reference to create the new program. Which tend to be seen as derived work even if very different.

IMHO They would have to:

2. then generate a new program from specification, And only from it. No git history, no original source code access, no program access, no shared AI state or anything like that.

Also for the extra mile of legal risk avoidance do both human assisted and use unrelated 3rd parties without inside knowledge for both steps.

3 more replies

svilen_dobrev17d ago

i've been following this for a while.. and the trend for copyright (of any form - books code pictures music whatever) being laundered by reinventing the "same" thing in-some-way.. is kind-of clear.

But what happens with the new things? Has the era of software-making (or creating things at large) finished, and from now on everything will be re-(gurgitated|implemented|polished) old stuff?

Or all goes back to proprietary everything.. Babylon-tower style, noone talks to noone?

edit: another view - is open-source from now on only for resume-building? "see-what-i've-built" style

dwroberts17d ago

One of the things that irks me about this whole thing is, if it’s so clean room and distinct, why make the changes to the existing project? Why not make an entirely new library?

The answer to that, I think, is that the authors wanted to squat an existing successful project and gain a platform from it. Hence we have news cycle discussing it.

Nobody cares about a new library using AI, but squash an existing one with this stuff, and you get attention. It’s the reputation, the GitHub stars, whatever

nicole_express17d ago

I mean, Blanchard was the longtime maintainer of chardet already, and had wanted to relicense it for years. So I think that complicates your picture of "squatting an existing successful project".

Honestly it's a weird test case for this sort of thing. I don't think you'd see an equivalent in most open source projects.

intrasight17d ago

I agree. But you can't copyright goodwill and reputation. Trademark does provide some protection there, right?

danbruc17d ago

panny17d ago

Both sides are wrong on this actually. Computer generated code has no copyright protection.

All this code is public domain. Your employees can publish "your" AI generated code freely and it won't matter how many tokens you spent generating it. It is not covered by copyright.

wiz21c16d ago

If you are 50 years old or more, the computing you were born with (you own the computer, you own the programs) will be gone. Copyleft only makes sense if you own the computer.

That makes me sad.

ball_of_lint17d ago

Would software be more or less free in a world without copyright?

duskdozer17d ago

It will be disproportionately hard to do it to proprietary software though. The imbalance of power is sort of what the GPL was there for

ajross17d ago

duskdozer17d ago

primenum16d ago

stagger8717d ago

alterom16d ago

>I don't see many people talking about this angle, but LLMs ripping off my work killed my open source efforts.

This is exactly what the article is talking about.

t4356217d ago

Why does anyone need his new library? They can do what he did and make their own.

I'm glad we can fork things at a point and thumb our noses at those who wish to cash in on other's work.

warkdarrior17d ago

Why would I make my own? The new library is released under MIT license and faster than the old one.

1 more reply

wvenable17d ago

Consider it an LLM cache. The result has already been cached so you don't have to generate it again.

josalhor17d ago

niemandhier16d ago

Look at sqllite. They have a good example going how code can be open and you still have a product.

For SQLite the actual product is the test-suite and the audits.

Sure you can use the code all you like, but you only ever get past quality gates if you use the audited and provably tested version.

This becomes just more relevant in the age of ai coding, where an agent might be able to reimplement your specs.

Keep your code open, but consider moving your tests.

arjie17d ago

The

wvenable17d ago

I wonder how one proves that the software is machine created.

jpauline16d ago

0x45717d ago

> Antirez does not address this directional difference. He invokes the GNU precedent, but that precedent is a counterexample to his conclusion, not a supporting one.

nickcoffee16d ago

strongpigeon17d ago

I feel like the licenses that suffer the most isn't the GPL, but the ones like SSPL. If your code can be re-implemented easily and legally by AWS using an LLM, why risk publishing it?

It does feel like open source is about to change. My hunch is that commercial open source (beyond the consultation model) risks disappearing. Though I'd be happy to be proven wrong.

colmmacc17d ago

The four essential freedoms of the Free Software movement are ...

To my mind ... GenAI coding make all of these far more realizable, especially for "normal people", than CopyLeft ever has. Let's go through them ...

Want to run a program as you wish? Great! It's easier than ever to build a replacement. Proprietary or non-free software is just as vulnerable to reimplementation as Copyleft is.

Want to study a how a program works and to modify it? This is now much more achievable.

Want the freedom to redistribute copies to help others? Build your own version! It may not even be copyrightable if it's 100% generated (IANAL).

Want to distribute modified versions? yes! see previous.

I dunno; seems like generative coding can be as much a liberator as any kind of problem.

skydhash17d ago

People will still pay for Matlab, SolidWorks, and Maya because no one who need those will vibe-code a solution. And there’s plenty of good OSS versions for the others.

alterom17d ago

Sorry, but this seems to be so off-base (as well as naively optimistic) that I am having difficulty responding to this.

But I'll try nevertheless.

- >Want to run a program as you wish? Great! It's easier than ever to build a replacement.

Non-sequitur. Building a replacement does nothing for being able to run a program as you wish.

Nobody else is able to run your program as they wish unless you release it with a Copyleft license.

- >Want to study a how a program works and to modify it? This is now much more achievable.

Reverse engineering is more achievable.

Modifying a program, without having its source code, documentation, and a legal right to do so guaranteed by the license is (and always be) easier compared to not having those things.

- >Want the freedom to redistribute copies to help others? Build your own version! It may not even be copyrightable if it's 100% generated (IANAL).

So, that's not about redistributing copies. That's about building an alternative option.

I can download an Ubuntu image and get Libre Office on it with a click.

Go vibe-code me a Microsoft Excel running on Windows 11, please, and tell me it's easier.

- >Want to distribute modified versions? yes! see previous.

You're not even trying here.

One can't legally modify and redistribute copyrighted works without explicit permission to do so.

- >I dunno; seems like generative coding can be as much a liberator as any kind of problem.

Now, that statement I fully agree with.

Generative coding is a liberator as much as any kind of problem is.

Headache, for example, is generally a problem. It's not a great liberator.

Neither is generative coding.

Vibe-coding (like any vibe-writing) simply can't accomplish that, by design.

api17d ago

It also erodes copyright. A decent amount of commercial software can be AI cloned with no copyright violation.

A lot of SaaS too, especially if AI can run a simple deploy.

We might be approaching a huge deflationary catastrophe in the cost of a lot of software. It’s not a catastrophe for the consumer but it is for the industry.

krater2316d ago

waffletower16d ago

I am appreciating the return of the meme "information wants to be free".

ori_b17d ago

If you can prove the LLM was not trained on the code it is reproducing, and has never seen the code as part of a spec to follow, I don't see a problem.

Proving this is going to be hard with current "open source" models.

eschaton17d ago

winstonwinston17d ago

> Blanchard's account is that he never looked at the existing source code directly.

That’s a weird statement while releasing the new version of the same project. Maybe just release it as a new project, chardet-ai v1.0 or whatever.

daemin17d ago

sayrer17d ago

I don't think this part is correct: "If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms."

That's what something like AGPL does.

xbar16d ago

Oracle v Google concluded that APIs could not be protected by either copyright or copyleft. It seemed to me at the time that most here supported that decision. Has anything changed?

vbarrielle16d ago

moralestapia17d ago

That's a non-sequitur. chardet v7 is GPL-derived work (currently in clear violation of the GPL). If xe wanted it to be a different thing xe should've published as such. Simple as.

tw198417d ago

Claude must be trained on chardet already, it worked on chardet's code to optimize or rewrite it to be much better. This is the textbook definition of derivative works.

krater2316d ago

There is fewer then 2% of code a copy of chardet. When the developer of chardet had done it without AI, whats then? He is trained on the same code too.

hungryhobbit17d ago

I largely agree with the author that AI can't just magically remove license agreements by rewriting code.

However, I take issue with his version of history:

>The history of the GPL is the history of licensing tools evolving in response to new forms of exploitation: GPLv2 to GPLv3, then AGPL.

Again, I'm very pro-OSS, but let's not pretend the community has always had a straight line of progress forward; some stuff is crazy Stallman stuff that set us back.

randyrand17d ago

It doesn't matter if it's legitimate. The people that use it don't care. They just find it online and click download. This is the reality.

kanemcgrath17d ago

without discussing copyright, I don't believe any of this is copied. Which I think should be the argument that actually matters.

iberator17d ago

Easy solution for now:

Add something like this to NEW gpl /bsd/mit licenses:

'you are forbidden from reimplementing it with AI'

or just:

'all clones, reimpletetions with ai etc must still be GPL'

keeda17d ago

But the code itself was never the valuable aspect; it was the functionality it provided.

And now AI is making that starkly apparent, while undermining a lot of other presumptions. Including about copyright.

mh226617d ago

Buried in here: Mark Pilgrim suddenly reappearing after his sudden disappearance years ago! Has he been up to anything since then?

mwkaufma17d ago

A lot of untagged IANAL takes here today.

jFriedensreich16d ago

hexyl_C_gut17d ago

I'm less concerned about AI eroding copyleft and more exited about AI eroding copy right.

animitronix17d ago

LPGL is dead, long live the AI rewrites of your barely open source code

throwaway202717d ago

Perhaps software patents may play an even bigger role in the future.

1 more reply

Khaine17d ago

Someone be brave, and do this to ZFS. Poke the Oracle bear!

1 more reply

palata17d ago

> an argument for protecting that test suite and API specification under copyleft terms.

If we protect API under copyright, it makes it easier to prevent interoperability. We obviously do NOT want that. It would give big companies even more power.

Now in the US, the Supreme Court that the output of an LLM is not copyrightable. So even a permissive licence doesn't work for that reimplementation: it should be public domain.

Disclaimer: I am all for copyleft for the code I write, but already without LLMs, one could rewrite a similar project and use the licence they please. LLMs make them faster at that, it's just a fact.

And as an employee writing code for a company. If I produce public domain code because it is written by an LLM, can I publish it, or can the company prevent me from doing it?

internet200017d ago

I stopped reading here:

> The ethical force of that project did not come from its legal permissibility—it came from the direction it was moving, from the fact that it was expanding the commons. That is why people cheered.

How is this not just relitigating GPL vs MIT? By now you know which side of that argument you are in. The AI component is orthogonal.

eduction16d ago

Despite the tech layoffs and rise of AI, programmer hubris is alive and well, that is heartening.

Here we see three engineers writing — at length! — about a hugely complicated matter of law.

No one outside your bubble cares what you think. You are unqualified and your opinions irrelevant. You might as well be debating open heart surgery techniques.

mbgerring17d ago

See also "A Declaration of the Independence of Cyberspace" (https://www.eff.org/cyberspace-independence), and what a goofy, naive, misguided disaster that early internet optimism turned into.

No, AI does not mean the end of either copyright or copyleft, it means that the laws need to catch up. And they should, and they will.

pie_flavor16d ago

While no fan of AI slop, is there any difference between this and musl/busybox/etc, minus the addition of AI? Did anyone get mad at busybox before AI?

mfabbri7717d ago

One thing is certain, however: copyleft licenses will disappear: If I can't control the redistribution of my code (through a GPL or similar license), I choose to develop it in closed source.

1 more reply

youknownothing16d ago

Theseus ship, anyone?

delichon17d ago

throawayonthe17d ago

shall we now have to think about the tradeoffs in adopting

- proprietary

- free

- slop-licensed

software?

1 more reply

logicprog17d ago