Copilot sells code other people wrote (twitter.com)

713 points · joemanaco · 4y ago · 815 comments

242 comments · 97 top-level

HumanReadable 4y ago · 35 in thread

Sorry for the unproductive tone of this comment, but there's something about the attitude of this tweet that really grinds my gears.

Any time someone invents something new and incredible, there's always a crowd of negative nancies eager to discredit and explain why the invention is nothing new and a detriment to society.

I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

'co-pilot just sells code other people wrote' is such a ridiculous understatement of what co-pilot does. Instead of marvelling at the human ingenuity that went into creating it, they sneer at the audacity of openAI to do something without first asking their permission.

meheleventyone 4y ago

They own their code, and it either has a license for use or, if not, rights are implicitly retained. If Copilot regurgitates their code from a project that is public but has a non-permissive license, they are having their IP rights violated, so they are totally correct in being unhappy about that.

Just because you’ve made something cool doesn’t give you the right to harm others in the process.

If MS or OpenAI don’t think this is the case then they should have also included their private repositories.

lin83 4y ago

> Instead of marvelling at the human ingenuity that went into creating it, they sneer at the audacity of openAI to do something without first asking their permission.

Something being cool doesn't exempt it from discussion of its ethics and certainly doesn't exempt it from legal consequences. What people call "disruption" is often just exploiting resources/people/their work in unsustainable ways until oversight is introduced.

If CoPilot is copy/pasting large amounts of code with unknown licenses, that is a large and real risk for users, aside from violating open source projects' licenses.

nextaccountic 4y ago

> I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

Because they shared the code under a license, and they have the right to complain if people use that code but don't follow the license.

For example, what happens if Github Copilot spits out a copy of some copyrighted code verbatim? Is laundering open source code through a machine learning model a loophole for not having to follow the license?

Often following the license is as simple as giving credit to the original author.

highwaylights 4y ago

This seems disingenuous.

People don’t have a problem that AI is being used in some form to provide the service.

The complaint is pretty clearly that code is being lifted from repositories without attribution or compensation, and being redistributed into other applications.

How impressive the work behind copilot is or is not really isn’t relevant.

hansword 4y ago

If I enter 'Mickey Mouse' into an ML-TTI thing like Craiyon (Dall E mini), do you think I will be able to sell the resulting image on a T-shirt?

No, I won't, because Disney has fancy lawyers, the average open source developer hasn't. What you are saying is: Screw little people, let M$ make their money.

Either copyright is for everyone, or for no one. I prefer the latter, but this is not the world we live in.

teakettle42 4y ago

My code is shared under a license (MIT) that mandates attribution.

That’s all I ask — if you use my code, give me credit.

Stealing my code to train your bot — which will replicate portions verbatim! — is no different whatsoever than the casual plagiarist that copies and pastes a novel snippet manually.

It's absolutely my legal and ethical prerogative to complain about people stealing my code by failing to respect the license under which it was freely provided.

jacquesm 4y ago

They are complaining about license violations, they are not pissing on this incredible (is it?) achievement.

Reselling other people's content like this without attribution (which is a pretty mild form of payment) is not nice. But at least you now have one more reason in the list of reasons why Microsoft acquired Github: to be able to launder their open source contributions and resell them.

matthewmacleod 4y ago

I also disagree with the tone of that tweet, but your dismissal is equally shallow and gear-grinding.

There are real, serious, and genuinely interesting issues to be discussed regarding Copilot. It is neither "just selling code that other people wrote", nor is it something that we should applaud merely because it demonstrates "human ingenuity".

The comments here regarding this are honestly a total dumpster fire. It's mostly a bunch of paper-thin hot takes, either:

- The blatantly stupid "you willingly shared your code so why are you complaining that one of the world's biggest companies is now hoovering up code from your carefully-selected open-source license and reselling it as a service!!!"

- The blatantly lying "I have literally never looked at any other computer software while developing anything, so obviously anybody who has ever seen other source code is a plagiarist"

It's dumb because there is an actual interesting discussion here but I guess we're not going to bother having it.

Sakos 4y ago

I share my code without a license because I want others to be able to see how I solved things. However, this doesn't mean I'm okay with wholesale copying my code. If it's some random guy, then whatever. If it's a corporation like Microsoft, then yeah, I have a problem with it. Under German law, the code is legally not allowed to be reproduced or used without explicit permission even if it doesn't have a license. I retain ownership of it until and unless I explicitly relinquish my ownership rights.

the_gipsy 4y ago

> share their code on github where it is publicly available just to complain when others make use of that knowledge

I put a fucking license on it so that it doesn't get abused by some fucking corporation. Jesus Christ, it's not hard to understand.

Mizza 4y ago

Pretty fucking simple explanation for it, actually:

I don't make Free software so that Microsoft can sell it to people for use in proprietary projects.

bambax 4y ago

The world would probably be a better place if there were no copyright.

But the world we actually live in is one where corporations have copyright, and individuals don't.

That's what irks people, I think rightly.

akagusu 4y ago

> I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

People like you should understand that publicly available code doesn't mean "do whatever you want" code.

The majority of publicly available code hosted on Github has a license that tells you what you can and what you cannot do with that code.

If someone uses this code without respecting the license, authors have the right to complain and even legally enforce the license if they want.

Now, you should know that there's nothing "cool" about taking other people's work without permission.

DoreenMichele 4y ago

Meanwhile, creators of FOSS projects are often underfunded and lots of people are in such dire straits that rich people talk of mollifying them with a few paltry dollars via UBI rather than fix anything.

That's likely the crux of the issue. If you do it right, you can steal from other people and get rich. Meanwhile, those same people (whose work was stolen) may be left out in the cold no matter how original, creative, hardworking etc they are.

Chris2048 4y ago

> willingly share their code on github where it is publicly available just to complain when others make use of that knowledge

because it's not unconditional, there are often licence terms of usage, and copilot is potentially laundering those.

rglullis 4y ago

> why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

For other individuals to collaborate, to make the software available to other people, etc. Certainly not for github's profit and much less for the benefit of github's customers who will have access to open code that violates license agreements.

throwoutway 4y ago

I hear you, but this isn’t a “marvel at this free open clever academic thing we built”

It’s a product by a business. Why is that not open to criticism?

rockbruno 4y ago

My problem with this conversation is how we can have a 200 comment thread without anyone providing any kind of proof for these claims. Is there any instance of this bot printing an actual copyrighted algorithm instead of a mundane uncopyrightable piece of logic?

ThePhysicist 4y ago

I mean I'm not an expert but it's a valid point as people share code under a given license, and as far as I'm aware Copilot does not make this knowledge available. Nothing to do with the fact that Copilot is an amazing technological achievement.

If I, as a human, go to a public repository on Github and copy/paste a non-trivial 200 line code snippet into my proprietary code base, I have to abide by the license of that original code, even if I slightly modify it. I don't see how this cannot be true for Copilot. I'm sure the legal folks at Github have thought of a response though; you could e.g. argue that the snippets produced by Copilot are not affected by the copyright of the original author as they do not reach the required threshold of originality. Seems rather shaky to me though.

pmarreck 4y ago

I think copilot is amazing. I don't care what, if any, of my code snippets it uses because I also gain from it by skipping boilerplate (as well as things like bash idiosyncrasies). Using it feels like I am working with dozens of invisible collaborators

hdjjhhvvhga 4y ago

> Any time someone invents something new and incredible, there's always a crowd of negative nancies eager to discredit and explain why the invention is nothing new and a detriment to society.

It is not true. Whenever there is something really useful, everybody is happy, and while of course there are always some naysayers, they're very few.

However, when you do something controversial, you can expect to hear criticism. You are of course free to dismiss that criticism, but when a lot of people are telling you what you are doing is unethical, maybe it's time to stop and think about it.

nixpulvis 4y ago

You should read more about people's ideologies and philosophies of Open Source.

One big reason I support it is because it grants me the right and ability to change things I need/want to change.

ricardoplouis 4y ago

Wouldn't you rather have a healthy dose of skepticism and pessimism surrounding new inventions? Even if the negativity is off base, it's far preferable to a world where everyone is always positive and praises what geniuses the creators are. The former at least breeds discourse while the latter only serves to make people feel good.

zitterbewegung 4y ago

Why can’t startups understand what an open source license is? Apache 2.0 code could be ingested by this tool, but it is a horrible license for your database as a service. AGPL would be a great license for a database as a service, but AGPL code should not be ingested by OpenAI / GitHub Copilot.

nerdponx 4y ago

Both things can be true. It's clear that it violates the licenses of many software projects. But I do agree that denigrating it as "just selling other people's code" is missing the whole point of the product and of what you pay for when you subscribe to it.

Tryk 4y ago

This doesn't address the point of the Tweet, you are simply attacking the form of their argument.

Moreover it is possible to BOTH marvel at the human ingenuity that went into making copilot AND disagree with their methods. Some things can be marvelous and wrong at the same time.

isitmadeofglass 4y ago

Yes but,

Sorry for the unproductive tone of this comment, but there's something about the attitude of this tweet that really grinds my gears. Any time someone invents something new and incredible, there's always a crowd of negative nancies eager to discredit and explain why the invention is nothing new and a detriment to society. I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge. 'co-pilot just sells code other people wrote' is such a ridiculous understatement of what co-pilot does. Instead of marvelling at the human ingenuity that went into creating it, they sneer at the audacity of openAI to do something without first asking their permission.

— This comment brought to you by HN-Comment-AI ©

gumby 4y ago

People get paid to write code having learned from writing code for others and from reading code others wrote. In this regard I don't see why github copilot is any different.

lobocinza 4y ago

Plagiarism isn't new or incredible.

hk1337 4y ago

Usually they want some recognition for their contribution and with GitHub copilot they get none of that.

sAbakumoff 4y ago

It's the negativity bias beauty in action. You have it too.

pwdisswordfish9 4y ago

‘Facebook just sells personal information of other people’ is such a ridiculous understatement of what Facebook does. Instead of marvelling at the human ingenuity that went into creating surveillance capitalism, they sneer at the audacity of Facebook to do something without first asking their permission.

rambojazz 4y ago

Sounds like they're not selling any of your code

B1FF_PSUVM 4y ago

> negative nancies

Not bad for everyday use - I like "nattering nabobs of negativism" (as scripted by William Safire), but it is really a bit over the top.

OrwellianTimes 4y ago

Fully agreed. It's just people getting mad and jealous but hear me out.

Copilot is NOT SELLING code other people wrote, it is simply acting as a curator to show you all the solutions people HAVE WRITTEN for free.

Copilot does NOT write entire programs, it's simply an assistant. And there is not much copyright you CAN apply to 3-4 lines of generally understandable code.

I've used Copilot and am actively paying for it, and I have not seen many cases where it's generating bad code. It's only there to remove boilerplate and common problems, not there to write entire applications.

Why are people getting so salty?

Guid_NewGuid 4y ago · 17 in thread

I find this whole topic very annoying, this is like the 3rd variation to reach the front page today. But it has made me realize why I instinctively dislike Free Software as a movement.

Copyright and licensing are bad, actually. Stop getting worked up about the idea of using courts to punish theft. Stop getting into a frenzy of arousal about the police kicking down doors to drag Billy Gates to jail because 80 characters of fast square root is theft but 79 isn't.

Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.

Sure in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules lawyering over license text.

Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps, in fact I want a truly public domain license but copyright law is so hostage to corporate interests no such thing exists in many countries.

Free the code.

sirsinsalot 4y ago

"A commons of knowledge is a public good."

Yes but this copilot model takes that, adds value and doesn't itself join the public common good. Instead it takes it, and makes you pay to have it back in another form.

If copilot were open source and the model released for the public good, being built of public data (in your scenario) we would have a very different conversation.

monocasa 4y ago

The issue is that whether the free software people want it or not, the copyright system over code exists, and historically has been used as a cudgel against smaller players. If we got rid of copyright over code entirely I'd totally be down for this. And IIRC RMS has said the same thing; that he'd be in favor of the removal of copyright over code as a concept even if it meant neutering the protections of the GPL.

Until that happens, and copyright protections are still used by larger entities, using the same system to protect yourself and (more importantly) your users isn't turning your back on your ideals, but instead simply adjusting your strategy to the current material conditions. Remember that Google v. Oracle (while ultimately a win versus what could have been) was a step back, with de minimis claims left on the table as not a valid defense. The playing field is heavily slanted towards the big players and software freedom requires every tool it can put its hands on at the moment.

marpstar 4y ago

> Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps

This is my feeling as well. I don't build stuff in the open so that I can get bent out of shape at someone not properly licensing it. It's in a public repository, FFS... I assume that if anyone even notices my repo, that they may copy/paste a few lines out of my solution if it helps them.

georgeecollins 4y ago

You may not care about licensing or copyright, and I imagine many others who create code under an attribution license don't. That's still not the same as saying "copyright and licensing are bad." Too many businesses depend on them to exist for me to have that opinion.

If an AI takes a copyrighted work and makes its own version, say combining two novels by popular authors in a way that is unique but keeps large parts of the text intact, can I sell that? I think if I were the authors I would be unhappy.

Also, how hard would it be for copilot to include a comment saying "// I got this line from x repo" when you are copying from a new repo? I am guessing not hard at all. Then at least the user would be aware of where their code was coming from and could be expected to make a judgement. If the line is "let a = b" then probably no worries. But if it is hundreds of lines of a simulation, all from the same repo with no changes, then I think some attribution is good for both parties.
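As a rough illustration of that attribution idea, here is a minimal Python sketch. Everything in it is hypothetical: the `Suggestion` type, the `annotate` helper, and the assumption that a completion tool knows which repo a suggestion came from at all. Nothing in this thread confirms that Copilot tracks provenance this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Suggestion:
    """A completion plus hypothetical provenance metadata."""
    code: str
    source_repo: Optional[str]  # assumed to be known; may not be in practice
    license_id: Optional[str]   # e.g. "MIT"; None if no license was found


def annotate(suggestion: Suggestion, min_lines: int = 3,
             comment_prefix: str = "//") -> str:
    """Prepend an attribution comment to non-trivial suggestions.

    Short snippets (fewer than min_lines lines) and snippets with no
    known source are returned unchanged.
    """
    lines = suggestion.code.splitlines()
    if suggestion.source_repo is None or len(lines) < min_lines:
        return suggestion.code
    license_note = suggestion.license_id or "unknown license"
    header = (f"{comment_prefix} Adapted from "
              f"{suggestion.source_repo} ({license_note})")
    return header + "\n" + suggestion.code
```

With a threshold like `min_lines`, a one-liner such as "let a = b" passes through untouched, while a longer block copied from a single repo gets a header naming that repo and its license.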

bayindirh 4y ago

> I find this whole topic very annoying, this is like the 3rd variation to reach the front page today.

Me too. I also find three iterations of the same subject not enough discourse. We need to take this matter more seriously.

> But it has made me realize why I instinctively dislike Free Software as a movement.

On the other hand, this whole discourse reminds me why I absolutely love Free Software as a movement.

> Copyright and licensing are bad, actually.

This is why we have "Copyleft".

> Stop getting into a frenzy of arousal about the police kicking down doors to drag Billy Gates to jail because 80 characters of fast square root is theft but 79 isn't.

And, stop getting into frenzy of arousal about being able to use any and every code piece you see elsewhere in any project regardless of its license.

> Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.

This is why GPL is important. It forces knowledge to evolve in the open, stay in the public domain, and actually serve the public good. It also doesn't hinder ambition and vision by not taking it to private domain, and keeping it open to everyone.

> Sure in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules lawyering over license text.

You might be pretending to care about this in your daily job, but we really care. Some of the projects I take part in can't ever include GPL code (because the projects are MIT licensed). These texts are court-tested licenses, so they're as proper and serious agreements as the EULAs of "particular" software companies.

> Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps, in fact I want a truly public domain license but copyright law is so hostage to corporate interests no such thing exists in many countries.

If I want my code to be copied and possibly closed, I'll license it with MIT or BSD-0 and forget about it, but if I'm licensing my code with GPL3, it means I want that code to stay open. As a licensor, I expect anyone using that code to respect that license.

> Free the code.

Yes, and respect the license the author selected for his/her code.

mplanchard 4y ago

If that's what you want, you should license your code not under MIT, but under a license that allows replication/distribution without attribution. Meanwhile, others who do care about such things can license their code under licenses that require attribution/copyleft/etc.

notacoward 4y ago

I suggest you read up on the history of free software and open source. It exists as a reaction to intellectual enclosure, to prevent that ill and create greater freedom of ideas. Yes, it uses the tools of copyright to fight greater ills of copyright, because those are the tools available, and actions like these are necessary to keep the enclosure from happening all over again. Anyone who has actually studied the matter for even five minutes can see how silly the "free software is anti-freedom" FUD is.

vajow46267 4y ago

So glad this sentiment is becoming more common in the OSS community! I MIT license everything, if someone wants to make money using stuff I wrote that's awesome, and I wish them the best.

I don't think users owe me anything at all. If people want to PR back that's cool but if not that's cool too.

eikenberry 4y ago

There is a license for that, the MIT-0 or the MIT No Attribution License.

https://opensource.org/licenses/MIT-0

wcoenen 4y ago

> I want a truly public domain license

I think this sentence contradicts itself.

A "license" implies that there is a copyright holder who allows usage of the work under the terms of said license.

While "Public domain" implies that there is no copyright holder (e.g. because the copyright expired, was explicitly waived, or is for some other reason not applicable).

If you want to put your work in the public domain, you can do so; simply include a note saying that you dedicate it to the public domain.

nonbirithm 4y ago

I think because this kind of ML is so new, we have no choice but to frame arguments for/against in terms of the structures that have been in place for decades past (copyright, open source licenses). We don't yet have the legal language to express dissent against ML in clear yes or no terms.

I think if there were an option to add a machine learning clause and ask individual creators if they wanted it applied in that context, we would see a considerable amount of uptake. It's just that we couldn't foresee this progress happening so soon, and the issue is still not visible enough. I think it's only a matter of time before the culture catches up and new creative works in the coming years are excluded from training sets by their authors with clear and direct language.

By that point there would be no way to argue "but they shouldn't care, they licensed it like this, so I'm assuming it's fine for ML use."

If copyright is not enough to stop another entity from using a person's data for training, then some other protection should be invented that does.

Schroedingersat 4y ago

The problem with this is 'freeing the code' in this instance leads to microsoft building a wall around it and asserting complete control in a few years.

Copyleft exists for a reason and without the ongoing fight for the commons we lose it all.

nmfisher 4y ago

I totally agree, this reaction seems very hypocritical. If some rinky dink startup did exactly the same thing - as they are entitled to do under the licences of huge swathes of code on GitHub - hardly anyone would bat an eyelid. But just because it’s a Microsoft-owned company, it’s somehow verboten?

That seems totally inconsistent with decades of people clamouring for more openness/liberty when it comes to IP rights.

progman32 4y ago

I see the free software movement as a variant on your ideals but rooted in practicality given the current environment.

kube-system 4y ago

> Free Software

> public domain

These are incompatible concepts. RMS's vision of 'free-as-in-freedom' software doesn't let people do whatever they want. It forces those who distribute binaries to also distribute source. This is not possible with a public domain work.

futureshock 4y ago

In this thread: many engineers nervously sweating. The moats are drying up and the wizards are about to be thrown out of the castle. This tech is the first product in a long line of products that will massively lower the barrier to entry. It has been a good run, but it was never going to last forever. We are not part of the capitalist class and were never going to be.

ssalka 4y ago

Information wants to be free

nickjj 4y ago · 9 in thread

This might be overreacting but is there a way to opt-out of Copilot using your code in open source repos?

It feels morally wrong to me that I can spend thousands of hours working on projects on my own free will but then a company can sell the code I wrote to others in the form of snippet completion as a service. In fact they end up selling your code back to yourself if you plan to use the service.

If the answer is no, that moves the needle pretty far in the direction where I'd at least consider the idea of moving all of my repos to Gitlab. I don't care much about stars or popularity. I open source things that are interesting and useful to me and if other folks want to use it they can but I don't gain motivation from others using the projects I release. I like Github and its UI and it's no doubt "the spot" for open source but selling code written by others rubs me the wrong way a lot. It stinks because it also means no longer contributing to other code bases too. It's moving us in the opposite direction of what open source is about.

kemiller 4y ago

This is a really good point that I hadn't considered before. It's facebook all over again — selling your own content back to you. Repo owners should be at least compensated when their code gets used. That would be an incredible market.

PaulKeeble 4y ago

It should be automatic based on license. GPL code definitely shouldn't be included but MIT could be. They already have this information in most repositories, and if it's missing they have no right to use it at all. We don't need extra options; the licenses already restrict use and derivative works.
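A license-based filter along those lines could be sketched roughly as below. This is only an illustration in Python: the `PERMISSIVE` allow-list, the function names, and the repo names in the usage example are all made up, and which licenses belong on the list is exactly what this thread disagrees about (MIT, for instance, still requires attribution).

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical allow-list of SPDX identifiers treated as acceptable here.
# Whether these licenses really permit this use is an assumption.
PERMISSIVE = {"MIT", "MIT-0", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}


def eligible_for_training(license_id: Optional[str]) -> bool:
    """Return True only for repos with a known license on the allow-list.

    A missing license (None) means all rights are reserved, so the repo
    is excluded, as are copyleft licenses such as GPL or AGPL.
    """
    if license_id is None:
        return False
    return license_id in PERMISSIVE


def filter_repos(repos: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Keep the names of repos whose license passes the check."""
    return [name for name, license_id in repos
            if eligible_for_training(license_id)]


if __name__ == "__main__":
    sample = [("alice/utils", "MIT"),
              ("bob/engine", "GPL-3.0-only"),
              ("carol/notes", None)]
    print(filter_repos(sample))
```

The default-deny behaviour for a missing license matches the point above that unlicensed code grants no right to use it at all.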

ellyagg 4y ago

Well, I hope your viewpoint doesn't win the day, because making code as freely shareable and remixable as possible is a huge boon for humanity.

throwaheyy 4y ago

The Twitter thread’s title seems unnecessarily incendiary and clickbaity.

I don’t buy that producing/synthesizing code snippets based off public repos is a problem.

There’s nothing proprietary or original about eg. the syntax of a for-loop, or the boilerplate of setting up some JS framework MVC.

Besides, it’s basically just a (semantic and contextual) search engine inlined within the IDE. Copyright infringement hasn’t taken place until the user has activated the autocompletion, placed the code within their own, and released their code containing the infringing code.

lbhdc 4y ago

I stopped publishing open source after all this started coming out because I was so uncomfortable with it.

jaywalk 4y ago

If your code is using a license that allows it, how could you possibly opt-out aside from using a different license?

ghostbrainalpha 4y ago

It would be kind of cool if Github could show some stat that code you wrote has been used 50,000 times by 12,000 people.

Being a top CoPilot contributor should at least have value to signal on your resume.

dragonwriter 4y ago

> This might be overreacting but is there a way to opt-out of Copilot using your code in open source repos?

I don't think there is a way to opt out if it is a public repo, regardless of license, and Microsoft's copyright theory suggests that they wouldn't feel obligated to exclude any code they got their hands on except under a specific NDA preventing such use; the use of public GitHub repos isn't based on legal constraints but practical convenience.

invig 4y ago

They’re not selling you code. They’re selling you an engine that helps you find the right free code at the right time.

If you read free code yourself it’s fine, but if a machine does it for you it’s not? We overvalue humans.

VoodooJuJu 4y ago · 9 in thread

It is now proven that copilot returns code from codebases with non-permissive licenses [1].

I'm curious - what are the legal implications of this going forward? I've so many questions.

1. Will Microsoft ever face lawsuits for these license violations?

2. If so, who/how? Class-action?

3. Will copilot be forced to open-source in the future? Under which license? Some open source licenses are incompatible with others, but copilot uses code from probably every OSS license conceived.

4. If Microsoft faces no justice, will we start seeing more OSS license violations? Will Google start using AGPL-licensed code?

[1] https://news.ycombinator.com/item?id=27710287 | Copilot regurgitating Quake code

mhaymo 4y ago

That regurgitated code exists on Github under an MIT license: https://github.com/jethrodaniel/fast_inv_sqrt

"jethrodaniel" does not appear to have the copyright to offer that license, but it's hard for Github to determine that in general, so I doubt they would be liable for the error.

542458 4y ago

IANAL. My understanding is that the general legal precedent in the US is that a) datamining text has no copyright implications (in the same way that reading a book has no copyright implications) and b) it is not a copyright violation to use a small amount of copyrighted material provided the context is sufficiently transformative. This might seem silly or unfair to you, but that is the current legal reality.

But even ignoring that, everybody uploading code to GitHub has given GitHub the right to analyze that code as per the GitHub ToS. This is the same mechanism by which you can't upload code to GitHub with a license that says "nobody is allowed to display this code on the internet" and then sue GitHub.

concordDance 4y ago

There's also one more question:

5. Even if it is illegal, is it actually bad? No one can possibly sell code snippets, the transaction costs are many orders of magnitude greater than any reasonable price. In my opinion, at least in this case the benefits massively outweigh the costs and the law should not apply here.

pwdisswordfish9 4y ago

Is there any leaked Microsoft code on GitHub? Someone should check if Copilot regurgitates that as well, then see how Microsoft reacts when someone slaps an AGPL license on that…

rifty 4y ago

It seems like Microsoft could be in the clear on the basis of it being essentially "search". But it also seems like anyone who uses it could be running a high risk of getting infected with copyright-violating code.

My question is, if it isn't a copyright infringement issue to use copilot in its current form right now, why not just claim copilot was used whenever accused of copyright infringement henceforth?

2 more replies

Beltalowda4y ago

> It is now proven that copilot returns code from codebases with non-permissive licenses [1].

That same Quake example from last year is repeated every single time.

Aside from the fact that GitHub has since added a protection for this, that this example gets repeated time and time again instead of a *list* of examples leads me to believe this is not (and was not) a common occurrence.

blihp4y ago

1) Most likely

2) TBD

3) Not likely. Worst case a judgement will go against them, they'll effectively pay a fine and then they'll retrain it on a more restricted set of source code.

4) OSS has a pretty tragic history re: enforcement. It wins nearly every skirmish but has no interest in the war so from a big picture standpoint, it loses due to apathy.

bastardoperator4y ago

You don't think a mountain of MSFT lawyers in every state, including partner law firms around the world, have thought about this? Do you practice law or are you speculating based on emotions?

1 more reply

throwaway232344y ago

Big meh. That quake code was MIT.

1 more reply

parhamn4y ago· 9 in thread

Pretty soon the world is going to come to realize art/creation is just blending, incrementing and repurposing prior art.

No book, painting, codebase, sonnet, design is theft-less.

The art is the space reduction, otherwise we’d just bruteforce away.

mihaic4y ago

This type of argument always distracts from the fact that the hard part is figuring out where we draw the line between theft and reimagining.

The Magnificent Seven for instance was a reworking of Seven Samurai, but stands on its own as an original creation. Going into a cinema and filming a picture to later put on a torrent site is not artistic reworking.

The hard discussion is about what is acceptable, we all know prior art exists.

3 more replies

izacus4y ago

> Pretty soon the world is going to come to realize art/creation is just blending, incrementing and repurposing prior art

If that happens, the big copyright/IP conglomerates will immediately jump on that and make sure that laws are adjusted and they get their cut of every single word and line anyone puts near their smartphones ;)

pera4y ago

I'm not sure what you mean by "theft-less" but I believe you might be conflating inspiration with derivative work. Copilot can produce verbatim copies of open-source code; this would make it more similar to how some musicians sample other people's music to create new music.

wnkrshm4y ago

So the only thing left is handiwork I guess. Engineering isn't different from art in any way, the constraints are just stricter.

stemlord3y ago

>Pretty soon the world is going to come to realize art/creation is just blending, incrementing and repurposing prior art.

That applies to everything, it's even a basic law of physics, and there's absolutely nothing wrong with it. Any layperson already knows what a remix is anyway, so not sure what you think will change.

natly4y ago

Unless every invention is gonna be AI generated (which is kind of a scary situation), intellectual property still needs to be a thing (otherwise people won't have incentive to invent, it'll just be stolen from them).

3 more replies

Chris20484y ago

Is it really "just" that? Is there no original creativity in the choices (and skill) in the blending, and choosing what (and how) to blend?

Would you describe a parody, or a critique/review, as equally without original merit?

lioeters4y ago

Agamus4y ago

This idea has been around for a while - why... "pretty soon"?

And I'm sure I couldn't disagree with you more. Or are 'influence' and 'theft' the same now?

2 more replies

mojuba4y ago· 9 in thread

Can I suggest a hypothesis that if you find Copilot useful it means the problem you are solving is a boring one? I might be wrong of course.

alpaca1284y ago

I disagree. Most large projects, software or otherwise, use existing parts. If you design an innovative device you'll still use some standard components like chips, memory modules etc.

There's already a way to quickly solve the boring parts in development - libraries which were built and licensed around that purpose. But Copilot passes you code of unknown origin, with unknown license terms and no information about how close it is to an existing codebase. It's like a person trying to sell you Macbooks for a hundred bucks per unit but you don't know where they came from and who made the holiday photos stored on the harddrive.

alkonaut4y ago

99% of the "problems" I'm solving when I'm working even on very interesting and challenging problems, are boring subproblems. If I can get those out of the way then that would be great.

viraptor4y ago

The most interesting problem will have extremely boring bits. If you write a CLI tool to solve all of the world's problems by channeling magic, you'll still need to add the parameter handling and do some error management, which is repetitive and likely well generalised and predictable based on other projects.

mistercow4y ago

That hypothesis is easily disproven by spending an afternoon on a side project with Copilot.

No matter how interesting your problem is, translating it into code is going to involve a lot of grunt work. This isn’t just boilerplate, but also the large portion of your code which is going to be gluing things together.

The time you spend working through those menial parts of your code is time when the context of the interesting part of the problem fades. Once you get the mechanical stuff out of the way, you have to load the interesting stuff back into your brain.

This is where AI coding tools really shine. They dramatically reduce the intervals between when you can think about the actual problem you’re solving by letting you get the boring mechanics out of the way more quickly.

1 more reply

triknomeister4y ago

99% of work in 100% of interesting projects is boring.

para_parolu4y ago

The problem may not be boring. Typing boilerplate code is. I work on games as a hobby. Sometimes I implement mechanics requiring vector math. Working on mechanics is interesting. Writing down math is not. Copilot helps with the latter.

1 more reply

workingon4y ago

Seems like a narrow vision. Is every line of code you write to solve a problem “not boring”? I solve problems I find interesting, but writing matplotlib code to visualize data never is.

trention4y ago

This is true for the current iteration of the model. Probably won't be true at least to an extent in 5 years. Besides, there is nothing wrong with solving boring problems. Not everyone can be Bjarne Stroustrup.

muzani4y ago

Yeah, it's for boring problems. Drawing a circle or detecting a specific format of number in some string, for example.

spupe4y ago· 8 in thread

If you assigned a task to a junior dev, and he/she used some code from open source projects and Stack Overflow to develop a custom program for the task, would you say that this person is selling you other people's code? Is it common or expected for this type of use to be acknowledged?

XCabbage4y ago

People I've worked with have different philosophies on this, but personally, if you check in code that is distinctive enough that I can identify the source you copied and pasted it from, and you provided no indication (whether in a comment or a PR description) that you copied it, I will really get quite grumpy at you about it.

Way too often I burn half an hour needlessly during review in one of two ways:

* trying to figure out how the heck someone figured out some "magic" code that achieves something by invoking a bunch of poorly documented library or framework internals, and trying to reverse engineer WTF all the magic does by diving into the framework's source... only to eventually think to google the whole snippet rather than each individual method call, and discover it's copied from a Stack Overflow answer

* trying to figure out why something was written in an unidiomatic or overcomplicated way rather than a more obvious approach, and commenting at length on how I'd simplify it... only to eventually realise it was copied from a Stack Overflow answer

Attribution isn't just about making sure the right person gets credit, or about license compliance; reviewers and maintainers frequently need to be able to see where stuff was copied and pasted from in order to do their jobs effectively, even for snippets of just a few lines.

1 more reply

genezeta4y ago

About 10 years ago or so, I was working at a certain place. They put me into a small team apparently focused on some R+D project under the direction of an "architect".

Basically, the project was to package Cordova + Backbone + Marionette, plus a couple of tools, under their own commercial name. Then they'd go around potential clients presenting it as the perfect solution to build hybrid applications for web/mobile/smartTV/whatever.

A certain Monday, the "architect" arrived boasting. He did that often, but this time he was more boastful. He explained that he had spent the whole weekend coding. He had written an incredible tool that would create a skeleton for a project from zero. You would type something like `tool create` and it would create the whole project with all the scripts and some example views and whatnot.

It was Yeoman's yo CLI tool, of course. He had just changed the copyright in the comments, removed most of the comments, he had deleted any mention to yeoman or the original creators, changed the name of the executable script and that's it.

The whole thing was OS code picked up from various repos and packaged as their own. The company used it to sell development projects. The so-called-architect used it to sell himself inside the company and then jump away into a startup as CTO.

Is this common or is it just anecdata? I don't know. It's clearly not the only time I've seen something like this and I do know that in certain companies around here it isn't exactly uncommon. But I can't say how common or uncommon it is.

Would I call this "selling other people's code"? Yes, I would.

1 more reply

whatatita4y ago

If the solution was made up of ideas from OSS and snippets from Stack Overflow? No; that's fine.

If the solution was copied from an OSS project without proper attribution? Yes. Absolutely. And they'd have words with a senior dev and maybe even legal if the code they copied made its way into production without attribution.

Many copyleft OSS licenses require attribution and distribution of derivative works that we wouldn't allow.

mbreese4y ago

It depends on the source of that code and the expected license of the code you paid them for. If everything is MIT/BSD (and attributed), no problem. If the code was GPL and I’m making a commercial product, we have an issue.

I’d also expect for any stack overflow code to include a comment with a link to the stack overflow page.

I think one of the key points is to make sure any code taken from another source is cited appropriately. If it isn’t, or the junior dev is passing it off as their own work, then we have problems.

ben-schaaf4y ago

If I found out a junior dev had been copying copy-left or proprietary code then I'd have to rip out that code, have a chat with them and figure out what to do from there. Even if the code isn't copy-left it's still someone else's code, sometimes that's ok but sometimes it's definitely not.

jhugo4y ago

No matter how complex a program is, and no matter whether it uses techniques sometimes described as "AI" in its implementation, it's not a person. Copilot is just a very complex pipeline from other people's code to your editor, which ignores the license of those other people's code.

thelastbender124y ago

This is a good thought exercise. I wouldn't call it stealing, though I am not sure how legal liability is assessed, say if they picked up GPL code unknown to the company, and the company is later sued over it.

This isn't derived from principled reasoning, but I think of it as similar to community norms. Not the best example, but you wouldn't mind someone subletting their homes to Airbnb, but if all of your apartment complex does it, it invites regulation. A product like copilot enables copying code (even if inspired, and not verbatim) at a scale that individual developers can't. So respecting software licenses needs to be codified (legally?) while previously it was left unmonitored.

trention4y ago

It's absolutely fine to allow humans to do that while prohibiting (commercialized) AI to do the same thing.

1 more reply

antihero4y ago· 6 in thread

I mean, if it's autocompleting a fairly simple line, and can do that because it's analysed a lot of lines, I don't really see that as "stealing anything".

If you are using it to write whole complex functions that are the same as other people's, I guess that is copying.

But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.

I think the first use case is far more common, and creating boilerplate that is so generic you could never really attribute it anyway.

rob744y ago

> But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.

If you do that on your own, it's your (legal) responsibility. If Copilot does it for you, it's GitHub's/Microsoft's responsibility.

4 more replies

afiori4y ago

When Oracle won its copyright lawsuit against Google, it was because of an 8-line bounds-checking utility function.

3 more replies

alpaca1284y ago

The first can be automated without ML though. And once you use ML you cannot guarantee it won't copy-paste existing code.

This whole thing would be fine if GitHub hadn't just used all public code on their platform, ignoring all involved licenses.

2 more replies

dobin4y ago

I don't see it as "stealing" either. The neural network was trained with code as input. It's creating code as output. The output has nothing to do with the input once it is trained. Do people not know how neural networks work?

It's like saying GPT-3 created text is copyright infringement, because some author used the same sentence in a book before.

4 more replies

carom4y ago

My problem is with the weights not being released. They are a derivative work of open source code in the most literal sense. The weights would not exist without those lines. Gradient descent is using literal derivatives.

wodenokoto4y ago

> If you are using it to write whole complex functions that are the same as other people's, I guess that is copying.

> But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.

How would I know that the boilerplate I ask Copilot to write for me is copied verbatim from a codebase that neither I nor Microsoft has a license to use?

borishn4y ago· 6 in thread

Copilot is fair use, get over it!

Copilot is not writing your code any more than Google search is writing your code. You are writing your code, and Copilot is just making suggestions.

The US Constitution secures limited copyright "To promote the progress of science and useful arts". Copilot is just that, get over it!

Buttons8404y ago

A good and well argued opinion made hostile by saying "get over it" twice! Saying "get over it" discourages further discussion. Your comment would be better without it.

2 more replies

nescioquid4y ago

Not an expert, but fair use generally covers education, criticism, parody, and satire. There is a test for meeting fair use and it includes things like amount copied and commercial or non-profit interest.

The amount copied from any particular source might be small, but an aggregate strip-mining of many copyrighted sources is an interesting twist. Another might be, as you suggest, it might be a machine that itself does not violate copyright, but has the effect of causing users (who accept the suggestions) to violate copyright.

1 more reply

brianmcc4y ago

Wait till it suggests something Disney can argue they own rights to...

2 more replies

zerocrates4y ago

Yes, the copyright clause gives as its purpose "the progress of Science," but that doesn't mean that anything which claims to be "progress" gets a free pass.

1 more reply

jazzyjackson4y ago

Personally I think I'll just claim all the code I write with co-pilot is a parody.

humanwhosits4y ago

Citation needed for copilot being fair-use

ThereIsNoWorry4y ago· 6 in thread

1. You most likely agreed to that by using GitHub.

2. Copy&Pasting Code by manual search exists.

3. This is just a smart tool so you don't have to figure out yourself what to copy&paste (in the best case) and save a lot of time.

Sometimes I truly wonder how people can genuinely be upset about things like this. What is broken are copyright and patent laws in the 21st century.

keraf4y ago

The point of this Tweet is about licensing. When using an MIT licensed library for example, you would have to give attribution. But you can easily rewrite portions of that library yourself using Copilot, which could potentially use code from the initial lib, without any attribution whatsoever. It's even more problematic with licenses such as the GPL.

I guess Copilot could address this by checking the licenses of the projects it uses. Even when combining code, it could pull in the required attribution or avoid GPL licensed code (unless enabled) for example.
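A rough sketch of what such a license check could look like, in Python. Everything here is hypothetical: `Suggestion`, `filter_suggestions`, and the two license sets are invented for illustration, and the sketch assumes each suggestion can be traced back to a source repository and license, which nothing in this thread says Copilot can do.

```python
from dataclasses import dataclass

# Illustrative subsets of SPDX license identifiers, not a complete list.
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}
ATTRIBUTION_REQUIRED = {"MIT", "BSD-3-Clause", "Apache-2.0"} | COPYLEFT


@dataclass
class Suggestion:
    """Hypothetical record: a suggested snippet plus where it came from."""
    code: str
    source_repo: str
    license_id: str


def filter_suggestions(suggestions, allow_copyleft=False):
    """Drop copyleft-sourced suggestions unless enabled; collect attribution notices."""
    kept, notices = [], []
    for s in suggestions:
        if s.license_id in COPYLEFT and not allow_copyleft:
            continue  # skipped: the user has not opted in to copyleft code
        kept.append(s)
        if s.license_id in ATTRIBUTION_REQUIRED:
            notices.append(f"{s.source_repo} ({s.license_id})")
    return kept, notices
```

The hard part is not this filter but the provenance data it assumes; without a reliable mapping from output to source, there is nothing to filter on.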

teakettle424y ago

> Sometimes I truly wonder how people can genuinely be upset about things like this.

Tell me you regularly plagiarize without telling me you regularly plagiarize.

1 more reply

zufallsheld4y ago

As to your first point, there are many repositories on github that the author of code did not upload there or where not all contributors to the code are on github or agreed to let their work be used in such a case.

1 more reply

SahAssar4y ago

> 1. You most likely agreed to that by using GitHub.

Are you saying that I would need all the original authors consent to upload a repo to github even if I include all the original attribution and licenses? Because what you are implying is that when uploading I'm granting github a license far outside the bounds of the license included, which only all the contributors can do. For example, would the linux project need to contact each and every contributor ever to upload a mirror to github, since their contributions were under GPL but you are implying that the license given to github is much, much broader?

This would make any project not originally started on github and with a few contributors basically impossible to host there.

> 2. Copy&Pasting Code by manual search exists.

The question is who is doing the infringement here. Github copilot is obfuscating the copying and telling its users that the code is theirs to use, own, etc. as they please but is also taking large chunks of code it does not have the right to redistribute, even less grant licenses to.

IdiocyInAction4y ago

I don't think that something like CoPilot is what most GH users had in mind when they published their code. Also, licenses exist (which CP demonstrably doesn't give a shit about).

dmix4y ago

> Sometimes I truly wonder how people can genuinely be upset about things like this

90% of Twitter is just inventing new ways to whine about things

1 more reply

coldtea4y ago· 4 in thread

>Hector Martin: If you use Copilot, you are basically playing Russian Roulette that the random mashup of existing, copyrighted, heterogeneously licensed code that you get out of it qualifies as an original work, mostly by chance. Or that nobody will ever sue you otherwise.

Well, that's already the case with Stack Overflow copypasta enterprise code. If anything, use of Copilot would be an improvement...

tagyro4y ago

Do people really copy/paste from StackOverflow?

I feel this is more a meme, rather than reality. I do check StackOverflow, but never have I taken an answer verbatim. I try to see if it's the same problem and what was the approach in deconstructing it, which I find more useful in the long run.

7 more replies

Hamuko4y ago

If you post content on Stack Overflow, your contribution is distributed using the CC BY-SA 4.0 license.

1 more reply

t0suj44y ago

That quote applies to any creative work. Be it code, audio or video.

1 more reply

moffkalast4y ago

> If anything, use of Copilot would be an improvement

What do you mean, Copilot regularly pastes stuff directly from SO. One of those automatic doc generators was able to point me to the exact answer where one of them was from.

1 more reply

bborud4y ago· 3 in thread

My personal reasons for not using copilot are a bit simpler. I believe the act of researching which solutions to use for a given problem is not so much about time, or the code you end up with, but about developing a better understanding of what you are doing. You may end up just cutting, pasting and modifying a piece of code you found, but hopefully, you were exposed to a few different ways to accomplish the same thing, and it made you aware of other choices that could have been made.

You could think of the evolution of practical problem solving in software engineering like this:

1. I have to invent a solution (because nobody else in the world has a computer)

2. I have to know of a solution (education, word of mouth...)

3. I have to look up a solution in the books I have (commoditized knowledge)

4. I can look up solutions on the internet <-- (we are here)

5. The computer suggests something and I accept (some are here too)

From 1 to 4 the amount of cleverness required to solve small problems drops a bit, but your productivity and exposure to knowledge probably goes up.

I'm not quite sure what happens from 4 to 5. Personally I'm actually more interested in the context solutions are presented in than just the solution. In fact, I rarely copy and paste code from the Internet, but I often look at multiple suggestions/solutions and then borrow ideas or combine ideas from several sources.

Yenrabbit4y ago

At least the way I use it, it's not taking much away from my problem solving. It's just that instead of having to type `particlesGeometry.setAttribute('position', new THREE.BufferAttribute(positions, 3))` I just write `//Add as an attribute` and then hit TAB, since Copilot is smart enough to see that I've just prepared some geometry and populated an array of positions (both operations also sped up by not having to type the obvious bits). You're still having to think through the solutions (I'm not just typing '//make a cool particle sim') but no longer need to hit SO every few minutes for syntax examples when using a new library or something.

2 more replies

ok1234564y ago

It replaces a few google searches to look up how to do something with a new language or library. Keeping you in your editor and from having to context switch, and possibly distract/derail you, is worth it.

kraftman4y ago

I would be interested to know how many people are actually using copilot to generate entire chunks of code that they don't understand. For me it's just autocomplete on steroids, it's not answering any questions I don't know the answer to (other than syntax I've forgotten), it's just making the boilerplate faster to write so I can think about the actual problem I need to solve.

1 more reply

wolframhempel4y ago· 3 in thread

When my last company got acquired, part of the due diligence process was a scan of our codebase for snippets from stack overflow. Every snippet found that wasn't posted with a clear license by the author was challenged and we rewrote it.

Now, I'm not entirely sure how necessary this was from a legal perspective. But introducing an AI into the mix will bring up a lot of uncertainty when it comes to how much change is required for something to no longer be considered a copy/derivative.

dmortin4y ago

Did the scan still find a snippet if the variable names had been changed, for example? Or is that considered a differing snippet then?
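How the scanner in that due diligence actually worked isn't stated. As a minimal Python sketch of one common idea, a scanner can replace every identifier with a positional placeholder before comparing, so that renaming variables alone does not change the result. `fingerprint`, `KEYWORDS`, and the sample snippets are made up for illustration; a real tool would use a proper per-language lexer.

```python
import re

# Tiny illustrative keyword/type set; real tools use a per-language lexer.
KEYWORDS = {"return", "if", "else", "for", "while", "int", "String"}

IDENT = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def fingerprint(code):
    """Tokenize code and rename identifiers to ID0, ID1, ... in order of first use."""
    names = {}
    tokens = []
    for tok in TOKEN.findall(code):
        if IDENT.fullmatch(tok) and tok not in KEYWORDS:
            tok = names.setdefault(tok, f"ID{len(names)}")
        tokens.append(tok)
    return tuple(tokens)


a = "String getName() { return name; }"
b = "String fetchLabel() { return label; }"
print(fingerprint(a) == fingerprint(b))  # True: only the names differ
```

Under this scheme a rename alone is still a match, while reordering or restructuring the code would produce a different fingerprint.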

1 more reply

redox994y ago

Isn't all stack overflow content creative commons?

https://stackoverflow.com/help/licensing

1 more reply

dmix4y ago

That sounds like legal paranoia or a make-work program.

yaseer4y ago· 3 in thread

Technically, programmers search, copy and modify code all the time.

One might argue copilot puts into software an algorithm that humans are already doing. Software like that is usually inevitable.

Still, it sucks there's no benefit for the contributors.

The most ethical thing I can think of is some kinda 'Spotify-like' revenue sharing model, based on how often their code is used by others. Not that they'd ever implement that if they can get away with it!

omnicognate4y ago

> One might argue copilot puts into software an algorithm that humans are already doing.

That argument only works if you think what Copilot is doing is meaningfully similar to what humans are doing. The debate about how these models relate to human thought might have legal implications.

As I understand it (IANAL) copyright doesn't protect ideas and concepts. It protects the content itself. In theory, if I read some copyrighted work, understand some idea in it and then create a new work using that idea, without copying that original work, then that is not a derivative work. (I think this is at least how it's supposed to work - would love to be corrected if that's wrong.)

So if I took a copyrighted work and rot-13ed it before distributing copies, I think that would be clear copyright violation, but if I made my own works using concepts I gleaned from reading it, it wouldn't be.

So should Copilot be treated like the rot13 algorithm or like me understanding concepts and generating new works using them? That sounds like a fascinating legal debate to be had.
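To make the rot13 end of that comparison concrete, here is a minimal Python sketch (the sample string is arbitrary). It shows that rot13 is a purely mechanical and fully reversible transformation: applying it twice gives back the input, so the "encoded" copy still contains the entire original work.

```python
import codecs


def rot13(text):
    """Rotate each ASCII letter 13 places; all other characters are unchanged."""
    return codecs.encode(text, "rot_13")


original = "Hello, World"
encoded = rot13(original)   # "Uryyb, Jbeyq"
restored = rot13(encoded)   # identical to the original again
print(restored == original)  # True
```

Whether a trained model sits closer to this kind of lossless re-encoding or to a reader who retains only concepts is exactly the open question; the sketch only illustrates the first extreme.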

teakettle424y ago

> Technically, programmers search, copy and modify code all the time.

When following the license terms, preserving the original copyright, etc, sure.

However, honest, ethical people (including programmers) do not plagiarize.

Copying and pasting code without attribution is plagiarism. Doing it without following the licensing terms is a copyright violation.

1 more reply

kaibee4y ago

> The most ethical thing I can think of is some kinda 'Spotify-like' revenue sharing model, based on how often their code is used by others. Not that they'd ever implement that if they can get away with it!

Based on my understanding of how NNs work, I'm not sure it's even possible to implement something like that.

seydor4y ago· 3 in thread

Programmers are fine when their creations, pretty much all of tech, resells content that other people wrote for free, but no, not code, that one must be expensive

onpensionsterm4y ago

The only one making money here is github. Very few programmers are selling open source code. And programmers are (in)famous for not buying software.

anonymoushn4y ago

I also don't think it's acceptable for TurnItIn to monetize content without paying the authors. My opinion about whether students should have their work stolen and monetized by a company doesn't seem to have much impact though.

zx80804y ago

%s/programmers/tech capitalists/g

shireboy4y ago· 2 in thread

I do feel these arguments are valid if a little overstated. Most devs have googled, found some code, and pasted it in without thinking about attribution. Doesn’t make it right, but it is a question of how much code is being copied and how specific. For example, I peruse open repos to learn - I learned about the spread operator in JavaScript that way- doesn’t mean every time I use it I need to attribute whatever repo I saw it in. But, yeah, if I copied a larger chunk and the owner wants attribution, probably wrong.

I like the idea of having the bot automatically update an attribution file if it detects it’s used licensed code. Seems like it would be fairly trivial. Also a robots.txt for repo owners to control automated use.

Also, they should totally pay back a portion of revenue to the community and support the repos used to train. That seems like it would be a good PR move if nothing else.

kachhalimbu4y ago

I like this take. Copilot to me seems a glorified (very intelligent) auto-search-paste/autocomplete service. It is just mimicking what devs usually do, which is to copy-paste code from StackOverflow/GitHub for many mundane kinds of code, e.g. for loops, mongo find queries, callback func definitions etc. for JS devs.

The idea of auto-attribution if copilot surfaces licensed code is best because then it keeps the copilot user honest where the code is coming from and honor the original license.

1 more reply

Aeolun4y ago

> Also, they should totally pay back a portion of revenue to the community and support the repos used to train.

Aren’t they already doubling all Github sponsorship money?

1 more reply

JacobiX4y ago· 2 in thread

It’s the same problem with those ML models. The other day someone generated a children’s book using GPT3, and it turned out that there is a real children's book with the same name and very similar content: The Very Lonely Firefly by Eric Carle.

bartq4y ago

Other thing I'm worried about: how to retract facts from ML model? I guess it's impossible, you need to retrain from scratch with part X removed from training set. Or... people could invent layered ML models similar to docker - each layer would be marked what data it was trained with. Then at least you'd have some cache of trained model to re-use in next training session. Nasty stuff.

1 more reply

icoder4y ago

Interesting, it's a big question I've had for a while, how 'original' stuff coming from these AI systems is, and also the distribution of uniqueness over many answers. I haven't dived into it yet, but I find it surprising how little this comes up when these systems are discussed (ie here on HN).

Does anyone even know? Can we even check? What if 1 in a thousand, or one in a million outputs is (very close to) something existing? I find this especially relevant when generating faces.

noisy_boy4y ago· 2 in thread

Say, I want to write a getter method like below:

    String getName() {
        return name;
    }

Let us also assume that this snippet, unsurprisingly, has been in several copyrighted repos that didn't grant Github the right to share this code.

So I start typing "getName" and copilot suggests the exact snippet above. If I use this snippet, is it plagiarism? Even though the above code is the most "obvious" way to write this getter and I would have written it this way even without copilot's suggestion? Or does the "uniqueness" or "non-trivial quantity" of the suggestions have any bearing in determining copyright violation? How/where do we draw the line?

warkdarrior4y ago

Clearly your code could be improved with some `Factory` objects and some dependency injection!

glouwbug4y ago

Lucky for you, if you wrote a noise function that copilot returned as an implementation of Perlin noise you'd be breaching a _patent_! Said patent just expired after a 20 year run, so you'll be okay this time!

jarenmf4y ago· 2 in thread

I guess the question is where you draw the line between a derivative work and "learnt by an AI algorithm"

asimpletune4y ago

Who needs a line when there are plenty of obvious examples lifted verbatim?

triknomeister4y ago

If the media copyright industries and their ContentID is anything to go by, it doesn't matter. It's all derivative.

dgb234y ago· 1 in thread

Is it smart enough to:

- respect attribution

- respect copyleft

- respect proprietary licences

- give the user appropriate hints about the above

Or does it just copy code without doing any of this?

spupe4y ago

No, it doesn't do any of that. However, it does not "copy code" except in marginal use cases, the far more common scenario is that it will suggest you very basic code that is akin to a Stack Overflow reply.

1 more reply

tremon4y ago· 1 in thread

I might start considering Copilot if Microsoft were to train it on their own internal codebases (Windows, Office, SQL Server). Until they do, it's clearly a "tool for thee but not for me" type of situation.

clircle4y ago

"tool for thee but not for me" <- what does this even mean?

k__4y ago· 1 in thread

Isn't that what Web2 is all about?

Someone creates content for free, and companies monetize it.

WesolyKubeczek4y ago

The real Web3 is companies suing the original creator for infringement.

bmacho4y ago· 1 in thread

On a side note, I do believe that short programs or functions should be copyright free by law.

Or we as a community need to create a better bsd, a cc0 for everything.

Almost everything is nontrivial, and almost everything is copyrighted, at least with the pressure to name the original author (BSD, GPL, other major permissive licenses).

Say you want to use a library, then you check for examples in the documentation, now you have to denote somewhere that the example is from the documentation (best if you put it in the source code, so you don't lure other people to copy what you copied and refer to you as the author).

It is a major PITA at least for me.

stagas4y ago

What about a law that makes all code available but then requires you to use a portion of your earnings to compensate the people whose dependencies you used?

tpoacher4y ago· 1 in thread

Does this mean I can steal stuff if I say I trained an AI to do it for me?

bmacho4y ago

Is cat an AI?

1 more reply

rosmax_13374y ago· 1 in thread

I think this problem has no good solution until IP laws around the world are properly reimagined from the ground up. I'm of the quite radical stance that code, music, art in terms of their intellectual existence should be free for anyone to take. (you can own a hard drive with code on it, and claim no one should steal it, but not the idea of the code itself)

If you have ideas, code, music or art which you wish for no one to partake in, do your best to keep them secret. Certainly, breaking into secret areas should be illegal, but once the cat gets out of that bag it gets out of the bag.

The creative people behind these ideas I believe will be able to find good compensation nonetheless in society, IP-laws nowadays only serve to protect megacorporations to the detriment of creativity and ideas.

zzo38computer4y ago

I agree. This will fix it. I think that copyright and patent should be abolished, but that if it is secret then it is still secret (unless someone else manages to come up with the same thing (e.g. by decompiling a published computer program to reconstruct the source code), in which case it can be public). And so then also the AI can copy the code too just as much as you may do so manually; if it is published then you can do it and it should not be illegal to write such things.

janosdebugs4y ago· 1 in thread

It'd be nice to see some proof here. Copyright is not absolute and does not extend, for example, to things that have no creativity in them. There are only so many ways to write a for loop or an if condition. Training an ML model from a large body of code IMHO violates copyright no more than any of us reading code and learning from it, as long as GH Copilot doesn't spit out code that's exactly the same as something already existing.

namose4y ago

https://twitter.com/mitsuhiko/status/1410886329924194309?s=2...

pen2l4y ago

Bit of a stretch to fashion AI-derived/AI-coauthored works as other people's work. Are DALL-E portraits done Picasso-style unrightfully selling Picasso's works? Is an individual selling portraits done Picasso-style unrightfully selling Picasso's works?

No, of course not. Joyce's literature was influenced by Ibsen, Mozart looked up to Haydn, Newton was humble enough that he openly professed he stood on the shoulders of his predecessors, Perelman refused the Millennium prize because it wasn't also offered to his colleague Hamilton.

All human innovation is iterative, and derivative. https://www.youtube.com/watch?v=jcvd5JZkUXY

Our skill doesn't grow in vacuums, without outside mentorship and guidance. There are areas where I am upset about the application of AI, but this is not one of them. Consider copilot a gentle guiding hand for those without access to a second pair of eyes nearby to give you reminders on what you may otherwise have on the tip of your tongue.

But in the same way that Led Zeppelin's refusal to recognize how heavily their music was influenced by Delta blues artists was unbecoming, I can accept the argument that it is perhaps douchey of Github to sit on Copilot as squarely their creation.

albertzeyer4y ago

So, how often does it actually happen? Does it happen more often than for a human? Does anyone actually have numbers on this?

Of course, if you provide already a copyrighted prefix, and it has seen that code, the chances are high that it would complete the copyrighted code (because that is what you actually would also expect).

So, for real use cases in the wild, where you write some own real novel code, how often would it suggest some copyrighted code? And how often would a human?

I have used Copilot the last months and I have never ever seen such a case (I can be pretty sure because all the identifier names are really unique, and the code was very custom).

However, I assume that I myself might have produced copyrighted code unknowingly because if you write common patterns (e.g. some tree or graph search, or some sort function, implement LSTM or Transformer, whatever), the chances are not so low.

Ciantic4y ago

I'm a bit mixed on this. The code Copilot usually autocompletes for me is not particularly novel; it's just mundane stuff I would write anyway. Most of these snippets are not copyrightable in my opinion, because they were obvious in the first place. Like CSS nth-child odd / even logic, or one case where it filled in ~10 lines of JS logic for filtering rows by a category stored in dataset, which I would have written anyway.

Then there are cases where it amazes me completely: it wrote 10 lines of C++ code for rendering monochrome glyphs with bits using the Freetype library. It did have an odd, subtle bug though: the glyphs came out reversed, and it worked with only a certain font size, which it seemed to pick up from a different file altogether.

captainbland4y ago

If we're all standing on the shoulders of giants (specifically code that other people wrote) then really what Copilot is selling is a ladder to get onto those shoulders faster. I think that's a legitimate aim, as such. However it should be careful about not including unlicensed code and should have a specific 'GPL' option for a model trained with GPL code included.

I suppose it should also generate appropriate copyright notices to satisfy many open licenses. I'd be surprised if copilot could actually link back to the original code like that, though.

habibur4y ago

We stand on the shoulders of giants. That had been the way for decades. A newer stack over the older one without much thought. And someone in the future will build even a newer stack over the current ones.

1 more reply

GuB-424y ago

> Copilot just sells code other people wrote

So what? Selling code other people wrote is the foundation of the free software movement. It is the entire business model of countless companies, and it is a good thing. Among them are most major linux distro vendors like Red Hat and Canonical.

The value added by Copilot is that they sell you the lines of "code other people wrote" that you want, out of billions.

I still think it is derivative work, and that they should only process code under permissive licenses, or, if they want to include GPL code, make a GPL-only version, usable only for GPL projects. I thought that was what they did: there is so much code under permissive licenses that it should be enough to train their model. But apparently they don't care; as long as it is public, it is included. To me, they are shooting themselves in the foot: several companies have already banned Copilot due to the potential issues with copyright.

floor_4y ago

I started self hosting when Microsoft bought github and with this mass theft of copyrighted material and then reselling it for money I'm even more happy with my decision.

rictic4y ago

Copilot very rarely copies code verbatim, and when it does it's very short snippets. When Oracle sued Google over allegedly copying short and fairly trivial snippets of code they were justly derided.

I can't speak to the legal side, but I just don't understand the moral outrage over very occasionally copying such short snippets of code. The key innovations and the actual value that licenses are intended to protect aren't in these short snippets.

And what does copilot bring to the community? Free use by students, free use by open source maintainers, and a huge boost in productivity for a modest fee for professional devs, for a service that no doubt costs a lot to run, even on the margin.

nathias4y ago

Copilot is a new way for corporations to break copyright while enforcing it for everyone else, this will be the first big use for AI when other corpos follow.

Havoc4y ago

Yes, though in a way so does stackoverflow & friends. Large chunk of dev ecosystem is copy paste and I don't think this is inherently problematic. It is always a case of standing on the shoulders of giants.

It's more of a licensing issue to me. As far as I can tell it was trained on a blend of licenses, which to me makes it inherently non-compliant. At least some of it is going to be copyleft and find its way into closed source.

0x_rs4y ago

I'm not a lawyer, nor very well versed in the vast world of licenses and their definitions in court contexts, but I've been wondering about something with the growing appeal ML-generated content has for the average person (and the "high" barrier for entry in the market) — are licenses in some form or another going to adapt to this phenomenon? From a brief search, I have not found any new license with a no-dataset-usage clause (assuming fair use does not apply, that's another big question). What are the chances anything of the sort will become an option for any "creative" work that's usually shared freely (such as artwork, code, et cetera) even despite copyright? What about the ownership of the dataset? It seemed to be questionable years ago already that possibly IP-protected content goes through the black box and resembling material gets on the other side, whose ownership is it really? I'm guessing some notable court cases in the future could define this in the following years if the popularity continues growing.

thewoolleyman4y ago

Artificial Intelligence is causing us to revisit the difference between free as in beer and free as in speech (https://en.wikipedia.org/wiki/Gratis_versus_libre).

It is putting a new spin on some traditional Open Source Lessons (https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar#L...).

People share and reuse unattributed snippets of MIT-licensed and GPL-licensed code on the internet all the time: StackOverflow, etc.

StackOverflow is profiting from that activity indirectly by facilitating it. They profit passively through ad revenue, and actively through the Teams subscription offering.

But nobody seems too upset about that.

How is an AI which facilitates the same code sharing fundamentally any different? Because it’s scraping it itself, rather than humans contributing it?

Seems like a tenuous argument at best.

mullikine4y ago

Traditional 'real' (as opposed to 'imaginary') programming is like writing in assembly code; It's outmoded because of generative models, in a way similar to 'C' outmoding assembly code. The most important thing, I think, is that free (libre) software developers are able to work with the language models directly, so that libre software is allowed to continue progressing into what I call Imaginary Programming. That's because with a generative internet all you really need is blockchain + prompting.

https://huggingface.co/spaces/mullikine/ilambda

Language models are able to 'steal' the linguistic meaning-making 'essence' of the software, by modelling:

- How the software is used (mimicking its function) - external meaning

- How functions are 'inspired' - internal meaning (reflection)

https://github.com/semiosis/imaginary-programming-thesis

The models themselves should be clear about where the data came from. However, this is only possible in a fair world which we do not live in. Compromise must be made to protect national interests.

Generative models are license blind and there's very little that could be done to prevent progress. Like what the invention of the camera has done for art.

Large language models including Codex are a transformative technology.

Bi-directional fair-use is probably the best result we can hope for.

So long as Microsoft and OpenAI are not selling back usage of the model to the open-source community, I think it's OK, though it's the bare minimum obligation.

iptq4y ago

I know this isn't really related to the whole copying ethics debate, but I definitely feel like there's some sort of foul play happening here. For all of the unlicensed projects out there, the license that is automatically granted to Github includes:

> the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time

It's insane how vague this is. Is Copilot a "Service"? Sure, by its definition:

> The “Service” refers to the applications, software, products, and services provided by GitHub, including any Beta Previews.

And since much of the code was published before Copilot's inception, this means Github can just arbitrarily add more "services" and milk the code for whatever it wants. Automatically service-ify any public repository? Sure, pay us for quotas. It's like a legal loophole to let Github just bypass any license restrictions you put on it.

aetherspawn4y ago

Copilot is a fancy pattern bot.

Humans make original patterns, but since Copilot cannot think, then Copilot does not. It squashes together a bunch of small individual patterns, each under their own license, but at no stage does it do anything more than pick a line from here, and a line from there.

It doesn’t think, and it doesn’t create new IP.

It is like making a picture out of small snippets of a thousand other pictures, and then selling it.. clearly not OK. You still ripped off the original artists.

Or like plagiarising 100 of your class mates’ assignments. Are you less guilty because you went to the effort to steal just a few sentences from each?

A criminal who steals a cent from every account at the bank is a more sophisticated thief than someone who holds up a petrol servo.

If Copilot doesn’t create new IP (it doesn’t; we established this), then it uses existing IP. And in that case it is no different to any of the three analogies above.

maxbaines4y ago

I initially hadn't thought about Copilot and other AI generators this way, but now that I have, I’m finding it hard to ignore.

madrox4y ago

I don't think any professional community is aligned on how to think about ML-generated content yet. We don't know how to apportion rights between the data owner, the model owner, and the end user, and I don't think existing copyright law is ready for it. At least for software, I think the way forward is for the next generation of software licenses to explicitly state whether the code can be used to train ML models and what those models can be used for. Without explicit language, we'll be squabbling over interpretations of fair use.

There's going to be some big cases here. It's going to end up in the Supreme Court sooner or later, and if it were to go there today I think I know what they'd say.

tsujp4y ago

Copilot produces verbatim GPL'd code. It's also a closed box.

Source: https://twitter.com/mitsuhiko/status/1410886329924194309

ewalk1534y ago

If the portion of code that Copilot lifts is the "heart" of the original work, that would be much less likely to be considered fair use[1], regardless of the length.

> For example, it would probably not be a fair use to copy the opening guitar riff and the words “I can’t get no satisfaction” from the song “Satisfaction.”

I wonder how this could be integrated into the system?

[1] https://fairuse.stanford.edu/overview/fair-use/four-factors/...

pornel4y ago

Tough pill to swallow. Microsoft's actions don't seem fair, but fighting them with copyright could weaken fair use:

https://felixreda.eu/2021/07/github-copilot-is-not-infringin...

There's a good argument that demanding copyright protections on scraped datasets and short snippets is a double-edged sword. It could harm search engines, distribution of news, and non-commercial ML research too.

stakkur4y ago

At every turn, in every instance, for decades, all stories involving Microsoft end in "...and then Microsoft fucked people over." I've witnessed this firsthand since the 80s.

williamcotton4y ago

Should the snippets that Copilot is regurgitating be considered for copyright in the first place?

It seems akin to trying to copyright a certain drum pattern or chord progression.

Also, the history of the GPL, MIT, commercializing lisp machines, Symbolic, infighting, etc… seems a very different context than Copilot so I am having difficulty seeing the systemic problems that tools like this encourage.

There is of course a surface level similarity in that a corporation is profiting from IP in the public domain but the devil is in the details.

sirsinsalot4y ago

Jaron Lanier's book "Who Owns the Future?" is all about AI and compensating those who provide the input for training these very valuable models.

I highly recommend everyone read it.

BiteCode_dev4y ago

It is incredible to use though. I pasted the return value of an API call in a comment, then started to write a schema class. Copilot just created the entire class for me. When I wanted to extract a subset of the data, I typed get_<_name_of_the_subset>(), and it wrote the code I would have written.

So even without using someone else's code, just the pattern understanding and the production of simple boilerplate code is great.

powerapple4y ago

Why is it a bad thing? Either people spend time reading code, learning every little thing, and producing the same work in days, or Copilot saves them hours of their lives. Coding would be more efficient; it is a win-win for everyone in this industry, right? I know people get attached to the code they write, but we all learn from books, and the result is common enough.

1 more reply

Aeolun4y ago

> what github / microsoft is counting on here is that open source developers do not have enough collective power to do anything to stop this

I think it much more likely that they count on everyone liking it way too much to give a shit about their MIT code not being attributed correctly.

I certainly don’t. MIT just seems like the most convenient license for people that need licenses (corporations?), so that is what I use.

vbezhenar4y ago

I somewhat agree with that. Yesterday I edited some exotic configuration (Kubernetes CSI driver for Cinder) and Copilot suggested a config that looked like someone's config. There were no values, so it was good at filtering them out, but it definitely looked like a cleaned-up part of code that resides in some project.

I don't think that's bad though. Code sharing is good for overall productivity.

c01n4y ago

MS and Github are thieves, all their code is closed source, yet they sell copyrighted code they don't own. If they told us years ago that our code will be automatically stolen by an "AI", most coders would not have created an account. The innovation here is that they have access to most of the world's open source code and automated the stealing.

capableweb4y ago

If GitHub could guarantee that the code Copilot had ingested was only made with OSS licenses, then I don't see what the problem is.

But as far as I understand, GitHub trained Copilot on any public repository on GitHub, meaning even if it doesn't have a license specified (so the user publishing it still has the copyright to it), then I don't see how it can be OK.

5 more replies

eline434y ago

There needs to be an update to either licenses or GitHub (and other) software directly, or even software terms of services, that gives the user an opportunity to opt-out of their data being used to train proprietary AI models.

'I don't agree with having an AI trained on/with my data.'

IMHO, all other problems with copilot stem from this.

zokier4y ago

Sure, the concern is valid but I feel like this tweet adds absolutely no substance to the discussion and just repeats the same opinion that was already rehashed to death since copilot originally launched. As such, especially with the tone that the tweet has, I don't expect constructive discussion to arise here.

dgb234y ago

Reading many of the comments here I feel like one important thing is being left out that is not related to legal, but to social issues:

Who is on the side of open source? Where are the big, powerful institutions and companies that deeply care about authors and communities providing free software that so many of us rely on?

andrelaszlo4y ago

There are a few reasons why this could be considered ethical. First, open-source code is typically free to use, so the company would not be taking advantage of anyone by using it to train their AI. Second, the company would be providing a service that people are willing to pay for, so they would be generating value for society. Third, the company would be transparent about what they are doing and would not be hiding anything from the public.

...the above was generated by GPT-3 (text-davinci-002). Prompt: Write an argument for why using open-source code to train an AI and then sell the code generating service (without open-sourcing it) is ethical.

The main argument against this is that it takes away from the open-source community that contributed to the development of the code in the first place. By selling a code-generating service without open-sourcing it, the company is profiting from the work of others without contributing back. This is unfair and takes away from the overall open-source ecosystem.

Added two characters to the prompt :P

pabs34y ago

I wonder if FOSS folks could copyleft originally public/leaked but proprietary code using CoPilot.

nl4y ago

This isn't how a language model works.

It's SO frustrating that even on HN people still fall for this naive and incorrect analysis. Pasting bits I've said before on this topic:

Language models do not work like this. They can copy content but usually that's for something like the GPL language text.

Generally they work on a character by character basis predicting what is the most likely character to appear next.

This very rarely results in copying text, and almost never rare text.

Mechanically it has learnt both syntax of language and how concepts relate. So when it starts generating it makes sentence that are syntactically valid but also make sense in terms of concepts.

That's really different to just combining bits of sentences, and it gives rise to abilities you wouldn't expect in something just cutting and pasting bits of sentences. For example, few shot learning is mostly driven by its conceptual understanding and can't be done by something with no way to relate concepts.
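
(A toy illustration of the "predict the most likely next character" idea described above. This is only a hedged sketch, not how Copilot or GPT-3 is implemented: it is a plain bigram counter, and the names `train_bigram` and `generate` and the corpus string are made up for the example.)

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every character, which characters follow it in the text."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, seed, length):
    """Greedily append the most frequent next character, up to `length` times."""
    out = seed
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            # The last character was never followed by anything in training.
            break
        out += followers.most_common(1)[0][0]
    return out


# Hypothetical training text, chosen only to keep the example tiny.
corpus = "return name; return name; return value;"
model = train_bigram(corpus)
print(generate(model, "r", 5))
```

A model this small sees only one character of context, so it mostly loops over fragments of its training text. Real language models condition on far more context, which is the difference the comment above is pointing at.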

2 more replies

olalonde4y ago

I'm going to make a bold prediction: no one will ever lose a copyright lawsuit due to usage of Github Copilot generated code. The code snippets it produces are too small or trivial to qualify for copyright infringement.

1 more reply

stefanos824y ago

Seems like my original questions [1] are more relevant than ever!

[1] https://news.ycombinator.com/item?id=27677598

tiborsaas4y ago

MrDoob has an excellent point about this:

https://twitter.com/mrdoob/status/1539740854956412929

lfrigodesouza4y ago

It's as the saying goes, "when a product is free to use, the real product is actually you". In this case, our code is the product. I'm now considering switching to another git provider...

oytis4y ago

Copilot sells the service of finding the code that makes sense for what you write. Would be better if it could correctly attribute the source(s) though, I hope they will solve this problem at some point.

thih94y ago

Is github copilot using private repositories for the learning process?

If yes, how do they mitigate the risk of exposing private data when something is quoted verbatim?

If not, then why are repos with non permissive licenses ok?

sirsinsalot4y ago

Beware geeks with gifts. This is Microsoft. The question isn't "is it good?" but "Why are Microsoft offering it and how is it undermining everyone else?"

1 more reply

mawadev4y ago

What stops me from re-uploading copyrighted source, where I remove the notices and push it with an MIT license? If a model has been trained on such a data set, how do you get it out?

LeonTheremin4y ago

And social media sells ideas other people thought.

Copilot is limited to public code now, but it may easily be trained on non-public code - albeit this probably won't be for sale to the public.

FeepingCreature4y ago

All I can think of is Steve Yegge [1]: "They have no right to do this. Open source does not mean the source is somehow 'open'."

My code is on Github so that people can read it, reuse it and learn from it. "The freedom to study how the program works", as the FSF says. If some of the people reading it are machines, why would that matter?

[1] http://steve-yegge.blogspot.com/2010/07/wikileaks-to-leak-50...

1 more reply

iLoveOncall4y ago

Github Copilot is selling code other people wrote as much as the author of this thread is profiting from words other people invented.

Absolute nonsense.

1 more reply

presentation4y ago

Google just sells content other people wrote.

AtNightWeCode4y ago

Copilot will be that bandmate that plays a new riff and leaves you wondering about where it was borrowed from.

acuozzo4y ago

This is, in part, why I will continue to use the original 4-clause BSD license for the code I write.

blitz_skull4y ago

Man, people really do be angry that the public code they put on a public platform is being used publicly.

Wild.

1 more reply

boomer_joe4y ago

We need a licence that forbids use in ML and the people willing to sue github for it ASAP.

1 more reply

shahar2k4y ago

and Dalle2 sells art other people created

(I'm actually not being sarcastic, I think there needs to be some sort of pipeline for compensating the artists who are used to train these models)

fimdomeio4y ago

what AI is showing is the fuzzy line between creating and copying. The truth is they are both always present in everything we do, we've just been trying to hide it.

So it should be as simple as if you're using other people's content for your own profit you should properly compensate them.

Or we could just abolish copyright law and assume that everything humans create emanates from culture so it's always collectively built and everything should be open source.

Or we just do the same we've been doing. Create even more complex laws trying to define this fuzzy line in a way that companies can keep profiting from it a lot more than individuals.

marstall4y ago

most of the code I write is glue sticking together 8 proprietary systems nobody's ever heard of. how is copilot gonna help me with that?

tiku4y ago

I'm using it for a day now and i'm really impressed. It is so aware of stuff in old code, that it is scary. I'm working in an old application with Zend Framework.

whywhywhywhy4y ago

Same deal for Dall-e if they ever productize it.

pvaldes4y ago

Each day sounding more as Zopilote, it seems.

sytelus4y ago

Google just sells content other people wrote.

SMAAART4y ago

Once again Innovation challenges IP.

HeavyStorm4y ago

So much bullshit my head hurts.

lysecret4y ago

Don't we all.

honkler4y ago

license issues will save many thousand jobs.

amelius4y ago

"Good artists copy. Great artists steal."

abdulhaq4y ago

That's like saying a plumber just sells parts that other people made

2 more replies

janandonly4y ago

Isn't every programmer in history (except the gal who invents her own language and writes all her own code) simply an archeologist for other people's work?

We all Duck/Google for code anyway. Why not admit and make it easier?

2 more replies

danamit4y ago

The code Copilot suggests from any given project is, most of the time, not enough to credit that project. When I look up code in some GitHub repo and copy all or part of it, I do not credit that project.

I do not see Copilot as useful anyway.

Separo4y ago

GitHub provides the repo hosting and tools for free on public projects. I'm happy with this deal.

1 more reply

spupe4y ago

I disagree. Copilot is selling content-aware code suggestions, which is a result of code that other people wrote in their platform, and which in no way affects the work of these people.

lakomen4y ago

I don't understand what's going on there.

I don't use github. Can someone explain what the author means?

Edit: in detail

3 more replies

skc4y ago

I get the feeling this entire debate would have been non-existent had this been a Jetbrains product instead.

The whole thing is just bizarre when the vast majority of developers constantly look at OSS code daily and lift ideas/patterns/snippets from there regularly without once looking at whatever license is attached.

3 more replies

bborud4y ago

Well, this does invite an interesting comparison. If we imagine something like Copilot applied to music I believe the chances of ending up in court would be pretty high. There are a lot of examples of plagiarism lawsuits in popular music and the outcome seems to be entirely random.

One could argue that the information density in chord progressions, bass lines and beats is extremely small. And that any recognizable part of a musical idea that has been "borrowed" would necessarily make up a larger percentage of the complete work than would be the case for a typical application with borrowed snippets.

That's not a bad argument, but it is unsatisfactory because it means that at some point someone has to make a judgement on how much you can borrow.

nextaccountic4y ago

> I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

Because they shared the code under a license, and they have the right to complain if people use that code but don't follow the license.

Often following the license is as simple as giving credit to the original author.

1 more reply

highwaylights4y ago

This seems disingenuous.

People don’t have a problem that AI is being used in some form to provide the service.

The complaint is pretty clearly that code is being lifted from repositories without attribution or compensation, and being redistributed into other applications.

How impressive the work behind copilot is or is not really isn’t relevant.

2 more replies

hansword4y ago

If I enter 'Mickey Mouse' into an ML-TTI thing like Craiyon (Dall E mini) do you think I will be able to sell the resulting image on a Tshirt?

No, I won't, because Disney has fancy lawyers, the average open source developer hasn't. What you are saying is: Screw little people, let M$ make their money.

Either copyright is for everyone, or for no one. I prefer the latter, but this is not the world we live in.

3 more replies

teakettle424y ago

My code is shared under a license (MIT) that mandates attribution.

That’s all I ask — if you use my code, give me credit.

Stealing my code to train your bot — which will replicate portions verbatim! — is no different whatsoever than the casual plagiarist that copies and pastes a novel snippet manually.

It's absolutely my legal and ethical prerogative to complain about people stealing my code by failing to respect the license under which it was freely provided.

2 more replies

jacquesm4y ago

They are complaining about license violations, they are not pissing on this incredible (is it?) achievement.

matthewmacleod4y ago

I also disagree with the tone of that tweet, but your dismissal is equally shallow and gear-grinding.

The comments here regarding this are honestly a total dumpster fire. It's mostly a bunch of paper-thin hot takes, either:

- The blatantly lying "I have literally never looked at any other computer software while developing any; obviously anybody who has ever seen other source code is a plagiarist"

It's dumb because there is an actual interesting discussion here but I guess we're not going to bother having it.

1 more reply

Sakos4y ago

2 more replies

the_gipsy4y ago

> share their code on github where it is publicly available just to complain when others make use of that knowledge

I put a fucking license on it so that it doesn't get abused by some fucking corporation. Jesus Christ, it's not hard to understand.

Mizza4y ago

Pretty fucking simple explanation for it, actually:

I don't make Free software so that Microsoft can sell it to people for use in proprietary projects.

bambax4y ago

The world would probably be a better place if there were no copyright.

But the world we actually live in is one where corporations have copyright, and individuals don't.

That's what irks people, I think rightly.

akagusu4y ago

> I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

People like you should understand that publicly available code doesn't mean "do whatever you want" code.

The majority of publicly available code hosted on GitHub has a license that tells you what you can and what you cannot do with that code.

If someone uses this code without respecting the license, authors have the right to complain and even legally enforce the license if they want.

Now, you should know that there's nothing "cool" about taking other people's work without permission.

DoreenMichele4y ago

Chris20484y ago

> willingly share their code on github where it is publicly available just to complain when others make use of that knowledge

because it's not unconditional, there are often licence terms of usage, and copilot is potentially laundering those.

rglullis4y ago

> why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

throwoutway4y ago

I hear you, but this isn’t a “marvel at this free open clever academic thing we built”

It’s a product by a business. Why is that not open to criticism?

rockbruno4y ago

3 more replies

ThePhysicist4y ago

pmarreck4y ago

hdjjhhvvhga4y ago

> Any time someone invents something new and incredible, there's always a crowd of negative nancies eager to discredit and explain why the invention is nothing new and a detriment to society.

It is not true. Whenever there is something really useful, everybody is happy, and while of course there are always some naysayers, they're very few.

nixpulvis4y ago

You should read more about people's ideologies and philosophies of Open Source.

One big reason I support it is because it grants me the right and ability to change things I need/want to change.

ricardoplouis4y ago

zitterbewegung4y ago

nerdponx4y ago

Tryk4y ago

This doesn't address the point of the Tweet, you are simply attacking the form of their argument.

Moreover it is possible to BOTH marvel at the human ingenuity that went into making copilot AND disagree with their methods. Some things can be marvelous and wrong at the same time.

isitmadeofglass4y ago

Yes but,

— This comment brought to you by HN-Comment-AI ©

1 more reply

gumby4y ago

People get paid to write code having learned from writing code for others and from reading code others wrote. In this regard I don't see why GitHub Copilot is any different.

2 more replies

lobocinza4y ago

Plagiarism isn't new or incredible.

hk13374y ago

Usually they want some recognition for their contribution and with GitHub copilot they get none of that.

sAbakumoff4y ago

It's the negativity bias beauty in action. You have it too.

pwdisswordfish94y ago

rambojazz4y ago

Sounds like they're not selling any of your code

1 more reply

B1FF_PSUVM4y ago

> negative nancies

Not bad for everyday use - I like "nattering nabobs of negativism" (as scripted by William Safire), but it is really a bit over the top.

OrwellianTimes4y ago

Fully agreed. It's just people getting mad and jealous but hear me out.

Copilot is NOT SELLING code other people wrote, it is simply acting as a curator to show you all the solutions people HAVE WRITTEN for free.

Copilot does NOT write entire programs, it's simply an assistant. And there is not much copyright you CAN apply to 3-4 lines of generally understandable code.

Why are people getting so salty?

1 more reply

Guid_NewGuid4y ago· 17 in thread

I find this whole topic very annoying, this is like the 3rd variation to reach the front page today. But it has made me realize why I instinctively dislike Free Software as a movement.

Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.

Sure in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules lawyering over license text.

Free the code.

sirsinsalot4y ago

"A commons of knowledge is a public good."

Yes but this copilot model takes that, adds value and doesn't itself join the public common good. Instead it takes it, and makes you pay to have it back in another form.

If copilot were open source and the model released for the public good, being built of public data (in your scenario) we would have a very different conversation.

7 more replies

monocasa4y ago

4 more replies

marpstar4y ago

> Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps

2 more replies

georgeecollins4y ago

1 more reply

bayindirh4y ago

> I find this whole topic very annoying, this is like the 3rd variation to reach the front page today.

Me too. I also find three iterations of the same subject not enough discourse. We need to take this matter more seriously.

> But it has made me realize why I instinctively dislike Free Software as a movement.

On the other hand, this whole discourse reminds me why I absolutely love Free Software as a movement.

> Copyright and licensing are bad, actually.

This is why we have "Copyleft".

> Stop getting into a frenzy of arousal about the police kicking down doors to drag Billy Gates to jail because 80 characters of fast square root is theft but 79 isn't.

And, stop getting into a frenzy of arousal about being able to use any and every code piece you see elsewhere in any project regardless of its license.

> Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.

> Sure in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules lawyering over license text.

> Free the code.

Yes, and respect the license the author selected for his/her code.

1 more reply

mplanchard4y ago

1 more reply

notacoward4y ago

vajow462674y ago

So glad this sentiment is becoming more common in the OSS community! I MIT license everything, if someone wants to make money using stuff I wrote that's awesome, and I wish them the best.

I don't think users owe me anything at all. If people want to PR back that's cool but if not that's cool too.

eikenberry4y ago

There is a license for that, the MIT-0 or the MIT No Attribution License.

https://opensource.org/licenses/MIT-0

wcoenen4y ago

> I want a truly public domain license

I think this sentence contradicts itself.

A "license" implies that there is a copyright holder who allows usage of the work under the terms of said license.

While "Public domain" implies that there is no copyright holder (e.g. because the copyright expired, was explicitly waived, or is for some other reason not applicable).

If you want to put your work in the public domain, you can do so; simply include a note saying that you dedicate it to the public domain.

1 more reply

nonbirithm4y ago

By that point there would be no way to argue "but they shouldn't care, they licensed it like this, so I'm assuming it's fine for ML use."

If copyright is not enough to stop another entity from using a person's data for training, then some other protection should be invented that does.

1 more reply

Schroedingersat4y ago

The problem with this is 'freeing the code' in this instance leads to microsoft building a wall around it and asserting complete control in a few years.

Copyleft exists for a reason and without the ongoing fight for the commons we lose it all.

nmfisher4y ago

That seems totally inconsistent with decades of people clamouring for more openness/liberty when it comes to IP rights.

1 more reply

progman324y ago

I see the free software movement as a variant on your ideals but rooted in practicality given the current environment.

1 more reply

kube-system4y ago

> Free Software

> public domain

futureshock4y ago

4 more replies

ssalka4y ago

Information wants to be free

nickjj4y ago· 9 in thread

This might be overreacting but is there a way to opt-out of Copilot using your code in open source repos?

kemiller4y ago

2 more replies

PaulKeeble4y ago

2 more replies

ellyagg4y ago

Well, I hope your viewpoint doesn't win the day, because making code as freely shareable and remixable as possible is a huge boon for humanity.

5 more replies

throwaheyy4y ago

The Twitter thread’s title seems unnecessarily incendiary and clickbaity.

I don’t buy that producing/synthesizing code snippets based off public repos is a problem.

There’s nothing proprietary or original about eg. the syntax of a for-loop, or the boilerplate of setting up some JS framework MVC.

1 more reply

lbhdc4y ago

I stopped publishing open source after all this started coming out because I was so uncomfortable with it.

jaywalk4y ago

If your code is using a license that allows it, how could you possibly opt-out aside from using a different license?

5 more replies

ghostbrainalpha4y ago

It would be kind of cool if GitHub could show some stat that code you wrote has been used 50,000 times by 12,000 people.

Being a top CoPilot contributor should at least have value to signal on your resume.

dragonwriter4y ago

> This might be overreacting but is there a way to opt-out of Copilot using your code in open source repos?

invig4y ago

They’re not selling you code. They’re selling you an engine that helps you find the right free code at the right time.

If you read free code yourself it’s fine, but if a machine does it for you it’s not? We overvalue humans.

1 more reply

VoodooJuJu4y ago· 9 in thread

It is now proven that copilot returns code from codebases with non-permissive licenses [1].

I'm curious - what are the legal implications of this going forward? I've so many questions.

1. Will Microsoft ever face lawsuits for these license violations?

2. If so, who/how? Class-action?

3. Will copilot be forced to open-source in the future? Under which license? Some open source licenses are incompatible with others, but copilot uses code from probably every OSS license conceived.

4. If Microsoft faces no justice, will we start seeing more OSS license violations? Will Google start using AGPL-licensed code?

[1] https://news.ycombinator.com/item?id=27710287 | Copilot regurgitating Quake code

mhaymo4y ago

That regurgitated code exists on GitHub under an MIT license: https://github.com/jethrodaniel/fast_inv_sqrt

"jethrodaniel" does not appear to have the copyright to offer that license, but it's hard for Github to determine that in general, so I doubt they would be liable for the error.

4 more replies

5424584y ago

1 more reply

concordDance4y ago

There's also one more question:

4 more replies

pwdisswordfish94y ago

Is there any leaked Microsoft code on GitHub? Someone should check if Copilot regurgitates that as well, then see how Microsoft reacts when someone slaps an AGPL license on that…

1 more reply

rifty4y ago

My question is, if it isn't a copyright infringement issue to use Copilot in its current form right now, why not just claim Copilot was used whenever accused of copyright infringement henceforth?

2 more replies

Beltalowda4y ago

> It is now proven that copilot returns code from codebases with non-permissive licenses [1].

That same Quake example from last year is repeated every single time.

blihp4y ago

1) Most likely

2) TBD

3) Not likely. Worst case a judgement will go against them, they'll effectively pay a fine and then they'll retrain it on a more restricted set of source code.

4) OSS has a pretty tragic history re: enforcement. It wins nearly every skirmish but has no interest in the war so from a big picture standpoint, it loses due to apathy.

bastardoperator4y ago

You don't think a mountain of MSFT lawyers in every state, including partner law firms around the world, has thought about this? Do you practice law or are you speculating based on emotions?

1 more reply

throwaway232344y ago

Big meh. That quake code was MIT.

1 more reply

parhamn4y ago· 9 in thread

Pretty soon the world is going to come to realize art/creation is just blending, incrementing and repurposing prior art.

No book, painting, codebase, sonnet, design is theft-less.

The art is the space reduction, otherwise we’d just bruteforce away.

mihaic4y ago

This type of argument always distracts from the real problem: figuring out where we draw the line between theft and reimagining.

The hard discussion is about what is acceptable, we all know prior art exists.

3 more replies

izacus4y ago

> Pretty soon the world is going to come to realize art/creation is just blending, incrementing and repurposing prior art

pera4y ago

wnkrshm4y ago

So the only thing left is handiwork I guess. Engineering isn't different from art in any way, the constraints are just stricter.

stemlord3y ago

>Pretty soon the world is going to come to realize art/creation is just blending, incrementing and repurposing prior art.

That applies to everything, it's even a basic law of physics, and there's absolutely nothing wrong with it. Any layperson already knows what a remix is anyway, so not sure what you think will change.

natly4y ago

3 more replies

Chris20484y ago

Is it really "just" that? Is there no original creativity in the choices (and skill) in the blending, and choosing what (and how) to blend?

Would you describe a parody, or a critique/review, as equally without original merit?

lioeters4y ago

Agamus4y ago

This idea has been around for a while - why... "pretty soon"?

And I'm sure I couldn't disagree with you more. Or are 'influence' and 'theft' the same now?

2 more replies

mojuba4y ago· 9 in thread

Can I suggest a hypothesis that if you find Copilot useful it means the problem you are solving is a boring one? I might be wrong of course.

alpaca1284y ago

I disagree. Most large projects, software or otherwise, use existing parts. If you design an innovative device you'll still use some standard components like chips, memory modules etc.

alkonaut4y ago

99% of the "problems" I'm solving when I'm working even on very interesting and challenging problems, are boring subproblems. If I can get those out of the way then that would be great.

viraptor4y ago

mistercow4y ago

That hypothesis is easily disproven by spending an afternoon on a side project with Copilot.

1 more reply

triknomeister4y ago

99% of work in 100% of interesting projects is boring.

para_parolu4y ago

1 more reply

workingon4y ago

Seems like a narrow vision. Is every line of code you write to solve a problem “not boring”? I solve problems I find interesting, but writing matplotlib code to visualize data never is.

trention4y ago

muzani4y ago

Yeah, it's for boring problems. Drawing a circle or detecting a specific format of number in some string, for example.

spupe4y ago· 8 in thread

XCabbage4y ago

Way too often I burn half an hour needlessly during review in one of two ways:

1 more reply

genezeta4y ago

About 10 years ago or so, I was working at a certain place. They put me into a small team apparently focused on some R+D project under the direction of an "architect".

Would I call this "selling other people's code"? Yes, I would.

1 more reply

whatatita4y ago

If the solution was made up of ideas from OSS and snippets from Stack Overflow? No; that's fine.

Many copyleft OSS licenses require attribution and distribution of derivative works that we wouldn't allow.

mbreese4y ago

I’d also expect for any stack overflow code to include a comment with a link to the stack overflow page.

I think one of the key points is to make sure any code taken from another source is cited appropriately. If it isn’t, or the junior dev is passing it off as their own work, then we have problems.

ben-schaaf4y ago

jhugo4y ago

thelastbender124y ago

trention4y ago

It's absolutely fine to allow humans to do that while prohibiting (commercialized) AI to do the same thing.

1 more reply

antihero4y ago· 6 in thread

I mean, if it's autocompleting a fairly simple line, and can do that because it's analysed a lot of lines, I don't really see that as "stealing anything".

If you are using it to write whole complex functions that are the same as other people's, I guess that is copying.

But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.

I think the first use case is far more common, and creating boilerplate that is so generic you could never really attribute it anyway.

rob744y ago

> But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.

If you do that on your own, it's your (legal) responsibility. If Copilot does it for you, it's GitHub's/Microsoft's responsibility.

4 more replies

afiori4y ago

When Oracle won its copyright lawsuit against Google, it was because of an 8-line bounds-checking utility function.

3 more replies

alpaca1284y ago

The first can be automated without ML though. And once you use ML you cannot guarantee it won't copy-paste existing code.

This whole thing would be fine if GitHub hadn't just used all public code on their platform, ignoring all involved licenses.

2 more replies

dobin4y ago

It's like saying GPT-3 created text is copyright infringement, because some author used the same sentence in a book before.

4 more replies

carom4y ago

wodenokoto4y ago

> If you are using it to write whole complex functions that are the same as other people's, I guess that is copying.

> But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.

How would I know that the boilerplate I ask Copilot to write for me is copied verbatim from a codebase that neither I nor Microsoft has a license to use?

borishn4y ago· 6 in thread

Copilot is fair use, get over it!

Copilot is not writing your code any more than Google search is writing your code. You are writing your code, and Copilot is just making suggestions.

The US Constitution secures limited copyright "To promote the progress of science and useful arts". Copilot is just that, get over it!

Buttons8404y ago

A good and well argued opinion made hostile by saying "get over it" twice! Saying "get over it" discourages further discussion. Your comment would be better without it.

2 more replies

nescioquid4y ago

1 more reply

brianmcc4y ago

Wait till it suggests something Disney can argue they own rights to...

2 more replies

zerocrates4y ago

Yes, the copyright clause gives as its purpose "the progress of Science," but that doesn't mean that anything which claims to be "progress" gets a free pass.

1 more reply

jazzyjackson4y ago

Personally I think I'll just claim all the code I write with co-pilot is a parody.

humanwhosits4y ago

Citation needed for copilot being fair-use

ThereIsNoWorry4y ago· 6 in thread

1. You most likely agreed to that by using GitHub.

2. Copy&Pasting Code by manual search exists.

3. This is just a smart tool so you don't have to figure out yourself what to copy&paste (in the best case) and save a lot of time.

Sometimes I truly wonder how people can genuinely be upset about things like this. What is broken are copyright and patent laws in the 21st century.

keraf4y ago

teakettle424y ago

> Sometimes I truly wonder how people can genuinely be upset about things like this.

Tell me you regularly plagiarize without telling me you regularly plagiarize.

1 more reply

zufallsheld4y ago

1 more reply

SahAssar4y ago

> 1. You most likely agreed to that by using GitHub.

This would make any project not originally started on github and with a few contributors basically impossible to host there.

> 2. Copy&Pasting Code by manual search exists.

IdiocyInAction4y ago

I don't think that something like CoPilot is what most GH users had in mind when they published their code. Also, licenses exist (which CP demonstrably doesn't give a shit about).

dmix4y ago

> Sometimes I truly wonder how people can genuinely be upset about things like this

90% of Twitter is just inventing new ways to whine about things

1 more reply

coldtea4y ago· 4 in thread

Well, that's already the case with Stack Overflow copypasta enterprise code. If anything, use of Copilot would be an improvement...

tagyro4y ago

Do people really copy/paste from StackOverflow?

7 more replies

Hamuko4y ago

If you post content on Stack Overflow, your contribution is distributed using the CC BY-SA 4.0 license.

1 more reply

t0suj44y ago

That quote applies to any creative work. Be it code, audio or video.

1 more reply

moffkalast4y ago

> If anything, use of Copilot would be an improvement

What do you mean, Copilot regularly pastes stuff directly from SO. One of those automatic doc generators was able to point me to the exact answer where one of them was from.

1 more reply

bborud4y ago· 3 in thread

You could think of the evolution of practical problem solving in software engineering like this:

From 1 to 4 the amount of cleverness required to solve small problems drops a bit, but your productivity and exposure to knowledge probably goes up.

Yenrabbit4y ago

2 more replies

ok1234564y ago

kraftman4y ago

1 more reply

wolframhempel4y ago· 3 in thread

dmortin4y ago

Did the scan find the process if they changed the variable names, for example? Or is that considered a differing snippet then?
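
Whether renamed variables defeat such a scan depends on how the scanner normalizes code before comparing it. As a minimal sketch (not how any particular commercial scanner works; `normalized_tokens` is a hypothetical helper), one could replace every identifier with a placeholder using Python's standard `tokenize` module, so that two snippets differing only in names compare equal:

```python
import io
import keyword
import tokenize

# Token types that carry layout only and are ignored in the comparison.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}

def normalized_tokens(source: str) -> list[str]:
    """Return the token stream with every non-keyword identifier replaced by 'ID'."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")  # identifier names are discarded
        else:
            tokens.append(tok.string)  # keywords, operators, literals are kept
    return tokens

snippet_a = "def area(width, height):\n    return width * height\n"
snippet_b = "def f(x, y):\n    return x * y\n"  # same logic, renamed
snippet_c = "def f(x, y):\n    return x + y\n"  # different operator

print(normalized_tokens(snippet_a) == normalized_tokens(snippet_b))
print(normalized_tokens(snippet_a) == normalized_tokens(snippet_c))
```

Under this normalization the renamed copy still matches, while a change to an operator or literal does not. A real scanner would presumably be fuzzier than an exact comparison.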

1 more reply

redox994y ago

Isn't all stack overflow content creative commons?

https://stackoverflow.com/help/licensing

1 more reply

dmix4y ago

That sounds like legal paranoia or a make-work program.

yaseer4y ago· 3 in thread

Technically, programmers search, copy and modify code all the time.

One might argue copilot puts into software an algorithm that humans are already doing. Software like that is usually inevitable.

Still, it sucks there's no benefit for the contributors.

omnicognate4y ago

> One might argue copilot puts into software an algorithm that humans are already doing.

That argument only works if you think what Copilot is doing is meaningfully similar to what humans are doing. The debate about how these models relate to human thought might have legal implications.

So should Copilot be treated like the rot13 algorithm or like me understanding concepts and generating new works using them? That sounds like a fascinating legal debate to be had.
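
For contrast, rot13 is about as mechanical as a transformation gets: the output is a fixed, reversible re-encoding of the input with nothing added. A minimal Python illustration using the standard library's `rot_13` codec (the sample text is made up):

```python
import codecs

def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; all other characters are unchanged."""
    return codecs.encode(text, "rot_13")

original = "int x = 42;"
scrambled = rot13(original)

print(scrambled)
print(rot13(scrambled) == original)  # applying rot13 twice restores the input
```

Applying it twice gives back the original text, which is the sense in which the output is the same work in a different encoding. Where a trained model falls between that and human understanding is the open question.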

teakettle424y ago

> Technically, programmers search, copy and modify code all the time.

When following the license terms, preserving the original copyright, etc, sure.

However, honest, ethical people (including programmers) do not plagiarize.

Copying and pasting code without attribution is plagiarism. Doing it without following the licensing terms is a copyright violation.

1 more reply

kaibee4y ago

Based on my understanding of how NNs work, I'm not sure its even possible to implement something like that.

seydor4y ago· 3 in thread

Programmers are fine when their creations, pretty much all of tech, resells content that other people wrote for free, but no, not code, that one must be expensive

onpensionsterm4y ago

The only one making money here is github. Very few programmers are selling open source code. And programmers are (in)famous for not buying software.

anonymoushn4y ago

zx80804y ago

%s/programmers/tech capitalists/g

shireboy4y ago· 2 in thread

Also, they should totally pay back a portion of revenue to the community and support the repos used to train. That seems like it would be a good PR move if nothing else.

kachhalimbu4y ago

The idea of auto-attribution if copilot surfaces licensed code is best because then it keeps the copilot user honest where the code is coming from and honor the original license.

1 more reply

Aeolun4y ago

> Also, they should totally pay back a portion of revenue to the community and support the repos used to train.

Aren’t they already doubling all Github sponsorship money?

1 more reply

JacobiX4y ago· 2 in thread

bartq4y ago

1 more reply

icoder4y ago

Does anyone even know? Can we even check? What if 1 in a thousand, or one in a million outputs is (very close to) something existing? I find this especially relevant when generating faces.

noisy_boy4y ago· 2 in thread

Say, I want to write a getter method like below:

    String getName() {
        return name;
    }

Let us also assume that this snippet, unsurprisingly, has been in several copyrighted repos that didn't grant Github the right to share this code.

warkdarrior4y ago

Clearly your code could be improved with some `Factory` objects and some dependency injection!

glouwbug4y ago

jarenmf4y ago· 2 in thread

I guess the question is where you draw the line between a derivative work and "learnt by an AI algorithm"

asimpletune4y ago

Who needs a line when there are plenty of obvious examples lifted verbatim?

triknomeister4y ago

If the media copyright industries and their ContentID is anything to go by, it doesn't matter. It's all derivative.

dgb234y ago· 1 in thread

Is it smart enough to:

- respect attribution

- respect copyleft

- respect proprietary licences

- give the user appropriate hints about the above

Or does it just copy code without doing any of this?

spupe4y ago

1 more reply

tremon4y ago· 1 in thread

clircle4y ago

"tool for thee but not for me" <- what does this even mean?

k__4y ago· 1 in thread

Isn't that what Web2 is all about?

Someone creates content for free, and companies monetize it.

WesolyKubeczek4y ago

The real Web3 is companies sue original creator for infringement.

bmacho4y ago· 1 in thread

On a side note, I do believe that short programs or functions should be copyright free by law.

Or we as a community need to create a better bsd, a cc0 for everything.

Almost everything is nontrivial, and almost everything is copyrighted, at least with the pressure to name the original author (BSD, GPL, other major permissive licenses).

It is a major PITA at least for me.

stagas4y ago

What about a law that makes all code available but then requires you to use a portion of your earnings to compensate the people whose dependencies you used?

tpoacher4y ago· 1 in thread

Does this mean I can steal stuff if I say I trained an AI to do it for me?

bmacho4y ago

Is cat an AI?

1 more reply

rosmax_13374y ago· 1 in thread

zzo38computer4y ago

janosdebugs4y ago· 1 in thread

namose4y ago

https://twitter.com/mitsuhiko/status/1410886329924194309?s=2...

pen2l4y ago

All human innovation is iterative, and derivative. https://www.youtube.com/watch?v=jcvd5JZkUXY

albertzeyer4y ago

So, how often does it actually happen? Does it happen more often than for a human? Does anyone actually have numbers on this?

So, for real use cases in the wild, where you write some own real novel code, how often would it suggest some copyrighted code? And how often would a human?

I have used Copilot the last months and I have never ever seen such a case (I can be pretty sure because all the identifier names are really unique, and the code was very custom).

Ciantic4y ago

captainbland4y ago

I suppose it should also generate appropriate copyright notices to satisfy many open licenses. I'd be surprised if copilot could actually link back to the original code like that, though.

habibur4y ago

1 more reply

GuB-424y ago

> Copilot just sells code other people wrote

The value added by Copilot is that they sell you the lines "code other people wrote" you want out of billions.

floor_4y ago

I started self hosting when Microsoft bought github and with this mass theft of copyrighted material and then reselling it for money I'm even more happy with my decision.

rictic4y ago

Copilot very rarely copies code verbatim, and when it does it's very short snippets. When Oracle sued Google over allegedly copying short and fairly trivial snippets of code they were justly derided.

nathias4y ago

Copilot is a new way for corporations to break copyright while enforcing it for everyone else, this will be the first big use for AI when other corpos follow.

Havoc4y ago

0x_rs4y ago

thewoolleyman4y ago

Artificial Intelligence is causing us to revisit the difference between free as in beer and free as in speech (https://en.wikipedia.org/wiki/Gratis_versus_libre).

It is putting a new spin on some traditional Open Source Lessons (https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar#L...).

People share and reuse unattributed snippets of MIT-licensed and GPL-licensed code on the internet all the time, on StackOverflow, etc.

StackOverflow is profiting from that activity indirectly by facilitating it. They profit passively through ad revenue, and actively through the Teams subscription offering.

But nobody seems too upset about that.

How is an AI which facilitates the same code sharing fundamentally any different? Because it’s scraping it itself, rather than humans contributing it?

Seems like a tenuous argument at best.

mullikine4y ago

https://huggingface.co/spaces/mullikine/ilambda

Language models are able to 'steal' the linguistic meaning-making 'essence' of the software, by modelling:

- How the software is used (mimicking its function) - external meaning

- How functions are 'inspired' - internal meaning (reflection)

https://github.com/semiosis/imaginary-programming-thesis

The models themselves should be clear about where the data came from. However, this is only possible in a fair world which we do not live in. Compromise must be made to protect national interests.

Generative models are license blind and there's very little that could be done to prevent progress. Like what the invention of the camera has done for art.

Large language models including Codex are a transformative technology.

Bi-directional fair-use is probably the best result we can hope for.

So long as Microsoft and OpenAI are not selling back usage of the model to the open-source community, I think it's OK, though it's the bare minimum obligation.

iptq4y ago

> the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time

It's insane how vague this is. Is Copilot a "Service"? Sure, by its definition:

> The “Service” refers to the applications, software, products, and services provided by GitHub, including any Beta Previews.

aetherspawn4y ago

Copilot is a fancy pattern bot.

It doesn’t think, and it doesn’t create new IP.

It is like making a picture out of small snippets of a thousand other pictures, and then selling it... clearly not OK. You still ripped off the original artists.

Or like plagiarising 100 of your class mates’ assignments. Are you less guilty because you went to the effort to steal just a few sentences from each?

A criminal who steals a cent from every account at the bank is a more sophisticated thief than someone who holds up a petrol servo.

If Copilot doesn’t create new IP (it doesn’t; we established this), then it uses existing IP. And in that case it is no different to any of the three analogies above.

maxbaines4y ago

I hadn’t initially thought about Copilot and other AI generators this way, but now that I have, I’m finding it hard to ignore.

madrox4y ago

There's going to be some big cases here. It's going to end up in the Supreme Court sooner or later, and if it were to go there today I think I know what they'd say.

tsujp4y ago

Copilot produces verbatim GPL'd code. It's also a closed box.

Source: https://twitter.com/mitsuhiko/status/1410886329924194309

ewalk1534y ago

If the portion of code that Copilot lifts is the "heart" of the original work, that would be much less likely to be considered fair use[1], regardless of the length.

> For example, it would probably not be a fair use to copy the opening guitar riff and the words “I can’t get no satisfaction” from the song “Satisfaction.”

I wonder how this could be integrated into the system?

[1] https://fairuse.stanford.edu/overview/fair-use/four-factors/...

pornel4y ago

Tough pill to swallow. Microsoft's actions don't seem fair, but fighting them with copyright could weaken fair use:

https://felixreda.eu/2021/07/github-copilot-is-not-infringin...

stakkur4y ago

At every turn, in every instance, for decades, all stories involving Microsoft end in "...and then Microsoft fucked people over." I've witnessed this firsthand since the 80s.

williamcotton4y ago

Should the snippets that Copilot is regurgitating be considered for copyright in the first place?

It seems akin to trying to copyright a certain drum pattern or chord progression.

There is of course a surface level similarity in that a corporation is profiting from IP in the public domain but the devil is in the details.

sirsinsalot4y ago

Jaron Lanier's book "Who Owns the Future?" is all about AI and compensating those whose input goes into training these very valuable models.

I highly recommend everyone read it.

BiteCode_dev4y ago

So even without using someone else's code, just the pattern understanding and the production of simple boilerplate code is great.

powerapple4y ago

Aeolun4y ago

> what github / microsoft is counting on here is that open source developers do not have enough collective power to do anything to stop this

I think it much more likely that they count on everyone liking it way too much to give a shit about their MIT code not being attributed correctly.

I certainly don’t. MIT just seems like the most convenient license for people that need licenses (corporations?), so that is what I use.

vbezhenar4y ago

I don't think that's bad though. Code sharing is good for overall productivity.

c01n4y ago

capableweb4y ago

If GitHub could guarantee that the code Copilot had ingested was only made with OSS licenses, then I don't see what the problem is.

eline434y ago

'I don't agree with having an AI trained on/with my data.'

IMHO, all other problems with copilot stem from this.

zokier4y ago

dgb234y ago

Reading many of the comments here I feel like one important thing is being left out that is not related to legal, but to social issues:

Who is on the side of open source? Where are the big, powerful institutions and companies that deeply care about authors and communities providing free software that so many of us rely on?

andrelaszlo4y ago

Added two characters to the prompt :P

pabs34y ago

I wonder if FOSS folks could copyleft originally public/leaked but proprietary code using CoPilot.

nl4y ago

This isn't how a language model works.

It's SO frustrating that even on HN people still fall for this naive and incorrect analysis. Pasting bits I've said before on this topic:

Language models do not work like this. They can copy content but usually that's for something like the GPL language text.

Generally they work on a character by character basis predicting what is the most likely character to appear next.

This very rarely results in copying text, and almost never rare text.

Mechanically it has learnt both the syntax of the language and how concepts relate. So when it starts generating, it makes sentences that are syntactically valid but also make sense in terms of concepts.
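To make "predicting the most likely next character" concrete, here is a toy sketch. It assumes a simple n-gram count table, which is not what Copilot or Codex actually use (those are neural networks, and they predict sub-word tokens rather than single characters). The function names `train_char_model` and `generate` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_char_model(text, order=2):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts


def generate(counts, seed, length=40, order=2):
    """Greedily append the most frequent next character for the current context."""
    out = seed
    for _ in range(length):
        followers = counts.get(out[-order:])
        if not followers:
            # Context never seen in training: nothing to predict.
            break
        out += followers.most_common(1)[0][0]
    return out


# Hypothetical training text, chosen only to keep the example tiny.
model = train_char_model("abcabcabc")
print(generate(model, "ab", length=4))  # abcabc
```

Even this toy shows both sides of the argument: generation is one prediction at a time, yet when the training text is small or highly repetitive the most likely continuation is simply the training text itself.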

olalonde4y ago

stefanos824y ago

Seems like my original questions [1] are more relevant than ever!

[1] https://news.ycombinator.com/item?id=27677598

tiborsaas4y ago

MrDoob has an excellent point about this:

https://twitter.com/mrdoob/status/1539740854956412929

lfrigodesouza4y ago

As the saying goes, "when a product is free to use, the real product is actually you". In this case, our code is the product. I'm now considering switching to another git provider...

oytis4y ago

thih94y ago

Is github copilot using private repositories for the learning process?

If yes, how do they mitigate the risk of exposing private data when something is quoted verbatim?

If not, then why are repos with non permissive licenses ok?

sirsinsalot4y ago

Beware geeks with gifts. This is Microsoft. The question isn't "is it good?" but "Why are Microsoft offering it and how is it undermining everyone else?"

mawadev4y ago

What stops me from re-uploading copyrighted source, where I remove the notices and push it with an MIT license? If a model has been trained on such a data set, how do you get it out?

LeonTheremin4y ago

And social media sells ideas other people thought.

Copilot is limited to public code now, but it may easily be trained on non-public code - albeit this probably won't be for sale to the public.

FeepingCreature4y ago

All I can think of is Steve Yegge [1]: "They have no right to do this. Open source does not mean the source is somehow 'open'."

[1] http://steve-yegge.blogspot.com/2010/07/wikileaks-to-leak-50...

iLoveOncall4y ago

Github Copilot is selling code other people wrote as much as the author of this thread is profiting from words other people invented.

Absolute nonsense.

presentation4y ago

Google just sells content other people wrote.

AtNightWeCode4y ago

Copilot will be that bandmate who plays a new riff and leaves you wondering where it was borrowed from.

acuozzo4y ago

This is, in part, why I will continue to use the original 4-clause BSD license for the code I write.

blitz_skull4y ago

Man, people really do be angry that the public code they put on a public platform is being used publicly.

Wild.

boomer_joe4y ago

We need a licence that forbids use in ML and the people willing to sue github for it ASAP.

shahar2k4y ago

and Dalle2 sells art other people created

(I'm actually not being sarcastic; I think there needs to be some sort of pipeline for compensating the artists whose work is used to train these models.)

fimdomeio4y ago

what AI is showing is the fuzzy line between creating and copying. The truth is they are both always present in everything we do, we've just been trying to hide it.

So it should be as simple as if you're using other people's content for your own profit you should properly compensate them.

Or we could just abolish copyright law and assume that everything humans create emanates from culture, so it's always collectively built and everything should be open source.

Or we just keep doing what we've been doing: create even more complex laws trying to define this fuzzy line in a way that lets companies keep profiting from it a lot more than individuals.

marstall4y ago

most of the code I write is glue sticking together 8 proprietary systems nobody's ever heard of. how is copilot gonna help me with that?

tiku4y ago

I've been using it for a day now and I'm really impressed. It is so aware of stuff in old code that it is scary. I'm working in an old application with Zend Framework.

whywhywhywhy4y ago

Same deal for Dall-e if they ever productize it.

pvaldes4y ago

Sounding more like Zopilote each day, it seems.

sytelus4y ago

Google just sells content other people wrote.

SMAAART4y ago

Once again Innovation challenges IP.

HeavyStorm4y ago

So much bullshit my head hurts.

lysecret4y ago

Don't we all.

honkler4y ago

license issues will save many thousand jobs.

amelius4y ago

"Good artists copy. Great artists steal."

abdulhaq4y ago

That's like saying a plumber just sells parts that other people made

janandonly4y ago

Isn't every programmer in history (except the gal who invents her own language and writes all her own code) simply an archeologist for other people's work?

We all Duck/Google for code anyway. Why not admit it and make it easier?

danamit4y ago

I do not see Copilot as useful anyway.

Separo4y ago

GitHub provides the repo hosting and tools for free on public projects. I'm happy with this deal.

spupe4y ago

I disagree. Copilot is selling content-aware code suggestions, which are a result of code that other people wrote on their platform, and which in no way affect the work of these people.

lakomen4y ago

I don't understand what's going on there.

I don't use github. Can someone explain what the author means?

Edit: in detail

skc4y ago

I get the feeling this entire debate would have been non-existent had this been a Jetbrains product instead.

bborud4y ago

That's not a bad argument, but it is unsatisfactory because it means that at some point someone has to make a judgement on how much you can borrow.
