Ask HN: Can I charge GitHub for using my code in Copilot?

39 points · ingvul · 4y ago · 32 comments

Since Copilot is regurgitating licensed code without attribution, I am pondering several options:

1. sue GitHub (they will probably win since they can afford the best lawyers on the planet)

2. don't use GitHub for future projects. Remove all my current repositories (does this guarantee GitHub can't use my code without attribution? My code is probably still in their Code Vault)

3. use GitHub but use a license that explicitly denies usage of my code by Copilot (GitHub would probably just say "you can't do that, please just delete your account")

4. use GitHub but ask them to pay me for using my code without attribution

Option 1 is a waste of money and time for me (I will never win). Option 2 is fine, but I'm not sure what could stop GitHub from scraping public repos hosted on, let's say, GitLab. Option 3 would be the ideal case for me. Option 4 would be nice, but probably won't happen.

32 comments

31 comments · 14 top-level

tyingq · 4y ago · 4 in thread

There could be an "Option 5" where you find a way to get Copilot to ingest large amounts of broken, dangerous, etc, code.

etothepii · 4y ago

I wonder how a cyber insurance policy would respond to this.

It would be introduced into your code by your own developers; it's like some kind of meta supply-chain attack.

anonytrary · 4y ago

This reminds me of when #DeleteFacebook was trending and some people suggested that rather than deleting all of your posts, comments, etc., you instead edit them with fake data about yourself and gibberish so that you throw off anyone mining your data (Facebook, Cambridge Analytica et al.).

emerged · 4y ago

I’ve long wanted to implement a browser plug-in which uses Facebook as a data store but posts everything encrypted so it looks like garbage from their perspective.

I figured they would rapidly implement tech to detect that sort of thing and we’d wind up in an arms race I would ultimately lose.

dkersten · 4y ago

Copilot honeypot: create a ton of repositories with large amounts of auto-generated bad code.

bluewalt · 4y ago · 3 in thread

I don't understand all the yelling about this Copilot stuff.

1) It's not your code that Copilot proposes, but a mix of thousands and thousands of pieces of code that cannot be linked to one specific person. It's like taking 1,000 portrait photos and making a new one. How can you prove your photo has been used?

2) How is what GitHub is doing causing you any damage, directly or indirectly? If you want to sue someone, you'd better be able to prove damages to a judge.

3) Because of all of this yelling, I'm pretty sure GitHub will add a per-repository option to disable code scraping. But I still don't understand what you'd win.

tompazourek · 4y ago

> 1) It's not your code that Copilot proposes, but a mix of thousands and thousands of pieces of code that cannot be linked to one specific person. It's like taking 1,000 portrait photos and making a new one. How can you prove your photo has been used?

Unfortunately, sometimes the "generated" snippet is just a verbatim copy of code someone else has authored.

And it's done without attribution or following other license requirements.

drivingmenuts · 4y ago

If code that is covered by the GPL is mixed with non-GPL by Copilot, what happens then? What license is the Copiloted code covered under?

visarga · 4y ago

It's not just training data going into the result; the user is responsible for prompting, picking correct formulations, and fixing or discarding suggestions. So it's still semi-manual work. It's just Copilot, not yet full Pilot. The copyright should belong to the human, assuming they could have written the same code if they had access to search the internet for help. So it works like a search engine when it's replicating verbatim. It should be considered similar to coding with internet access for answers.

Something1234 · 4y ago · 3 in thread

I feel like we need a new GPL, one that prohibits training machine learning models that regurgitate code directly.

rw_grim · 4y ago

We don't need a new GPL. ML creating a derivative work is still a derivative work. So if said derivative work ends up in a non-GPL-compatible code base, that code base is then violating the license. I would assume that GitHub/Microsoft would have had the forethought for this, but who knows. And in any case, I am not a lawyer...

MrDresden · 4y ago

That is what is supposed to happen. However, reality is different. Copilot keeps being shown to simply regurgitate whole code snippets without modification (and more importantly without attribution).

opan · 4y ago

I worry that violates freedom 0, although maybe you could add something about the "use" applying to humans. But then what about bots and scripts running automatically?

lacker · 4y ago · 3 in thread

No, you cannot charge GitHub for using your code in Copilot, just as you cannot charge every human who reads your code for reading it. You are free to make your code private and charge people who want to read it. But you can't make your code public and hope to charge money to people who do "fair use" things with it. It doesn't matter how your code is licensed.

jimmygrapes · 4y ago

Kind of a tangent, but this made me think: is there a market for selling access to source code (or a reason why it isn't common)? I assume the primary barrier is the assumption of "piracy", or redistribution of the code once somebody has purchased access, but is there anything else I'm missing?

booi · 4y ago

… what. Usage restrictions, especially for profit, can absolutely be in your code’s license. I don’t see how regurgitating code verbatim can be considered fair use.

GitHub is likely only using appropriately licensed code.

lacker · 4y ago

You can restrict some activities, but not all. In particular you can't restrict using code as an AI training set. To be fair the law isn't totally clear here though.

For example see https://blog.adobe.com/en/publish/2020/02/27/copyrights-in-t...

> Several cases – most notably the litigation surrounding Google Books – suggest that using copyrighted works for the non-expressive purpose of training AI models amounts to fair use.

So Copilot itself is not infringing when it trains these models. Individual people using Copilot are probably safe as well, since small accidental copyright infringements are usually okay, but perhaps in some cases they would run into trouble.

You can tell Copilot is using code under fair use, not according to a license, because almost all licenses require you to maintain the license and attribution and Copilot doesn't do that.

blhack · 4y ago · 1 in thread

Can you charge me if I read your open source code, and then write my own?

meowface · 4y ago

I think they potentially could, depending on the license and how similar your code is to theirs. (If by "charge" you mean "sue/settle with".)

despera · 4y ago · 1 in thread

https://twitter.com/pragmaticml/status/1411113232048218119

So yeah, sorry for those who believe that it's going to return some "purely algorithmic" code and won't just randomly copy code.

Also, https://twitter.com/mitsuhiko/status/1410886329924194309

Wow, just wow... this has to be regulated FAST.

MrDresden · 4y ago

I am having a hard time imagining how anyone with half a brain could have genuinely believed that Copilot was ready for production rather than being the PR/legal nightmare it is.

ingvul · OP · 4y ago · 1 in thread

To make this clear: I am a nobody and my code sucks. Now, for a moment, imagine the same question being asked by developers who are actually great contributors to the open source community but can't (or don't want to) openly ask these kinds of questions regarding Copilot (this is actually why I'm posting this; one of these great contributors is a close friend of mine).

wmf · 4y ago

Plenty of people are openly discussing it on Twitter.

Jaron Lanier's book Who Owns The Future? discussed this idea; maybe it's worth revisiting now.

eplanit · 4y ago · 1 in thread

It's not about Copilot per se, but it's clear that most users probably didn't understand when they signed up that they were giving GitHub (now Microsoft, let's not forget) the right to use their code at all.

I've never been comfortable with the idea of trusting IP to places like GitHub, Google, etc. Although I find Copilot amazing, I find the use of users' code as distasteful as I do understandable.

lacker · 4y ago

Copilot is actually not trained just on GitHub data, and the model is owned by OpenAI, not Microsoft. It relies on fair use, not any particular terms of service.

Source: the FAQ at the bottom of https://copilot.github.com/

> GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. It has been trained on a selection of English language and source code from publicly available sources, including code in public repositories on GitHub.

yumraj · 4y ago

> 2. don't use GitHub for future projects.

Where are you going to go? Self host, sure. Else don’t assume that another code repository won’t do the same thing.

Also, how do you know that your code was used? I don’t know if GitHub has published a list of the repositories that were used. Or have they?

onionisafruit · 4y ago

> regurgitating licensed code without attribution

The article being discussed at https://news.ycombinator.com/item?id=27723710 indicates that they may be adding attribution soon.

thdxr · 4y ago

If I had to make an argument for GitHub, I'd suggest that Copilot is just an advanced search feature. It can be seen as simplifying the workflow of using GitHub's search and then copying the results.

Not sure if this simplification crosses some threshold.

IfOnlyYouKnew · 4y ago

It is unlikely for a generic, short segment of source code to fulfil the requirement for copyright that it comprises an actual "work".

CodeWriter23 · 4y ago

I think since you consented to allowing GitHub to “analyze” your source, the answer is likely “no”. I’m not a lawyer and you might want to check with someone who is.

ykevinator3 · 4y ago

What is Copilot?
