1. sue GitHub (they will probably win since they can afford the best lawyers on the planet)
2. don't use GitHub for future projects. Remove all my current repositories (does this guarantee GitHub can't use my code without attribution? My code is probably still in their Code Vault)
3. use GitHub but use a license that explicitly denies usage of my code by Copilot (GitHub would probably just say "you can't do that, please just delete your account")
4. use GitHub but ask them to pay me for using my code without attribution
Option 1 is a waste of money and time for me (I will never win). Option 2 is fine, but I'm not sure what would stop GitHub from scraping public repos hosted elsewhere, say on GitLab. Option 3 would be the ideal case for me. Option 4 would be nice, but probably won't happen.
It would be introduced into your code by your own developers; it's like some kind of meta-supply-chain attack.
I figured they would rapidly implement tech to detect that sort of thing and we’d wind up in an arms race I would ultimately lose.
1) It's not your code that is proposed by Copilot, but a mix of thousands and thousands of code samples that cannot be linked to one specific person. It's like taking 1000 portrait photos and making a new one from them. How can you prove your photo was used?
2) How is what GitHub does causing you any damage, directly or indirectly? If you want to sue someone, you'd better be able to prove damage to a judge.
3) Because of all this yelling, I'm pretty sure GitHub will add a per-repository option to disable code scraping. But I still don't understand what you would gain.
Unfortunately, sometimes the "generated" snippet is just a verbatim copy of code someone else has authored.
And it's done without attribution or compliance with other license requirements.
Github is likely only using appropriately licensed code.
For example, see https://blog.adobe.com/en/publish/2020/02/27/copyrights-in-t...
Several cases – most notably the litigation surrounding Google Books – suggest that using copyrighted works for the non-expressive purpose of training AI models amounts to fair use.
So Copilot itself is not infringing when it trains these models. Individual people using Copilot are probably safe as well, since small accidental copyright infringements are usually okay, but in some cases they could run into trouble.
You can tell Copilot is using code under fair use, not according to a license, because almost all licenses require you to maintain the license and attribution and Copilot doesn't do that.
So yeah, sorry for those who believe that it's going to return some "purely algorithmic" code and won't just copy code at random.
Also, https://twitter.com/mitsuhiko/status/1410886329924194309
Wow, just wow... this has to be regulated FAST.
Jaron Lanier's book Who Owns the Future? discussed this idea; maybe it's worth revisiting now.
I've never been comfortable with the idea of trusting IP to places like GitHub, Google, etc. Although I find Copilot amazing, I find the use of users' code as distasteful as it is understandable.
Source: the FAQ at the bottom of https://copilot.github.com/
GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. It has been trained on a selection of English language and source code from publicly available sources, including code in public repositories on GitHub.
Where are you going to go? Self host, sure. Else don’t assume that another code repository won’t do the same thing.
Also, how do you know that your code was used? I don’t know if GitHub has published a list of the repositories that were used. Or have they?
The article being discussed at https://news.ycombinator.com/item?id=27723710 indicates that they may be adding attribution soon.
Not sure if this simplification crosses some threshold.