Is fair use on a massive scale still fair use? Courts generally think so; otherwise Google would have been out of business a long time ago.
Courts have held that it doesn't apply to music. Why do you think different rules apply to code?
Yes, so do I, and probably many other people do too.
> In fact sometimes I copy and paste short passages and then rework them.
This I usually don't do unless I check the license first. (Everybody ought to be allowed to, but sometimes the license might not allow it.)
On the other hand, if you copy from private repositories, it quickly gets into the territory of stealing trade secrets.
"Microsoft does it, therefore it must be right" does not a sound argument make.
Also, the open source community has far less leverage to apply pressure to Google than it does to GitHub. We may be able to do something about this.
The whole point of fair use is that the license doesn't matter. You can have a license that says I'm not allowed to use what you wrote for any purpose ever and I can still use it under fair use.
I don't really think this argument passes muster.
It's just automating the copying and pasting (and slight reworking) of boilerplate code that would normally take me much longer to write, especially when I'm working with a language I'm less familiar with but that is necessary for my stack. I've literally never seen it suggest code that isn't more or less exactly what I would have come up with given a lot more time. In essence, it eliminates tedium, which is exactly the point of all programming: work elimination.
I have two questions:
1. Why have licenses, then?
2. What if I just use leaked sources of closed source software and call it fair use?
The default under copyright law is that any substantial copy is infringement.
A license is a legal document that grants someone permission to use a work that they otherwise would not have had.
However, the law also grants its own permissions to use a work: it defines what is unlawful infringement and what is lawful fair use.
The code snippets that copilot generates look more like fair use than infringement. They are small, adapted to the destination context, and usually not direct copies of one source but more of an average of many different sources. And usually the programmer does not keep the suggestion that copilot suggests unmodified - the programmer does their own editing of the snippet afterwards to further tune it to the surrounding context.
> 2. What if I just use leaked sources of closed source software and call it fair use?
As pointed out upthread, if the source code is leaked then there may be trade secret protections. The GPL specifically allows the code to be posted online, so by design it is not secret.
The reverse may be true. I may be GPL'ing code to prevent a useful algorithm from being buried deep inside commercial code with an incompatible license. What makes code "trade secret" level? I have a 25-line algorithm that is worthy of its own paper. What if I release its reference implementation under AGPLv3+?
I have no problem with you reading the paper and implementing it. I don't obfuscate my papers, but I release the implementation under AGPLv3+. You can't use that in a codebase with an incompatible license. I expect and want you to respect the license of my implementation.
> The code snippets that copilot generates look more like fair use than infringement. They are small, adapted to the destination context, and usually not direct copies of one source but more of an average of many different sources. And usually the programmer does not keep the suggestion that copilot suggests unmodified - the programmer does their own editing of the snippet afterwards to further tune it to the surrounding context.
Emphasis mine. First, there's no consensus on fair use yet. Second, they may be direct copies of the code. Third, they're remixed with other code pieces, which makes the result a derivative work of many code pieces at once. Lastly, the programmer re-derives the derived work. The end result is clearly a derivative of GPL code, which brings the GPL license along with it (if Copilot derives the code from GPL-licensed repositories, which it does).
I have no problem with Copilot as a technology. I have no problem with other licenses that are not breached when their code is used by Copilot, derived from, and reused. What makes my blood boil is Copilot using this GPL corpus without admitting it publicly, breaching the terms of the GPL en masse, and outright ignoring it, then feeding this GPL-derived code to any and all projects that pay for a Copilot membership and calling it a day.
There are just minor deviations, not relevant to this case, such as how long Disney bullied countries into protecting a work.
Software is usually considered a work. The AI needs to know whether it has permission to copy and use the code, and then offer the derived work under the proper terms and conditions. Copilot doesn't do that. It might copy GPL code into non-GPL code, violating the GPL and thus posing an extreme risk.
I do think there are ethical questions around whether it's right for Google to digitise physical books without the permission of the authors, keep them on its servers, and make money from them without recompensing the authors. That's something an individual would not get away with doing, so it seems wrong that it's OK for Google.
The SCO vs IBM lawsuit was over only a few lines of code, after all.
I can't use a derivative of Mickey Mouse in my product, even if I change his colour and give him a hat, and even if these changes were made by an AI. Why would it be different for code? I can only use Mickey Mouse under fair use if it's done for a specific, narrow set of purposes (satire, news reporting, etc.).