Copilot's API is surfacing snippets of work without their licensing information attached. It can be shown in discovery that Copilot does access the origin work.
The sooner this is slapped down, the sooner we can avoid addressing the even more troubling question that exists today: is someone who used Copilot to throw together a bunch of code infringing the copyright of the works where those portions originate?
This is a complex problem with no satisfying conclusions... how could one be violating copyright if they never accessed the 'copied' work to copy? Copyrights aren't patents. Infringement requires copying.
Using Copilot launders the user's awareness of the origin works, yet making the Copilot users liable for widespread "accidental" copying would be troubling.
That you got the code from an entity that stole it somewhere else doesn't really matter. Generative models should respect copyright for their sources, and using a generative model to create new works that you intend to claim copyright on is stupid: someone may well show up one day with ironclad proof that you used their code without permission.
https://decoded.legal/blog/2021/06/github-copilot-initial-th...
https://fossa.com/blog/analyzing-legal-implications-github-c...
https://felixreda.eu/2021/07/github-copilot-is-not-infringin...
Also a reminder that, outside the Copilot debate, the online rights movement has largely been pushing for scraping, deep linking and transforming scraped data to not be considered copyright infringement, regardless of any TOS on the site being scraped.
To me, Copilot is exactly that: a scraper that has scraped public websites and is now presenting me the scraped data in an alternative and often transformed form. It’s my responsibility as a developer to ensure that my released product complies with applicable copyright law, but Copilot and the use thereof is not in and of itself copyright infringement.
That a tool can be used to create infringing work or infringe on copyright in general is no more a valid argument against Copilot than it is against CD burners, de-DRM tools, VCRs, Kodi or Plex, scanners or any number of day-to-day items that can be used to infringe copyright if the user uses them for that purpose.
Microsoft has been super litigious in the past when it came to copyright violation, starting all the way back with Bill Gates' letter in Byte magazine about those pesky pirates. To see them do this makes pirating MS software fair game from here on. They could have asked nicely; instead they just took.
It is. Laws are adapted based on widespread technological capabilities and progress.
As an example, if it is easy to create a realistic voice or signature using AI models, those should no longer be considered effective evidence for contractual purposes. The alternative, relying on enforcement of a law that makes forging them illegal, is not going to work.
The past shouldn't dictate what we allow tomorrow.
Does the same apply to a human? Do we now define copyright violation differently for computers? I don't know the perfect answer here. But I'm not so sure we should have standards that change depending on whether a program is doing it or a human is doing it. Perhaps a bad standard to begin with.
I do tend to lean towards thinking "company uses publicly available, open source code in product" is somewhat of a nothing-burger though.
EDIT: After more reading on the subject, I'm willing to accept that copyright infringement is unlikely here. This link [1] was the one I found most convincing.
However, I would still shift the goalposts and look at this ethically, and I still think it's wrong that Microsoft is profiting from code with licences like GPLv3. This is a whole other topic, though.
[1] https://www.technollama.co.uk/is-githubs-copilot-potentially...