Where did it come from then? And what license did the original have?
> and is in hundreds of repositories - many with permissive licenses like WTFPL and many including the same comments.
If the original was GPL or proprietary, then all of these copies with different licenses are violating the license of the original. Just because it exists everywhere does not mean Copilot can use it without violating the original license.
> It's not really a large amount of material, either.
No, but I would argue that it is enough for copyright because it is original.
> GitHub claims they haven't found any "recitations" that appeared fewer than 10 times in the training data.
Key word is "claim". We can test that claim. Or rather, you can: if you have access to Copilot, try the test I suggested at https://news.ycombinator.com/item?id=28018816 and let me know the result. Even better, try it with:
// Computes the index of them item.
map_index(
because what's in that function is definitely copyrightable.
> With the exceptions mentioned above, what you get back from asking for more code won't just be more and more of a particular work. Realistically I think you'd be able to get significantly more from Google Books.
That can only be tested with time. Or with the test I gave above.
I think that with time, more and more examples will appear until it is clear that Copilot is a problem.
Nevertheless, a court somewhere (I think South Africa) recently ruled that an AI cannot be an inventor. If an AI cannot be an inventor, why can it hold copyright? And if it can't hold copyright, I argue it's infringing.
Again, only time will tell which of us is correct according to the courts, but I intend to demonstrate to them that I am.