Is Copilot trained only on OSS, or on private repos too?
There are GPL repositories, which require you to release your code under the same license if you distribute a derivative work. That's one aspect. Then there are "source available" repositories, which allow you to see the code but forbid everything else.
There are a lot of blurry areas here, and in my opinion, "an AI learns like a human does" is not a solid basis for a fair use claim.
On the other hand, if private repositories are crawled too, this would be very, very bad.
We just talked about this with a couple of friends. I always cite what I took and from where (it has only happened twice, but that's not zero), and I always respect the licenses.
I'm worried about leakage in both directions: GPL to closed and closed to open. Open source is a widely misunderstood concept, and people (and companies) are using that misunderstanding to justify their blanket approach. That's wrong on so many levels, from legal to ethical and everything in between.
Emulator writers are afraid to read leaked console code, because any resemblance between their code and the leak could destroy years (or decades) of reverse engineering and clean-room development in that domain. If code licensing is that important, why is a court-tested license (e.g. the GPL) treated as so worthless? Is this fair, again across the same spectrum, from legal to ethical?
There's a lot to discuss here, and a lot of ideas to relearn. Open Source (or, more precisely, Free/copylefted software) doesn't mean free for all. We need to understand that.