If you were reusing the code in another codebase, then attribution would definitely be required. But using it as a dataset seems a bit different -- a grey area. It's already been demonstrated that Copilot - like all tools in the GPT family - frequently outputs large chunks of its training data verbatim. It's not hard to trigger this behavior, even unintentionally. To me, this is much closer to "reusing". But I'm not a lawyer.
It's also worth remembering that there are two parties potentially open to liability here: GitHub, for how the licensed code was used to train Copilot, and the user, who may be unwittingly including licensed code in their codebase. Given the well-known behavior of the GPT family I mentioned above, it might be hard to argue that Copilot "just chanced" into generating code that's identical to existing, non-public-domain code.