If you were reusing the code in another codebase, then attribution would definitely be required. But using it as a dataset seems a bit different -- a grey area. It's already been demonstrated that Copilot - like all tools in the GPT family - frequently outputs large chunks of its training data verbatim. It's not hard to trigger this behavior, even unintentionally. To me, this is much closer to "reusing". But I'm not a lawyer.
It's also worth remembering that there are two parties potentially open to liability here: GitHub, for how the licensed code was used to train Copilot, and the user, who may be unwittingly including licensed code in their codebase. Given the well-known behavior of the GPT family I mentioned above, it might be hard to argue that Copilot "just chanced" into generating code that's identical to existing, non-public-domain code.