paulryanrogers 4y ago

But publicly accessible doesn't mean public domain. Microsoft has shared even some of their private code with others like governments. No doubt with strict licenses which they expect to be honored. AGPL and other licenses on publicly accessible code still matter.

phire 4y ago

Microsoft's apparent legal opinion is that training an AI on the data is the same as reading it, and doesn't require a license.

That is, as long as they have the right to read the data, they have the right to train an AI on it. The fact that the code is available under an open source license is irrelevant to them.

As for why they didn't use their own private code to train their AI, I suspect it was more of a non-malicious "we don't need to, this public GitHub repo dataset is big enough for now".

Personally, I think Microsoft should double down on this legal stance. Train the AI on all their internal code. And train it on any code they have licensed from other companies too.

pronik 4y ago

I remember that when some Windows code was leaked, people explicitly skipped reading it to avoid getting sued if they were to work on the Linux kernel or Wine in the future. Reading code can most certainly lead to a copyright breach, and Microsoft of all corporations should know this.

jcelerier 4y ago

> Microsoft's apparent legal opinion is that training an AI on the data is the same as reading it, and doesn't require a license.

How is that reconciled with the fact that a person who has read copyrighted code (not even the original source code, a mere decompiled version of it!) is forbidden from reimplementing it directly:

https://www.computerworld.com/article/2585652/reverse-engine...

zarzavat 4y ago

Clean room reimplementation is a way to prevent court cases; it's not a legal requirement.

If a company copies a competitor's product, then the chance of getting sued is very high. If they can show that, in fact, there was zero copying at all, then they can get the case dismissed and save great legal expense.

paulryanrogers (OP) 4y ago

Training is one thing. Regurgitating chunks verbatim without attribution is another.

zarzavat 4y ago

In general, taking short excerpts of a copyrighted work is legal and is not infringement.

heavyset_go 4y ago

Try lifting a riff from a Metallica song and see how far you can get selling it commercially.

Also, Copilot is copying much more than short excerpts, going as far as to reproduce large amounts of copyrighted code verbatim[1].

[1] https://twitter.com/mitsuhiko/status/1410886329924194309
