0 points · api · 4y ago · 0 comments

What about non-traditional-FOSS licenses? There is a lot of source-available software on GitHub under licenses that aren't OSI-compliant, like MongoDB, CockroachDB, etc., and that code is clearly proprietary. If this thing is trained on it and generates what amounts to snippets of that code, then it's clearly violating those licenses.

Then there are private repositories. If they included those in the training data set, that's even more actionable.

Personally I think this is software piracy at an absolutely unprecedented scale. Machine learning is just information transfer from the training data into weights in a model, a close relative of lossy data compression. Microsoft is now reselling all its GitHub users' code for profit.

1 comment · 1 top-level

Wowfunhappy · 4y ago

Per GitHub, private repositories weren't included in the training data, only public repos.
