tyingq (OP) 4y ago

You would have to not host your code publicly either, right?

HomeDeLaPot 4y ago

Merely hosting your code publicly seems like it wouldn't give GitHub the right to train AI models on it. You could even say it's against your terms of use. And to do it, they would have to go out of their way to find your repo on the web and clone it—unlikely.

My impression (NOT A LAWYER) is that by hosting your code in a public repo on GitHub, you agree to their terms and give them the right to "read" your code including training AI models on it. Or at least that's what they're banking on.

Go host on Sourcehut or self-host with Gitea, and I would think it unlikely (but not impossible) that any big company would use your code to train their AI.

hnfong 4y ago

It's not even very clear whether training an AI on OSS code is a violation of those licenses. So unless you publish your code under a proprietary license that explicitly rejects such use, you can't really prevent people from doing that anyway.

Just imagine: there's really nothing preventing people from scraping your blog to train their natural language processing AI or whatever, so why would code be any different? Even if you put up a big sign saying you don't consent to having your data ingested by a neural network, I doubt it will get noticed anyway...

People have been using large OSS codebases (e.g. the Linux kernel) for various statistical analyses. AI is just doing the same thing in a more sophisticated manner.

tyingq (OP) 4y ago

I bet if I trained an AI on some vocalist and released an album I'd get some legal mayhem. I do concede it might go differently for code, but none of these issues are crystal clear for me.

toastal 4y ago

I wish it were easier to convince the projects I like and want to help to migrate, for the same reason. Committing to their repos, including ones that are mere mirrors, does not put me in the clear.

sobellian 4y ago

I would think that training a NN falls squarely in fair use.

nextlevelwizard 4y ago

You can always host it with a license that doesn't allow reuse or something.

tyingq (OP) 4y ago

GitHub mentions that they don't currently look at the license before trawling code.

https://twitter.com/NoraDotCodes/status/1412741339771461635

There are also other references suggesting that GitHub public repos weren't the only source. They trawl other publicly readable code.

nextlevelwizard 4y ago

You can sue them for using your code if they break the licensing agreement. Contact EFF and they'll set you up with a lawyer.
