Better HN

josephcsible 2y ago

> What we need is a legal way for companies to keep the data open, but also require OpenAI and friends to pay them for it.

Couldn't that be accomplished by a law or ruling that using something for training AI doesn't exempt you from having to follow its license? OpenAI is already in blatant violation of both the "BY" and "SA" parts of the existing license.


juliangoldsmith 2y ago

Arguably, a model created by training on a corpus of data is a derived work of that corpus.

Let's say I take a collection of images and use a program to compress them. When decompressed, the images are close to, but not exactly the same as the originals. Despite being in a different format, and despite not being exactly the same as the originals, the copyright to the compressed images is still held by whoever previously held it.

If I take the collection of images from earlier and train a diffusion model based on it, I'm essentially just compressing it a different way. With the right prompt, you can get out something very similar to what you put in.

rileymat2 2y ago

By this logic, isn't remembering something in your brain also a derived work? But that would not make any sense to protect until you create and distribute something based on that memory. The same logic should be applied to this.

jacksnipe 2y ago

If you remember it from your brain and perform it live, that’s perfectly fine. So there’s a line to be drawn somewhere and I don’t think it’s super clear cut in most cases.

bioemerl 2y ago

No, under copyright law, if you remember something and perform it live, you are very much in violation.

If you remember it and make something that is a distinct work, something that maybe steals the idea without reproducing any of its elements, that has never been covered by copyright.

I think that's going to be the litmus test for these AIs. If you can get them to produce output that is distinct from anything else, it's not going to be a copyright violation, because it's not a copy of anything.

bioemerl 2y ago

> If I take the collection of images from earlier and train a diffusion model based on it, I'm essentially just compressing it a different way

Not really. If diffusion models were compression, they'd be so lossy as to be totally worthless.
