Better HN

protocolture 3mo ago · 0 points

> IP laws can stay the same, but they should have purchased a license to use my art before including it in their training data.

But including your art in the training data is fair use (or otherwise exempt) by most standards, as no reproduction occurs. You are advocating for a change to IP law to make it more restrictive.

15 comments · 5 top-level

abustamam 3mo ago · 5 in thread

Fair use by most standards? Which standards are those? I don't think a standard about training an AI on billions of images exists.

oreally 3mo ago

By the same 'transformative' standards that allow satire, reaction and commentary videos to exist. And those take 100% from the source and add context, whereas good generated AI images that aren't wholesale copying take like less than 10% from the original source.

In addition, the idea that you need to pay rent on *your observation* of someone else's work is absurd. No one pays Newton's descendants for making lifts or hosting bungee jump sport activities.

maplethorpe 3mo ago

> good generated AI images that aren't wholesale copying take like less than 10% from the original source.

So would the model work if it only trained on the top 10% of pixels in every image? Or do they in fact need the entire image before they begin processing it, and therefore use the entire image?

> In addition, the idea that you need to pay rent on your observation of someone else's work is absurd.

I agree that's absurd. But training a model is no more "observing images" than an F1 car is "walking" down a race track. Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human. That comparison you're making is the real absurdity.

tovej 3mo ago

Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?

Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.

protocolture OP 3mo ago

Google scrapes the entire internet to generate a searchable index of the internet. But the resulting search engine is only infringing where it reproduces entire copies of scraped news articles and images. Those are both areas where it has been put back in its place through legal means.

Like LLMs, it retains the produced index but not the original data.

The big concern is whether producing an LLM is competing with artists directly, but as artists don't make LLMs, this seems to be consistently ruled as non-competing.

abustamam 3mo ago

I don't quite follow. People don't go on Google and search for medieval history and pretend they wrote the Wikipedia article on it because they found it on Google.

People _do_ use LLMs to make art in someone else's style (knowingly or unknowingly) and claim it as their own creation.

Also, I wouldn't say the creators of LLMs are competing with artists. The users of LLMs are. Artists don't make LLMs, they make art, and people who use Midjourney and such make art.

But I'd argue that creators of LLMs are still liable for the harm people cause using their tools. Perhaps not legally, but certainly ethically.

JoshTriplett 3mo ago · 4 in thread

> But including your art in the training data is fair use

The four factors of fair use in the US:

> the purpose and character of your use

Commercial, for-profit. Not scholarship, not research, not commentary, not parody, etc.

> the nature of the copyrighted work

Absolutely everything. Artistic, creative, not purely factual.

> the amount and substantiality of the portion taken, and

All of it, from everyone.

> the effect of the use upon the potential market.

Directly competing with those whose data was copied.

rcxdude 3mo ago

3 and 4 are what that argument is based on, I believe. 3) on the basis that the output is not _reproduced_, and 4) on similar grounds that output that's just not at all the same as the input data isn't affecting the market for the original image (I think this is the more debatable one, but in general the existing cases have struggled at the early stages because the plaintiffs have not been able to actually point to output that is a copy of their part of the input, and this does actually matter).

protocolture OP 3mo ago

> Directly competing with those whose data was copied.

An LLM doesn't compete with Art the same way that Photoshop doesn't compete with Art.

> All of it, from everyone.

With the result that anything produced by the LLM does not reproduce any single source in its entirety (and where it can be compelled to do so, that is a bug, not a feature).

Fair use is too specific, tbh. Rather than ruling it fair use (which seems to be where things are going), it should just be ruled "use". There's nothing wrong with building a mathematical model using available data.

JoshTriplett 3mo ago

> An LLM doesn't compete with Art the same way that Photoshop doesn't compete with Art.

Yes, it does. Many people are using AI-generated works in places where they originally would have either paid an artist, programmer, or other creative professional, or done without. Many companies are claiming to reduce staff because of AI (whether that's true or an excuse). There is plenty of evidence that AI is directly competing with various individuals, businesses, and industries.

> With the result that anything produced by the LLM does not reproduce any single source in its entirety

You do not have to reproduce sources in their entirety to produce derivative works.

oreally 3mo ago

> the amount and substantiality of the portion taken, and

> All of it, from everyone.

Yea I'd like to see how drawing two circles violates the copyright of drawing one circle!

bluefirebrand 3mo ago · 1 in thread

> But including your art in the training data is fair use

It shouldn't be!

protocolture OP 3mo ago

Why? I don't understand this take at all.

heavyset_go 3mo ago

No precedent has been set when it comes to training and fair use.

throwawaysoxjje 3mo ago

Which case decided that?
