I think it's more analogous to tweaking one of those famous works directly in Photoshop and then turning it in. The model training likely results in near replicas of some of the training data being encoded in the model. You might have a near replica of a famous photograph encoded in your head, but to make a similar photograph you would have to recreate it with your own tools, and it would probably come out pretty different. The AI can just output the same pixels.
That's not to say there aren't other ways you might use the image directly (e.g. collage, or sampling in music), but in those cases you'll likely be careful with how it's used, how much you tweak it, and with attribution. I think the weird problem we're butting up against is that, AFAIK, you can't determine post facto what the "influence" on the model's output was, aside from looking at the input prompt (which commonly does use names of artists).
I work on an AI image generator, so I really do think the tech is useful and cool, but I also think it's disingenuous (or, more generously, misinformed) to compare it to an artist studying great works or taking inspiration from others. These are computers inputting and outputting bits. Another human analog would be memorizing a politician's speech and using chunks of it in your own speech. We'd easily call that plagiarism. But what if instead only every third word were exactly the same? Hard to say... it's both more and less like plagiarism.
Just how much do you need to process a sampled work before you no longer need the original artist's permission? It seems that in music, if the copyright holder can prove you sampled them, even if the sample is unrecognizable, then you're going to be on the hook for some royalties.