astrange 3y ago

Stable Diffusion is not based on art commissions. You can search https://rom1504.github.io/clip-retrieval/ and see what kind of nonsense it was mostly trained on.

3 comments · 1 top-level

GaryNumanVevo 3y ago

It most definitely is. LAION-5B contains a large number of copyrighted works from DeviantArt, ArtStation, etc.

astrange (OP) 3y ago

Are those all commissions?

The aesthetic subset is 0.05% ArtStation:

https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/im...

Not sure if that's a large amount or not. They could've used robots.txt if they didn't want to be indexed.

GaryNumanVevo 3y ago

> Note that this is only a small subset of the total training data: about 2% of the 600 million images used to train the most recent three checkpoints, and only 0.5% of the 2.3 billion images that it was first trained on. [1]

That dataset also only covers "aesthetic" CLIP terms. Not to mention that a lot of images come from Pinterest and other aggregators.

[1] https://waxy.org/2022/08/exploring-12-million-of-the-images-...
