The rumblings I'm hearing are that this: a) barely works with last-gen training processes, b) does not work at all with more modern training processes (GPT-4V, LLaVA, even BLIP2 labelling [1]), and c) would not be especially challenging to mitigate even if it became more effective and popular. The authors' previous work, Glaze, also does not seem to be very effective despite dramatic proclamations to the contrary, so I think this might be a case of overhyping an academically interesting but real-world-impractical result.
[1]: Courtesy of /u/b3sn0w on Reddit: https://imgur.com/cI7RLAq https://imgur.com/eqe3Dyn https://imgur.com/1BMASL4
I don't know if anyone else is still scraping new images into the generators. I've heard somewhere that OpenAI stopped scraping around 2021 because they're worried about training on the output of their own models[1]. Adobe Firefly claims to have been trained on Adobe Stock images, but we don't know if Adobe has any particular cutoffs of their own[2].
If you want an image that screws up inference - i.e. one that GPT-4V or Stable Diffusion will choke on - you want an adversarial image. I don't know if you can craft adversarial examples against a model whose weights you don't have, though I've heard you can generalize adversarial training across multiple independent models to really screw shit up[3] (a rough sketch follows the footnotes).
[0] All learning capability of text generators comes from the fact that they have a context window, but that provides only a short-term memory of 2048 tokens. They have no other memory capability.
[1] The scenario of what happens when you do this is fancifully called Habsburg AI. The model learns from its own biases, reinforcing them into stronger biases while forgetting everything else.
[2] It'd be particularly ironic if the only thing Nightshade harms is the one AI generator that tried to be even slightly ethical.
[3] At the extremes, these adversarial images fool humans. Though the study that showed this intentionally displayed the images only briefly, the idea being that short exposures make human vision behave like a feed-forward neural network with no recurrent computation pathways. If you look at them longer, it's obvious that each is a picture of one thing edited to look like another.
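A rough sketch of that multi-model idea, assuming PyTorch and two torchvision classifiers as stand-ins for "multiple independent models" (the models, epsilon, and step sizes are illustrative placeholders, not anyone's published recipe):

    # Ensemble PGD: perturb an image to raise the loss of several surrogate
    # models at once, hoping the perturbation transfers to unseen models.
    import torch
    import torch.nn.functional as F
    import torchvision.models as tvm

    surrogates = [tvm.resnet50(weights="DEFAULT").eval(),
                  tvm.vgg16(weights="DEFAULT").eval()]

    def ensemble_attack(image, label, eps=8/255, alpha=2/255, steps=10):
        # image: [1,3,H,W] in [0,1]; label: [1] true class index.
        # (Per-model input normalization is omitted for brevity.)
        adv = image.clone().detach()
        for _ in range(steps):
            adv.requires_grad_(True)
            loss = sum(F.cross_entropy(m(adv), label)
                       for m in surrogates) / len(surrogates)
            grad, = torch.autograd.grad(loss, adv)
            with torch.no_grad():
                adv = adv + alpha * grad.sign()               # ascend the loss
                adv = image + (adv - image).clamp(-eps, eps)  # stay in eps-ball
                adv = adv.clamp(0, 1)
        return adv.detach()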
Generative models like text-to-image have an encoder component (explicit or not) that extracts the semantics from the noised image. If the auto-labelers can correctly label the samples, then an encoder trained on both genuine and adversarial images will learn not to take the shortcuts the proxy model took, making the model more robust. I cannot see an argument for why this would be a negative for the model.
Denoising is probably a good preprocessing step anyway.
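A minimal sketch of such a step, assuming OpenCV's non-local means denoiser is enough to disturb the perturbation (filenames are placeholders; a real pipeline would tune the strength):

    # Strip high-frequency perturbations before ingestion; the poison is
    # assumed to live mostly in imperceptible high-frequency detail.
    import cv2

    img = cv2.imread("artwork.png")
    clean = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)
    cv2.imwrite("artwork_denoised.png", clean)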
It's a bad tradeoff.
According to which authority?
The only real way for artists, or anyone really, to try to hold back models from training on human outputs is through the law, i.e., leveraging state-backed violence to deter the things they don't want. This too won't be a perfect solution; if anything, it will just create more incentives for people to develop decentralized training networks that "launder" the copyright violations that would otherwise allow for prosecutions.
All in all it’s a losing battle at a minimum and a stupid battle at worst. We know these models can be created easily and so they will, eventually, since you can’t prevent a computer from observing images you want humans to be able to observe freely.
There is another alternative to the law. Provide your art for private viewing only, and ensure your in person audience does not bring recording devices with them. That may sound absurd, but it's a common practice during activities like having sex.
I'm not defending it. Just acknowledging the reality. The next TMZ for private art gatherings is percolating in someone's garage at the moment.
On the other hand, the adversarial environment might push models towards a representation more aligned with human perception, which is neat.
This tool is free, and as far as I can tell it runs locally. If you're not selling anything, and there's no profit motive, then I don't think you can reasonably call it "snake oil".
At worst, it's a waste of time. But nobody's being deceived into purchasing it.
I don't think that's the intention of Nightshade, but I wouldn't put it past someone to try it.
Snake oil for the sake of getting published is a very real problem that does exist.
The only way to be an artist now is to have a unique style of your own, and to never make it online.
So then of course, you also cannot sell your work, as buyers might put it online. And you cannot show your art to big crowds, as some will take pictures and put them online. So... you can become a literal underground artist, where only some may see your work. I think only some will like that.
But I actually disagree: there are plenty of ways to be an artist now, though most should probably think about including AI as a tool if they still want to make money. With the exception of some superstars, most artists are famously low on money, and AI did not introduce this. (All the professional artists I know, those who went to art school, do not make their income from their art.)
We all know what a law is; you don't need to clarify. It makes your prose less readable.
I just want to say: I really appreciate the stark terms in which you've put this.
The thing that has come to be called "intellectual property" is actually just a threat of violence against people who arrange bytes in a way that challenges power structures.
There's a nonzero chance that encouraging the creation of a large dataset of known-tampered data can ironically improve generative AI art models, by letting the model learn to recognize tampered data so the training process can work around it.
In the future, my guess is that courts will generally be on the side of artists because of societal pressures, and artists will be able to challenge any image they find and have it sent to yet another ML model that can quickly adjudicate whether the generated image is "too similar" to the artist's style (which would also need to be dissimilar enough from everyone else's style to give a reasonable legal claim in the first place).
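If anyone built such an adjudicator, its crudest form might be comparing CLIP image embeddings; this is purely a sketch of the concept, not a claim that any court would accept it (model choice is an assumption):

    # Cosine similarity between CLIP embeddings as a crude "too similar" proxy.
    import torch
    from transformers import CLIPModel, CLIPProcessor
    from PIL import Image

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def embed(path):
        inputs = proc(images=Image.open(path), return_tensors="pt")
        with torch.no_grad():
            return model.get_image_features(**inputs)

    sim = torch.nn.functional.cosine_similarity(
        embed("artists_work.png"), embed("generated.png")).item()
    print(f"similarity proxy: {sim:.3f}")  # thresholding this is the hard part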
Or maybe artists will just give up on trying to monetize the images themselves and focus only on creating physical artifacts, similar to how independent musicians make most of their money nowadays from touring and selling merchandise at shows (plus Patreon). Who knows? It's hard to predict the future when there are such huge fundamental changes that happen so quickly!
As is, art already isn't a sustainable career for most people who can't get a job in industry. The most common monetization is either commissions or hiding extra content behind a pay wall.
To be honest, I can see more proverbial "furry artists" sprouting up in a cynical timeline. I imagine, as with every other big tech wave, that the 18+ side of this will be clamped down on hard by the various powers that be. Which means NSFW stuff will be shielded a bit from the advancement, and you'll either need to find underground training models or go back to an artist.
It's not particularly hard. The furry NSFW models are already the most well-developed and available models you can get right now, and they are spitting out stuff that is almost indistinguishable from regular art.
If there is any "point" to this, it's that it's going to push the AI models to become better at capturing how humans see things.
Be reminded that this is - and has always been - the mainstream model of the lineages of what have come to be called "traditional" and "Americana" and "Appalachian" music.
The Grateful Dead implemented this model with great finesse, sometimes going out of their way to eschew intellectual property claims over their work, in the belief that such claims only hindered their success (and of course, they eventually formalized this advocacy and named it "The Electronic Frontier Foundation" - it's no coincidence that the EFF sprang from Deadhead culture).
And "OpenArt", by analogy with open source, is a non-existent thing. (I know, I know, they're different things: source code is not for a general audience and can be hidden at will, unlike art. Just thinking out loud here ;) )
It’s pretty exciting.
Being able to find a mix of styles you like and apply them to new subjects to make your own unique, personalized, artwork sounds like a wickedly cool power to give to billions of people.
I think the general population tends to value "looks pretty", and it's other artists, connoisseurs, and art critics who value origin and process. Exit Through the Gift Shop sums this up nicely.
I'm sure OpenAI's models can shit out an approximation of a new Terry Pratchett or Douglas Adams novel, but nobody with any level of literary appreciation would give a damn unless fraud was committed to trick readers into buying it. It's not the author's work, and there's no human message behind it.
According to Marx, value is only created by human labour. This is not just Marxist theory; it is an observation.
There may be lots of over-priced junk that makes you want to question this idea. But let's not nit-pick on that.
In two years' time people will not see any value in AI art, quite correctly, because there is not much human labour in creating it.
And in the process, they will obviate the need for Nightshade and similar tools.
AI models ingesting AI-generated content does the work of destroying the models all by itself. Have a look at "model collapse" in relation to generative AI.
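A toy illustration of the effect, assuming the simplest possible "model" (fit a Gaussian to data, sample from the fit, refit on the samples): diversity tends to shrink across generations.

    # Each generation trains only on the previous generation's outputs;
    # estimation error compounds and the spread typically drifts downward.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, size=200)        # generation 0: real data
    for gen in range(1, 11):
        mu, sigma = data.mean(), data.std()      # "train" the model
        data = rng.normal(mu, sigma, size=200)   # next gen learns from outputs
        print(f"gen {gen}: std = {sigma:.3f}")   # watch the std decay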
I have more access to information now than the most powerful people in the world did 40 years ago. I can learn about quantum field theory, about which pop star is allegedly fucking which other pop star, etc.
If I don't care about the law I can read any of 25 million books or 100 million scientific papers all available on Anna's Archive for free in seconds.
And I also agree that we shouldn’t build systems that alienate people from that accumulated equity.
I want a scaling license fee to apply (e.g. % pegged to revenue. This still has an indirect problem with different industries having different profit margins, but still seems the fairest).
And I want the world (or the EU, then others to follow suit) to slowly reduce copyright to 0 years* after the artist's death if owned by a person, and 20-30 years max if owned by a corporation.
And I want the penalties for not declaring usage** / not paying fees to be incredibly high for corporations... 50% of gross (harder) / net (easier) profit for the year? Something that isn't a slap on the wrist, can't be wriggled out of quite so easily, and is actually an incentive not to steal in the first place.
[*]or whatever society deems appropriate.
[**]Until auto-detection (for better or worse) gets good enough.
IMO that would allow personal use, encourages new entrants to market, encourages innovation, incentivises better behaviour from OpenAI et al.
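To make the arithmetic concrete, a hypothetical sketch of the proposal above (the 1% rate is my placeholder; the 50%-of-profit penalty is the figure from the comment):

    # Illustrative only: numbers are placeholders, not a worked-out policy.
    def licence_fee(revenue: float, rate: float = 0.01) -> float:
        return revenue * rate                 # fee pegged to a % of revenue

    def undeclared_use_penalty(net_profit: float) -> float:
        return 0.5 * net_profit               # 50% of net profit for the year

    print(licence_fee(10_000_000))            # 100000.0 at a 1% rate
    print(undeclared_use_penalty(2_000_000))  # 1000000.0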
Why death at all?
It's icky to trigger soon after death, it's bad to have copyright vary so much based on author age, and it's bad for many works to still have huge copyright lengths.
It's perfectly fine to let copyright expire during the author's life. 20-30 years for everything.
I still feel it is absolutely wrong to roam around the internet and scrape images (without consent) in order to power one’s cash cow AI. I hope more methods to protect artworks (including audio and other formats) become more accessible.
Also... maybe I am naive, but it seems rather trivial to work around with a quick prefilter? I don't know if traditional denoising would be enough, but worst case you could run img2img diffusion.
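A sketch of that img2img worst case, assuming the diffusers library with a low strength so the content survives while the pixels are re-rendered (model name and strength are assumptions):

    # Re-render the image through img2img; adversarial pixel patterns are
    # unlikely to survive being re-synthesized by a different model.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
    init = Image.open("artwork.png").convert("RGB")
    out = pipe(prompt="a painting", image=init, strength=0.3).images[0]
    out.save("artwork_rerendered.png")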
Doing that requires much less compute than training a large generative image model.
The poorest people have historically produced great art. Training a model, however? Expensive. Running it locally? Expensive. Paying the sub? Expensive.
Nothing is being democratized; the only thing this does is devalue the blood and sweat people have put into their work so FAANG can sell it to lazy suckers.
sorta like what the laptop did for writing
You mean like OpenAI and Adobe?
Only the free and open source models didn't license any content for their training data.
OpenAI has provided no such documentation or legal guarantees, and it is still quite possible they scraped all sorts of copyright materials.
In this case, the mechanism for how it would work is effectively useless. It doesn't affect OpenAI or other companies building foundation models. It only works on people fine-tuning these foundation models, and only if the image is glazed to affect the same foundation model.
EDIT: I have seen a few examples with GPT-4V, and from how I imagine it works, it wasn't deceived. I doubt this technique can have any impact on the quality of the models; honestly, the only impact it could potentially have is to make the training more robust.
Eventually I assume the poisoning artifacts introduced in the images will be very visible to humans as well.
It's still noticeably visible.
Enjoy the short term novelty while you can.
An "AI image exclusion standard", similar to "robots.txt" -- which would tell an AI data-gathering web crawler that a given image or set of images was off-limits for use as data?
Robots.txt survived because the use of it to gatekeep valuable goodies was never widespread. Most sites want to be indexed, most URLs excluded by the robots file are not of interest to the search engine anyway, and use of robots to prevent crawling actually interesting pages is marginal.
If there was ever genuine uptake in using robots.txt to gatekeep the really good stuff, search engines would've stopped respecting it pretty much immediately - it isn't legally binding, after all.
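For reference, honoring robots.txt amounts to one voluntary check on the crawler side (Python's standard library even ships a parser); an image-exclusion standard would be exactly as voluntary:

    # A polite crawler's robots.txt check; nothing enforces it.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()
    if rp.can_fetch("MyImageScraper", "https://example.com/gallery/cat.png"):
        print("allowed to fetch")  # polite crawlers stop here if disallowed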
Name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.
>Robots.txt survived because the use of it to gatekeep valuable goodies was never widespread. Most sites want to be indexed, most URLs excluded by the robots file are not of interest to the search engine anyway, and use of robots to prevent crawling actually interesting pages is marginal.
Robots.txt survived because it was a "digital signpost" -- sort of like the way you might put a "Private Property -- No Trespassing" sign in your yard.
Most moral/ethical/lawful people -- will obey that sign.
Some might not.
But the some that might not -- probably constitute about a 0.00001% minority of the population, whereas the majority that do -- probably constitute about 99.99999% of the population.
"Robots.txt" is a sign -- much like a road sign is.
People can obey them -- or they can ignore them -- but they can ignore them only at their own peril!
It's a sign which provides a hint about the right thing to do in a certain set of circumstances -- which is what the Law is; which is what the majority of Laws are.
Most will choose to obey them. Most will choose to "take the hint", proverbially speaking!
A few might not -- but that doesn't mean the majority won't!
>If there was ever genuine uptake in using robots.txt to gatekeep the really good stuff, search engines would've stopped respecting it pretty much immediately - it isn't legally binding, after all.
Again, name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.
Nevertheless, I hope that at some not-so-far point in the future there will be more legal guidance about this kind of stuff, i.e. it will be made clear that scraping violates copyright. This still won't solve the problem of detectability, but it would at least increase the risk for scrapers, should they be caught.
Name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.
>Currently I see no organisation who would be willing to do this or even just technologically able - as even just detecting such scrapers is an extremely hard task.
// Part of an image web scraper for AI image generator ingestion (pseudocode):
if (fileExists("no-ai.txt")) {
    // Abort image scraping for this site -- move on to the next site
} else {
    // Continue image scraping for this site
}
See? Nice and simple!
Also -- let me ask you this -- what happens to the intellectual property (or just plain property) rights of images on the web after the author dies? Or say, 50 years (or whatever the legal copyright timeout is) after the author dies?
Legal grey area perhaps?
Also -- what about Images that exist in other legal jurisdictions -- i.e., other countries?
How do we know what set of laws are to apply to a given image?
Point is: If you're going to endorse and/or construct a legal framework (and have it be binding -- keep in mind you're going to have to traverse the legal jurisdictions of many countries, many countries!) -- you might as well consider such issues.
Also -- at least in the United States, we have juries that can override any law (jury nullification) -- that is, that which is considered "legally binding" may not be quite so "legally binding" if/when properly explained to a proper jury in light of extenuating (or just plain other) circumstances!
So kindly think of these issues prior to making all-encompassing proposals as to what you think should be "legally binding" or not.
I comprehend that you are just trying to solve a problem; I comprehend and empathize. But the problem might be a bit greater than you think, and there might be one if not several unexplored partial/better solutions (since no one solution, legal or otherwise, will be all-encompassing -- the problem is so large in scope). All of these issues must be considered in parallel -- or errors, present or future, will occur...
For instance, if I set traps in my home which hurt an intruder, we are both guilty of crimes (traps are illegal and are never considered self-defense; B&E is illegal).
Would I be responsible for corrupting the AI operator's data if I intentionally include adversarial artifacts to corrupt models, or is that just DRM to legally protect my art from infringement?
edit:
I replied to someone else, but this is probably good context:
DRM is legally allowed to disable or even corrupt the software or media that it is protecting, if it detects misuse.
If an adversarial-AI tool attacks the model, it then becomes a question of whether the model, having now incorporated my protected art, is now "mine" to disable/corrupt, or whether it is in fact out of bounds of DRM.
So for instance, a court could say that the adversarial-AI methods could only actively prevent the training software from incorporating the protected media into a model, but could not corrupt the model itself.
If you upload a picture of a dog to DeviantArt and you label it as a cat, and a model ingests that image and starts to think that cats look like dogs, would anybody claim that you are breaking a law? If you upload bad code to Github that has bugs, and an AI model consumes that code and then reproduces the bugs, would anyone argue that uploading badly written code to Github is a crime?
What if you uploaded some bad code to Github and then wrote a comment at the top of the code explaining what the error was, because you knew that the model would ignore that comment and would still look at the bad code. Then would you be committing a crime by putting that code on Github?
Even if it could be proven that your intention was for that code or that mistagged image to be unhelpful to training, it would still be a huge leap to say that either of those activities were criminal -- I would hope that the majority of HN would see that as a dangerous legal road to travel down.
DRM can, for instance, disable its own parent tool (e.g. a video game) if it detects misuse, but it can't attack the host computer or other software on that computer.
So is the model or its output, having been trained on my art, a byproduct of my art, in which case I have a legal right to 'disable' it, or is it separate software that I don't have a right to corrupt?
We are born and then exposed to the torrent of data from the world around us, mostly fed to us by other humans; this is what models are trying to tap.
Unfortunately our learning process is completely organic and takes decades and decades and decades; there's no way to put a model through this easily.
Perhaps we need to seed the web with AI agents who converse and learn as much like regular human beings as possible and assemble the dataset that way. Although having an agent browse and find an image to learn to draw from is still gonna make people reee even if that's exactly what a young and aspiring human artist would be doing.
Don't talk about humans being sacred; we already voted to let corporations be people, for the 1% to exist and "lobby", breaking our democracy so that they can get tax breaks and make corrupt under the table deals. None of us stopped that from happening...
2. They don't need to keep it a secret; the goal is to remove these images from the training data, in a way that would be much more efficient than simply adding a "please don't include my art in your ai scraper" message next to your pictures.
A made-up scenario¹ is that a person who is training an AI goes to the local library and checks out 600 books on art. The person then lets the AI read all of them, after which they are returned to the library and another 600 books are borrowed.
Then we can imagine the AI somehow visiting a lot of museums and galleries.
The AI will now have been trained on the style and look of a lot of art from different artists.
All the material has been obtained in a legal manner.
Is this an acceptable use?
Or can an artist still assert that the AI was trained with their IP without consent?
Clearly this is one of the ways a human would go about learning about styles, techniques etc..
¹ Yes, you probably cannot borrow 600 books at a time. How does the AI read the books? I don't know. For simplicity, say the researcher takes a photo of each page. This would be extremely slow, but for this hypothetical it is acceptable.
I feel like I’m taking crazy pills TBQH
The only explanation I can find for this backlash is that artists are actually worried just like the rest of us that pretty soon AI will produce higher quality more inventive work faster and more imaginatively than they can - which is very natural, but not a reason to inhibit an AI's creative education.
Furthermore there’s a sort of unavoidable “jitter” in human-produced art that varies between individuals that stems from vastly different ways of thinking, perception of the world, mental abstraction processes, life experiences, etc. This is why artists who start out imitating other artists almost always develop their imitations into a style all their own — the imitations were already appreciably different from the original due to the aforementioned biases and those distinctions only grow with time and experimentation.
There would be greatly reduced moral controversy surrounding ML models if they lacked that mincemeat/pink slime aspect.
I think it's worthwhile for such discussion to happen in the open. If the tool can be defeated through simple means, it's better for everybody to know that, right?
Let me rephrase: Would AI-powered upscaling/downscaling (not a simple deterministic mathematical scaling) not defeat this at a conceptual level?
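A sketch of that round-trip, assuming the diffusers x4 upscaler as the non-deterministic half (model choice and sizes are assumptions): a deterministic downscale followed by AI re-synthesis of the detail.

    # Downscale deterministically, then let an AI upscaler re-invent the
    # detail; pixel-level poison is unlikely to survive re-synthesis.
    import torch
    from diffusers import StableDiffusionUpscalePipeline
    from PIL import Image

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler",
        torch_dtype=torch.float16).to("cuda")

    img = Image.open("artwork.png").convert("RGB")
    small = img.resize((img.width // 4, img.height // 4))  # deterministic step
    restored = pipe(prompt="", image=small).images[0]      # AI re-synthesis
    restored.save("artwork_roundtrip.png")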
If they don't, then whatever social networks or other services where things can be shared/viewed publicly by large groups up to millions need to be labeled "We cannot verify the veracity of this content."
I want a real internet... this AI stuff is just increasing the fake crap on the Internet threefold and, in turn and in time, eroding our trust in it!
Might this "flood the zone" approach also have -some- efficacy against human copycats?
If you ask me, this is 100% applicable in this case, so I wonder what a judge would rule.
This will work about as well...
Oh, I forgot, fighting music pirating was considered an evil thing to do on HN. "Pirating is not stealing, it's copyright infringement", right? Unlike training neural nets on internet content, which of course is "stealing".
Many people would in fact argue that training AI on people's art without permission is copyright infringement, since the thing it (according to detractors) does is infringe copyright by generating knockoffs of people's work.
You will see some people use the term "stealing" but they're usually referring to how these AIs are sold/operated by for-profit companies that want to make money off artists' work without compensating them. I think it's not unreasonable to call that "stealing" even if the legal definition doesn't necessarily fit 100%.
The music industry is also not really a very good comparison point for independent artists... there is no Big Art equivalent that has a stranglehold on the legislature and judiciary like the RIAA/MPAA do.
AI is sampling others' works.
Musicians can and do sample. They also obtain clearance for commercial works, pay royalties if required, AND credit the samples if required.
AI "art" does none of that.
Some projects against this behavior:
What we really need is clarification of the extent that copyright protection extends to similar works. Most likely from an AI analysis of case law.
> • can inject a small number of poison data (image/text pairs) to the model’s training dataset
I think those are bad assumptions; labelling is more and more done by a labelling AI.
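A sketch of what that auto-labelling looks like, assuming a BLIP2-style captioner from the transformers library (the screenshots earlier in this thread used BLIP2; the model name here is one public checkpoint):

    # Generate the training caption from the image itself, ignoring whatever
    # text the uploader attached -- which is what defeats mislabelling.
    import torch
    from transformers import Blip2Processor, Blip2ForConditionalGeneration
    from PIL import Image

    proc = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
    model = Blip2ForConditionalGeneration.from_pretrained(
        "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16).to("cuda")

    inputs = proc(images=Image.open("scraped.png"),
                  return_tensors="pt").to("cuda", torch.float16)
    caption = proc.decode(model.generate(**inputs)[0],
                          skip_special_tokens=True)
    print(caption)  # used as the label, regardless of the uploader's tags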
> simply acquiring only training data you have permission to use
Currently it's generally infeasible to obtain licenses at the required scale.
When attempting to develop a model that could describe photos for visually impaired users, I had even tried to reach out to Getty to obtain a license. They repeatedly told me that they don't license images for machine learning[0].
I think it's easy to say "well too bad, it doesn't deserve to exist" if you're just thinking about DALL-E 3, but there's a huge number of positive and far less-controversial applications of machine learning that benefit from web-scale pretraining and foundation models - spam filtering, tumour segmentation, voice transcription, language translation, defect detection, etc.
I would believe there is enough content out there to get reasonably good results.
However much we might wish that it was not true, ideas are not rivalrous. If you share an idea with another person, they now have that idea too.
If you share words on paper, then someone with eyes and a brain might memorize them (or much more likely, just grasp and retain the ideas conveyed in the words).
If you let someone hear your music, then the ideas (phrasing, style, melody, etc) in that music are transferred.
If you let people see a visual work, then the stylistic and content elements of that work are potentially absorbed by the audience.
We have copyright to protect specific embodiments, but mostly if you try to share ideas with others without letting them use the ideas you shared, then you are in for a life of frustration and escalating arms race.
I completely sympathize with anyone who had a great idea and spent a lot of effort to realize it. If I invented/created something awesome I would be hurt and angry if someone “copied” it. But the hard cold reality is that you cannot “own” an idea.
The above comment is true about the properties of information, as explained via the lens of economics. [1]
However, one ignores ownership as defined by various systems (including the rule of law and social conventions) at one's own peril. Such systems can also present a "hard cold reality" that can bankrupt or ostracize you.
[1] Don't let the apparent confidence and technicality of the language of economists fool you. Economics isn't the only game in town. There are other ways to model and frame the world.
[2] Dangling footnote warning. I think it is instructive to recognize that the field of economics has historically shown a kind of inferiority complex w.r.t. physics. Some economists aspire to the level of rigor found in physics, and that is well and good, but perhaps that effort should not be taken too seriously nor too far, since economics as a field operates at a different level. IMO, it would be wise for more in the field to eat a slice of humble pie.
[3] Ibid. It is well-known that economists can be "hired guns" used to "prove" a wide variety of things, many of which are subjective. My point: you can hire an economist to shore up one's political proposals. Is the same true of physicists? Hopefully not to the same degree. Perhaps there are some cases of hucksterism, but nothing like the history of economists-wagging-the-dog! At some point, the electron tunnels or it does not.
> In economics, a good is said to be rivalrous or a rival if its consumption by one consumer prevents simultaneous consumption by other consumers, or if consumption by one party reduces the ability of another party to consume it. - Wikipedia: Rivalry (economics)
Also: we should recognize that stating something as rivalrous or not is descriptive (what exists) not normative (what should be).
If there was a training process that let us pick a minimal sample of examples and turn it into a general purpose art generator or text generator, I think people would have been fine with that. But that's not what any of these models do. They were trained on shittons of creative expression, and there's statistical evidence that the models retain that expression, in a way that is fundamentally different from how humans remember, misremember, adapt, remix, and/or "play around with" other people's creativity.
[0] You called these "embodiments", but I believe you're trying to invoke the idea/expression divide, so I'll run with that.
[1] Or at least it did. OpenAI now filters out conversations that trip the bug.
The closest parallel I can think of is that humans can ingest chocolate but dogs should not.
Let's talk about ownership in a broader sense. In practice, one cannot effectively own (retain possession of) something without some combination of physical capability or coercion (or threat of coercion). Meaning: maintaining ownership of anything (physical or otherwise) often depends on the rule of law.
I now declare that I own Fortnite.
Where’s my money, Epic?