Why would any human spend their limited lifespan to create a piece of work that will be grabbed without permission, approximated algorithmically (at least on the surface) and reused in infinite possible small variations without any attribution or remuneration whatsoever?
This feels like a reversion to medieval times with minimal trade between regions as thieves would ambush traders and steal any goods.
This is where centuries of copyright law have gotten us, brainwashing people into thinking ideas are property ("intellectual property") and should be treated as rivalrous goods, and that the only reason to be an artist is to profit. Brainwashing people into thinking the only way we'll have art in the world is if we maximize the profits of commercial artists.
Take one brief look at the internet, music, video, podcasts, a museum, the walls and refrigerators in people's homes, a kindergarten class room, an art class, or hell, this very forum. And you'll see that it's universally true that people like creating stuff because people like creating stuff. For free. Because it's fun and stimulating. That's inherent in us. We do not need laws to prop up an artificial business model for humans to maintain our drive to create.
Why do we have stars on GitHub? Maybe not for fame, but they are a good indicator of how good somebody is. Fame is important. We do not make GitHub repositories for "stars", but I think they are a good motivator for people to continue what they're doing.
If there is an author who spent years of his life producing some kind of music piece, shouldn't there be some kind of law protecting his work from theft?
But people also have to eat, need shelter, want kids, have to take care of health issues, and so on. For that they need money. If they can't get money from their creations they'll spend less time creating and more time engaging in activity that generates returns.
And we still have copyright laws. Corporations are still hoarding IP. So rules for thee, not for me. The harder you make it for artists to get paid, the more people you get promoting Raid Shadow Legends.
Because they enjoy it? Or do you see artists as some type of corporate drone who hates the very act of making art?
That's like asking why anyone would contribute to MIT or Apache licensed open source.
The idea that authors, artists and other creatives will keep pumping original work as part-time love affairs so that AI bros can grab it and mint a dime is... strange.
I don't understand how people think this will suddenly make human art vanish, that is just ridiculous and naive. People will spend their limited lifespan to make art, because that's what humans just do. Cavemen weren't being paid to paint on the walls.
Your viewpoint is honestly insane. The people against AI art are bonkers.
Additionally, the beauty of contemporary AI is that it's much more similar to the mechanism of inspiration and learning that humans employ than, let's say, the literal copy of a photo. I think it's reasonable for an artist to limit the visibility of their work and prohibit their images from being shared online and used by AI training - but this must apply across the board. If their image is public in a way that anyone could see it, and be inspired by it, then they need to accept that the AI could be equally "inspired" by it.
If you want an easy, concrete example of the process of inspiration and copying taking place for humans, just look at Animes. Styles are imitated by humans left and right, with no concern of original artists losing their livelihood. Human Anime artists copy their idols when learning to draw, oftentimes producing literal imitations for years before starting to produce their own original work, which to be honest usually greatly resembles the source of inspiration (PS: I like Animes, but most of it is very similar in terms of artistic style).
Do humans ever really "invent" any art? Or are the artistic innovators simply a mix of influence of existing art, the natural images of nature/life plus a spice of randomness? Because that's pretty much how AI art functions.
Older societies did not have copyright but they, manifestly, had ways to sustain creatives.
People wax philosophical about paradigm changes and other vacuities yet refuse to answer a simple question: how will society reward human creativity that takes a lifetime of cultivation to flourish.
It truly boggles the mind that people equate machines that can output thousands and thousands of images in short time spans in any ingested style... with humans who have to hone styles and can only produce a result every so often.
Most people care only about the output of a system, not about who the system replaces.
If you solely listen to either side, you'll be blinded by madness. On that note, how many people respect or value the effort behind software? Most don't, not most artists either. That is the nature of life.
Because many people make art for art's sake.
Besides popular art works always essentially had this yet people still made them. The difference is in scale not kind.
Or to put it in another context - why would anyone work on an open source project when their work can be reused without (explicit) permission, cloned and reused in infinite small variations without any remuneration and essentially no credit? (When was the last time you actually looked at the CREDITS file in an open source project? Have you ever?)
Why does it matter if some artist uses ChatGPT to knock-off your style indirectly rather than directly?
I mean, sure, ChatGPT is better at it than most artists. Is that the problem? The quality of knock-offs is too good now?
Take any new song - and any music head can list several songs it is just like. Take any new movie - and most screenwriters could go on for hours how it's almost exactly 10 different movies. Etc.
There is nothing new under the Sun.
Creative people will create regardless of financial incentives. Fan fiction and free art is already everywhere, for instance.
It's amazing how callous tech people have become as they salivate for their unicorns or whatever they are pursuing.
> Orrick spends the rest of his ruling explaining why he found the artists’ complaint defective, which includes various issues, but the big one being that two of the artists — McKernan and Ortiz, did not actually file copyrights on their art with the U.S. Copyright Office.
In other words, the story here is mostly that the lawyers screwed up badly in pursuing a copyright lawsuit before ensuring copyright had been filed.
You put work into posting this comment: thought about the situation and crafted sentences you wanted to publish. I've absorbed them, learned from them, they'll inform my own output in the future. And respectfully, I won't remember your name or give credit.
So why did you publish your comment? People can't avoid creating data. We do it passively. And you'll continue doing it, for your entire limited lifespan, even if you get neither laid nor paid for it.
To profit from creation you have to publish - what alternative do you propose?
So I don't understand this idea that anything will "kill publishing". Copyright changes the economic math around publishing, sure - and most of the time currently not for the better. That will keep evolving but there is no risk of killing creation or publishing.
If anything, it will lower the barrier for creation, to allow a greater variety of art by a greater variety of artists. Adding a new tool to an artist’s kit is an incredible step forward for art.
People have been putting up with some theft because they could still eke a living.
This attitude has all the coherency of "some people are thieves, we cannot catch them all, so let's make theft legal".
Unless I hear some sensible argument why this slippery road won't destroy a good fraction of the economy I am assuming that regression to kleptocracy is the shape of things to come.
There are artists that can study a painting for a few minutes and then recreate it from memory. There are artists who study a particular body of work so long that they can create more works indistinguishable in style. If an artist recreates a copyrighted work or creates a derivative too close to the original, then that new work is potentially copyright infringement.
That is, we focus on the output of the process to determine infringement with living artists and ignore the training. But with ML, everyone focuses on the training.
It seems an ML tool could add a filter to the output and refuse to output a work that too closely resembles one or more works under copyright. Isn't that basically what legitimate professional artists do as well?
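To make the idea of an output filter concrete, here is a minimal sketch in Python. It is not how any real image generator works: the images are tiny grayscale grids, the "average hash" is a toy perceptual hash, and the names `too_similar`, `filter_output` and the `max_distance` threshold are all made up for illustration. Whether "looks alike" is the right legal test is a separate question.

```python
from typing import List, Optional, Sequence

# Hypothetical image format: a small grayscale grid with values 0-255.
Image = Sequence[Sequence[int]]


def average_hash(image: Image) -> List[int]:
    """Return one bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Count the positions where two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def too_similar(candidate: Image, protected: Sequence[Image],
                max_distance: int = 2) -> bool:
    """True if the candidate hash is within max_distance of any protected hash."""
    cand_hash = average_hash(candidate)
    return any(
        hamming_distance(cand_hash, average_hash(ref)) <= max_distance
        for ref in protected
    )


def filter_output(candidate: Image, protected: Sequence[Image],
                  max_distance: int = 2) -> Optional[Image]:
    """Return the candidate, or None when the filter refuses to output it."""
    return None if too_similar(candidate, protected, max_distance) else candidate
```

A filter like this only catches near-duplicates of images it has been given; it says nothing about style, and the threshold is an arbitrary choice.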
Thousands of artists are capable of infringement, but we don't take away their brushes based on capability.
It's legally different because the human brain is not considered a fixed medium under copyright law, so a human experiencing and learning something is not making what is potentially a copy or derivative work under copyright law, and therefore not exercising, potentially without permission, one of the exclusive rights granted to the copyright holder.
> There are artists that can study a painting for a few minutes and then recreate it from memory
Right, and those artists violate copyright at the moment they recreate it in a fixed medium without permission, but encoding it into computer storage media is already a copy, so a machine (or, rather, the person using the machine) hits that threshold before creating an actual visual output.
> That is, we focus on the output of the process to determine infringement
No, we focus on what is set in a fixed medium. If that is done at an intermediate step of the process, rather than being an output, it can still be infringement.
It's just that a human doesn’t always need to make a copy in what is legally a fixed medium until the output, but that is not the same as categorically only treating output of a process as legally relevant.
Is that what these embeddings are doing? No.
If artists try to argue that all copies are equivalent, but are unable to recreate their works from the embeddings, their argument will fall flat.
This argument also only applies to sharing models, which is doubly dumb because we want open source models, not closed source models. It’s a harmful status quo to try and enforce.
I see no reason the same standard cannot be applied to ML generated content. If the evaluation is being performed on the end result, then that is all that matters. The same judges that decide these things for human generated content can continue to do so for ML generated ones.
Even the people submitting and responding to the copyright claims will still be human (with briefs generated by ML…).
What will be more interesting is when the judges themselves get replaced with an “objective” AI to quantify similarity for copyright purposes. If that ever happens, it’ll trigger an arms race to hit the razor's edge without going over.
You just answered it? One is an ML system and one is a human?
I'm really, really baffled why people keep using this argument. Like you guys know machines are not humans, right? ...right?
Humans are special cases in laws. Always have been and always will be (until AGI). A pedestrian is treated differently in laws than a driver is. The fact that a pair of legs and a car both move you from point A to point B doesn't make them the same. Selling human livers on your local market is very different from selling cow livers, even though biologically they are all organic tissues.
Let me say it again: humans are special cases. AI learning copyrighted materials might be illegal or legal, but it has little to do with "what if a human being does the same".
If I independently come up with a song called "Let it Be" that has the same lyrics as the Beatles song and publish it without the permission of the copyright holders, I will have violated their copyright.
It doesn't matter if I heard the song before or not. It doesn't matter if I did it myself or used a computer to do it. What matters is the final product and my publishing of a song close enough to the one that was copyrighted.
AI image generators are just tools, like Photoshop is a tool. Nobody cares if you used a paint brush or Photoshop to create something that looks like a copyrighted image, why should AI image generators be any different?
If the final image is similar enough to a copyrighted work and I publish that image without permission of the copyright holder, then that's a copyright violation.
If the final image is different enough, then it's not.
That an AI was used and how the AI was trained are completely separate issues.
Probably not even then, at least not initially. While some people conflate AGI with personhood, consciousness, qualia, etc. we've got at least 22 different[0] ideas of what consciousness is and no idea how to even determine whether or not a mind has qualia — and even if we did, I see no specific reason to require any of them, as a P-zombie[1] AGI doesn't seem to me like a contradiction in terms.
IE why isn't it that an artist could say, hey I'm letting you see this painting, but you are not allowed to sit down with a canvas and learn how to reproduce it? Because you can do that in galleries - no photos, no reproductions.
So actually building a machine there, under the cover of darkness, that learns from your work so you can produce new work, why is that allowed in the first place? Certainly wouldn't be at a museum.
The key thing here is - if you want artists' data, you should ask for it. They didn't. This would be equivalent of training a Github CoPilot on every available piece of code in existence, ever, instead of what they had available. Why should that be allowed? So if I built some toy code in 1996, and happened to post it on usenet, and it's a great implementation of X, why the heck is CoPilot allowed to read it? It's my property.
But you can't stop people from sitting and studying your painting and then painting stuff similar to it.
One of the core assertions that is being decided in this case is if there is any actual reproduction here. Does a model contain a reproduction of every image it was trained on? Can the model actually create a reproduction of any images it was trained on?
If it turns out that there is no reproduction here, then it comes down to how much legal control we give copyright owners to regulate access.
A gallery can reasonably ban cameras and canvases, but it becomes a lot less reasonable if they try to ban artists.
Let's imagine that this isn't just specifically tuned ML but proper General AI that can learn new skills. Is your argument that this AI would be legally prohibited from viewing any images it doesn't have a specific license for?
I think that drawing hard lines around what kind of processing can be done on publicly available images is going to become problematic. It's better to regulate around what can be done with the results of the processing than that processing itself. That's how our existing laws work. Making a reproduction, even just from memory, of a copyrighted work is restricted. Memorizing a copyrighted work is not.
When you publish it, you lose some property rights. While under copyright, there is a short list of things that others are prohibited from doing (reproduce, distribute, etc.). And you lose all your rights once the copyright expires.
Because (fortunately) human thoughts can't be subject to copyright law yet. So when we talk about copying and making derivative works, if you have this
artistic works -> neural network weights
The end result may or may not be copyrightable (that's for the courts to decide), but this
artistic works -> human brain
Definitively can't be.
Three things immediately spring to mind: scale (1), accountability (2), and profit (3).
1. An automated system can train on data at huge volume, in a way that no single human is capable of doing. Setting aside the issue that training an ML model and artists learning by copying techniques of other artists are, I would argue, fundamentally different acts, _even if we take them to be the same_, we have to acknowledge that in a single human lifetime one person can only "train" on so many works. Automated systems have no such limitation.
2. If an artist violates copyright or oversteps norms around artistic professional practice, they can be held accountable. Companies which violate this by using automated systems so far hide behind those systems ("the AI is doing it/did it") so aren't held responsible (it should be: the company has built the system, and therefore is responsible for how it is used, and what it does). By building up this false sense of agency on the part of systems (which the marketing term "AI" is designed to bolster), lack of accountability is laundered into the actions being taken at scale.
3. Automated systems are, due to their scale, very profitable. I can generate hundreds or thousands of copyright-violating works that dilute the market for artists, and it is incredibly cheap to do so. Fighting those copyright violations in court has to be done more or less on an individual basis (especially if actions like that in the original article continue to fail), which is extremely slow and expensive. If the cost of violating copyright is tiny, and the cost of enforcing it is huge, then it ceases to be a useful tool except for the most well-resourced organizations.
> It seems an ML tool could add a filter to the output and refuse to output a work that too closely resembles one or more work under copyright. Isn't that basically what legitimate professional artists do as well?
No, because copyright is more complicated than "these two things look a lot alike", and legitimate professional artists don't run into this issue, because they aren't constantly trying to skirt the line of "as close as possible to copyright violation while still getting away with it".
> Thousands of artists are capable of infringement, but we don't take away their brushes based on capability.
But they do get sued when they infringe! Enforcement happens, because (for now) it is still possible for independent artists to enforce their copyrights. The argument being made by artists with regard to these ML models is that _they are already infringing copyright_, not that they hypothetically may in the future.
All because judges will be befuddled about what to do after hearing terms like “training data” and “compression”. We will, of course get the emails in 10-20 years showing that it’s all lies and that the CEOs of these companies knew exactly what they were doing.
If this continues, AI will be the greatest inequality machine in history. Take data from 1,000,000 individuals, train your AI to replace them, compensate no one.
You can do this in every area: driving, farming, cooking. Music. Just dispossess everyone of all their property by training an AI to copy all their work! What could be easier (and less morally right)…
Is it bad that this will be used to displace individual artists/creatives in the value chain of media production? Of course it is. But we shouldn't be responding to that by clinging harder to schemes that have outlived their usefulness, we should be developing new models for funding production.
Great! So given your articles are in the public domain on your website, I can make millions out of them without giving you a cent, direct credit, or sources, and can claim it all as my own then.
The scenario where AI training is locked down doesn't result in 1,000,000 individuals getting paid. (What would they get paid, and by whom?) It results in Disney, Adobe, etc.—massive companies with existing licenses to use content just about however they want—training their own models and locking everyone else out of the large AI model training game, until AI gets good enough to start generating human-quality creative work on its own (the same kind of progression as alphago/lee to alphago/zero), perhaps with the addition of a small set of purely copyright-free material.
Excluding all copyrighted material would be tying an AI model's metaphorical hands behind its back, since humans, although capable of producing great works through much iterative effort in isolation, all rely on having learned from some copyrighted work. Find an author who hasn't read plenty of recent books as well as older classics, or a musician (other than classical) who hasn't listened to plenty of modern music, or a director or editor who hasn't watched tons of movies and films. Recall Newton, "[I]f I have seen further, it is by standing on the shoulders of giants." Many of those "shoulders" are copyrighted.
My understanding is that people want to be compensated when their intellectual property is used as training data for a machine. That strikes me as an entirely reasonable expectation.
One person memorizing Harry Potter for their own amusement, even if they make money doing public appearances where they recite sections of the work verbatim for the amusement of the audience is not in any way similar to the process of training an LLM or of that LLM's output. The scale alone is so vastly different that it renders the comparison useless and misleading.
They pay for it.
They buy the books. They buy tickets to theatre. They buy entrance to the gallery.
The trick that's being done now is hey, we don't have to pay since it's not a person. (to the creator) But hey, it is just like a person when it learns! (legal system)
If AI models require human training data, then they should pay for it. Easy.
That's what they're upset about.
If there were no use for 2D artists, then Stability AI wouldn't be making an AI to replace them.
Key word here is: replace. 2D artists are not becoming obsolete - they're being replaced by a machine that was trained on their works without permission.
If you want to make an AI that does amazing paintings, and doesn't use human training data, more power to you. I can't compete with that. But if you use MY WORK to make a machine that's going to replace me, and you do it under the cover of darkness and without permission - yeah, I'll get pretty mad.
What happened to visual artists was more like Logitech announcing Logitech CoPilot and revealing they've extracted code from keylogging for the past 20 years.
I also take issue with the assumption that AI will "benefit everyone enormously, even if it won't be equally distributed"; I don't see any factual basis for this assumption. On the contrary, it seems much more likely that AI will be used to concentrate wealth even further. Given the high cost, I find it hard to believe it will ever be "equally distributed".
For as long as I can remember big corporations have been merciless in their preservation of "intellectual property". I didn't love it then and I don't love it now. OTOH, the idea that Microsoft can train their LLM on code I've written and then sell access to that LLM for money (sharing no dollars with me) strikes me as outright theft.
We've been here before several times: Silhouette painting, Photography, Airbrushing, Pianos, Synthesizers, Sampling, Photoshop, Ray Tracing, and many more. "It's not real art", "they're stealing from us", "we'll go hungry!".
Some of these are already quite old. For others, I've actually been asked the question back when I was in school: "Are you really making music if your instrument has a microprocessor in it?". Um, yes, yes I claim I am making music thank you very much.
First people complain, then they adapt, and then they end up making awesome art with the new tools and/or instruments. Which isn't to say historically it was all rainbows and roses, but it was never the end of the world either. Seeing the newer generation of AI tools and how the tools end up getting integrated into regular workflows, it seems to be going the same direction.
To quote the song, I think it's "all just little bits of history repeating".
If you put your "work" out in the world, anyone who views it, is automatically training their brains on it. Viewing is training.
A perfectly reasonable view for humans, since you shouldn't be able to copyright a brain.
Not at all a reasonable view for a computer, until we also get to freely use all the copyrighted works ourselves. The problem here is that AI training is asymmetric: the people training an AI use works in violation of their licenses, but don't let their own works be used in the same way. For instance, Microsoft uses code on GitHub to train Copilot, but you still don't get to freely use the source code of Windows or GitHub.
I am absolutely in favor of eliminating copyright and patent law. I am not in favor of keeping it around while letting AI become a laundering mechanism to get around it. AI training should not get to uniquely ignore copyright; copyright should cease to exist.
Same thing that happened on the construction of the "corporation as a person", built on top of rulings made to protect African Americans.
I'd really like to see people drop the inequality argument. If you actually cared about that instead of virtue signaling, you'd push for a mandatory GPL-style license that forces models to be available to anybody that uses them. That would avoid trying to unsuccessfully put the genie back in the bottle, while also preventing a few companies from benefiting at the expense of everyone else. Just like OpenAI's original mission of making AI available for everyone, before they got dollar signs in their eyes.
Good.
Intellectual Property is a mistake. If AI brings about its end, I welcome it.
Also, those emails seem very likely to be ordered to be produced during discovery.
This thing could really go either way at this point but I feel like Stability has the upper hand.
Imagine training a model without any of the plaintiffs images, then using that side by side with the model that does. This could then be used to show the jury that those individual works are of no importance to the system if the images are of the same quality.
They will probably argue that the individual expressions of each work are not copied, rather the abstract ideas of two-dimensional representations present across any and all images.
Expect lots of side by side pictures as Exhibits from both sides! Grandma and her fellows have to weigh in on this one!
This is a fun one!
If the images are REALLY of no importance, they wouldn't have been used anyway.
The reality is that US law in 2023 is obscure and opaque, and judges seem to reach their rulings on a total whim, with no actual philosophy other than maintenance of the system.
Further, I’m extremely unimpressed with the competence the vast majority of judges have on display - such that contempt should be the starting position.
The fact that this is how laws are actually made (precedent of applications will always beat the letter) means that nobody who doesn’t have a warchest will be able to actually utilize the system coherently.
As with everything now, courts are ruled by those with the most money.
The law has never been more transparent. The public has nearly complete access to every docket in the country. Moreover, the level of jurisprudence has never been higher.
Moreover, I’ve lost a case or two in my time, but it was never because of a lack of a warchest.
I'm confused as to what this sentence means. The level of {the study/philosophy/science of law} has never been higher?
Hey, quick question: my good friend, let's call him Doug, is a high school equivalency graduate, has a few felonies and currently works as a road flagger.
How does this complete access to every docket in the country help him?
You have more perfectly explained my point better than I could have. Thanks
Every piece of creation you and I make is the sum total of our experiences, and that includes copyrighted work. Holding an LLM guilty for that is like holding the human brain guilty for memorizing copyrighted work.
"The other problem for plaintiffs is that it is simply not plausible that every Training Image used to train Stable Diffusion was copyrighted (as opposed to copyrightable), or that all DeviantArt users’ Output Images rely upon (theoretically) copyrighted Training Images, and therefore all Output images are derivative images"
This displays either ignorance as to how artists work and the extent to which they are involved or can be involved in the legal copyright system, or reflects incoherence around the copyright system.
In this case the Judge chose to say, in effect: unless you have explicitly copyrighted it, it's fair use
That is now a new precedent that negatively impacts individual artists who have no power in the market, and protects giant corporate interests which have tons of power in the market
Moreover, at least in the case of music, people have been successfully sued when their song strongly resembles another copyrighted work. Thus "holding the human brain guilty for memorizing copyrighted work" is actually the status quo.
One is how it works consistently with the current law. An ML model is basically a highly lossy compressed data format. If you collect millions of copyrighted images, merge them into a super big image, then compress it into a .jpg, are you allowed to redistribute this .jpg file?
To me, it's mostly depending on how lossy (low quality) your .jpg is.
(Note the fact that human brains are also lossy compressed data is completely irrelevant here: you can only compare machine to machine, algorithm to algorithm. You can't say if a human has right to do X, therefore a machine has the same right to do X.)
But this line of thinking, while consistent to me, is dangerous. Because it means open models like Stable Diffusion are more likely to be illegal than a closed one like MidJourney, since it's closer to the source materials. If closed models end up being legal but open models don't, it would be a big loss for our society as a whole.
Compression implies the input can be reconstructed from the output (lossy or not), in the case of these ml models the input is the training data and the output is the model. You can't reconstruct even a fraction of that training data using the model alone therefore it is not compression even in the most lossy sense.
The model produced though can be an efficient compressor/decompressor, which produces a lossy output image when given a input of prompt and/or image.
All that aside, the whole human/machine thing is a dumb argument. It's humans that are using the tool. The question shouldn't be does a machine have rights to do X, but rather do humans have the right to use and build such tools?
It's already proven that you can reconstruct at least a small fraction of the training set from diffusion models. It's something quite well known, so could we not die on this hill?
[1] https://twitter.com/Eric_Wallace_/status/1620449942090420224 [2] The paper: https://arxiv.org/abs/2301.13188 [3] Relevant HN thread: https://news.ycombinator.com/item?id=34596187
Stable diffusion itself is just 6+ GB, and fits comfortably on my USB stick.
That's one heck of a lossy compression algorithm, sir!
(this thread has more discussion on this line of thinking https://news.ycombinator.com/item?id=37879938 )
Thanks for sharing this info which I'm aware of. However, this fact is not as significant as it might sound in terms of whether it's a lossy compression algorithm.
In most lossy compression algorithms, the compression rate is arbitrary. For example, for an algorithm that is based on the Fourier transform, you can choose to keep only the first sine wave, or the first 1000 (a bit of an oversimplification here).
So yes, SD is small. Quite miraculously small, and its size alone implies some important insights on how humans see and read artworks. But this fact doesn't change whether I see it as a lossy compression. (In my previous comment I stated human brain stores lossy compressed data too, so you can see I'm using a broad definition of "lossy compression".)
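As a rough illustration of that "arbitrary compression rate" point, here is a minimal Python sketch of Fourier-style lossy compression on a 1D signal. It assumes a naive discrete Fourier transform and a made-up test signal; the helper names (`keep_lowest`, `reconstruct`, `demo_signal`) are invented for this example, and real codecs such as JPEG are considerably more involved.

```python
import cmath
import math
from typing import List


def dft(signal: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs: List[complex]) -> List[float]:
    """Naive inverse transform; returns the real part of each sample."""
    n = len(coeffs)
    return [
        (sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def keep_lowest(coeffs: List[complex], num_waves: int) -> List[complex]:
    """Zero every coefficient whose frequency is above num_waves.

    Index k and index n - k describe the same frequency for a real signal,
    so both are kept or dropped together. Index 0 (the mean) is always kept.
    """
    n = len(coeffs)
    return [c if min(k, n - k) <= num_waves else 0j for k, c in enumerate(coeffs)]


def reconstruct(signal: List[float], num_waves: int) -> List[float]:
    """Compress by keeping the lowest num_waves frequencies, then decode."""
    return inverse_dft(keep_lowest(dft(signal), num_waves))


def rms_error(a: List[float], b: List[float]) -> float:
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def demo_signal(n: int = 64) -> List[float]:
    """Hypothetical test signal: three sine waves at frequencies 1, 3 and 5."""
    return [
        math.sin(2 * math.pi * t / n)
        + 0.5 * math.sin(2 * math.pi * 3 * t / n)
        + 0.25 * math.sin(2 * math.pi * 5 * t / n)
        for t in range(n)
    ]
```

Calling `rms_error(sig, reconstruct(sig, w))` on `sig = demo_signal()` for `w` in 1, 3, 5 shows the error shrinking as more waves are kept. How much to throw away is a dial, not a property of the algorithm.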
What's similar enough to a pop music theme, that has a grand total of a few lines of unique music, to be a copyright violation? How many bars have to be copied, and what kinds of minor variances do or don't avoid a violation? If you're inspired by a haiku, and change 5 of 17 syllables, is that still a copyright violation? Who knows.
I believe that's why DALL-E bans some keywords related to living artists. To show they have "no intention to violate copyrights".
And that's why I'm so worried that we're heading to a future where open, uncensored models are illegal and closed-source AI-as-a-service offerings are legal. It's not fearmongering: right now, you can't ship GPL code in your closed-source apps, but you can use GPL code on your server running a service that provides the exact same functions. I believe it has already hugely undermined the original intent of the GPL (written in an era before SaaS became popular).
Some AI proponents say ML is the biggest invention since the steam engine. I don't know if it's true, but if we end up stuck in a situation where open models are illegal while AI-as-a-service is legal, then it's the biggest step toward a dystopia since the steam engine.
For all the legal issues — and the artistic flaws — I still find it quite remarkable how good it is at such a small size.
Here is the catch, though. It's only several bytes "on average". We can't tell whether some images contribute practically 0 bits to the final result while others contribute more.
(I know the word "contribute" is a bit nonsensical in the context of ML. But existing lossy compression algorithms are not that different in this sense: if you compress 1M frames produced by a 3D renderer into an MPEG video, each frame doesn't contribute the same number of bytes to the final result.)
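For anyone who wants the back-of-the-envelope arithmetic behind "several bytes on average", here is a rough sketch. The 6 GB figure is the one quoted upthread; the image count is an assumption (a round 2 billion, standing in for "billions"), so treat the result as an order of magnitude only.

```python
model_size_bytes = 6 * 10**9   # "6+ GB", as quoted upthread
training_images = 2 * 10**9    # assumption: a round number standing in for "billions"

average_bytes_per_image = model_size_bytes / training_images
print(f"{average_bytes_per_image:.1f} bytes per training image on average")
# prints "3.0 bytes per training image on average"
```

As the comment above says, this is only a mean. It says nothing about how unevenly individual images are represented.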
I assume the entire client+server system constitutes the "machine" in this case, correct? So does "human with" refer to the end user (client side) or the sysadmin (server side)? Maybe one is an accomplice? The machine isn't going to infringe without certain prompting by the end user, just as an inkjet printer isn't going to do so.
Imagine you encounter a public domain image (which by definition is not protected by copyright), you download it, and put it on your website.
Perfectly fine.
But if you write "I made this image" below it, you are a liar and a fraud. No copyright needed.
But to answer the question, you use proprietary software protected by means other than government force every single day.
And can you imagine life without the wheel? Too bad we didn’t invent patents earlier so that we could’ve gotten a head start inventing it and the spear.
This lawsuit is even sillier than the previous ones.
What copyright actually protects is creative industry. By assigning individualized monopolies over copying and reproduction, the publishing industry can persistently lowball the shit out of artists (who themselves undervalue their work, see above) and then reap the profits for themselves. Since the vast majority of creative work would never see market interest, it's cheaper to pay billions of dollars to the handful of known, recognizable, and marketable mega-successes than to pay smaller amounts to a far larger pool of mid-list or unknown artists. This is why unions exist in basically every creative industry: otherwise, nobody below the talent line[0] gets paid.
To put a finer point on it: right now, the unions are doing a way better job of protecting human artists against AI art than copyright is. The argument for training AI being infringing is very weak in the general case where there's no obvious regurgitation. I mean, where does your copyrighted material even 'live' in the model, if the model can't even reproduce it? However, unions can very easily just say "you can't force us to cut corners by using this tool" in their negotiations and actually get that result. Furthermore, those rulings only bind publishers that hire artists. The artists themselves can still use AI when it makes sense in their workflow, rather than when publishers think they can cheap out on shit.
The failures of Soviet communism are complicated, but if you had to boil it down to one factor, I would not summarize it as "communal ownership bad" or "collectivism bad". Collective action has its place. Furthermore, the analogy you're making between copyright and physical property is flawed[1]. The reason why physical property ownership even exists is because of scarcity - the reason why I need permission to use your car is because you can't use your car if I'm also using it.
The irony of your communism analogy is that copyright is specifically used to erode ownership in private property in a way that makes the communism haters cry communism. There's a novel form of copyright misuse as a business model in which you put software in a thing that used to not require software, call it "smart", and then use the software to enforce your own idea of what "owning" the product means, backed up by the same laws that make it illegal to copy DVDs. There are a LOT of people who would like to go back to owning their cars and computers again, and that requires rolling back copyright, not strengthening it.
[0] Hollywood-ism for "people whose contribution to the work is not marketable"
[1] And, I suspect, a by-product of having read a bunch of Ayn Rand nonsense
Well, duh. The judge is helping out the plaintiffs in this case. A jury would have been easily convinced by the defense that no images produced by Stability's systems are visually derivative.
The key is indeed what follows:
The judge allowed Andersen to continue pursuing her key claim that Stability's alleged use of her work to train Stable Diffusion infringed her copyrights.
So unless there is some kind of summary judgement I would wager that this becomes the focus of both sides as this heads towards trial.
But that's it. As predicted by commentary from legal scholars, the outputs of Stable Diffusion are distinct from the model and are not infringing on copyright... at least for this complaint!
This simply means that they need to register their images for copyright before they can re-join the case. (https://www.gibsondunn.com/supreme-court-holds-that-copyrigh...)
EDIT: reading the linked PDF further, and it appears that McK and O's legal counsel stated that the two weren't asserting the copyright claims at all, which is why they were dismissed with prejudice. That means that they can't re-join the case by filing for copyrights for their images...Their lawyer fucked up pretty badly and if I were either of them I'd be filing a malpractice lawsuit.
You're entirely correct that if it was dismissed without prejudice the complaints on copyright infringement on the outputs could be amended and refiled.
Another interpretation is that the plaintiffs were well aware of how weak their case was with regards to the outputs and basically planned on abandoning it from the start.
There's been more than a bit of showmanship from the plaintiff's counsel so I'm not surprised that the actual legal tactics differ from the rhetoric of the blog posts. It's also common to stack the complaint so that when the judge does start focusing on the key issues that maybe a little more ends up at trial than otherwise.
There's winning in the court of public opinion and then there's winning in a Federal court.
In this case they never filed in the first place, and it was dismissed with prejudice.
I'm trying to pull up the original court document, but the PDF isn't loading.
Factual allegations at this point don't have to be correct (that's what discovery is for), but they do have to at least satisfy the legal requirements for each prong of a legal claim. In many legal pleadings, the plaintiffs will state, "upon information and belief, we [assert X factual allegation]" since they don't yet have the discovery to support a more specific factual allegation.
The DeviantArt direct claim is because of how DeviantArt has been using Stable Diffusion for their DreamUp system, but the plaintiffs have been less clear about whether the direct claim against Midjourney targets its use of Stable Diffusion in one model (beta/test/testp) or its use of training data (like LAION).
The other part of the case is whether the artists' copyright was violated when training the AI, and they have only claimed that Stability used their art to train.
Just amazing!
The copyright infringement claim (for training) is left intact. It’s the other claims that had no basis in existing law (e.g. no copyright was registered, etc) that have been thrown out.
Can you elaborate in what way it disappeared in your opinion?
So this is pretty much expected, and if it swings the other way the US/EU are going to hobble themselves in the face of any locality that gives zero shits about copyright. It's less about the art and more that the art enables these models to do real useful work, and they are better at it for having access to more data.
The dismissal of Deviant was inappropriate given that the case hasn't reached discovery yet. The dismissal was granted based on a substantive evaluation of the Defendant's assertions which is inappropriate at this early procedural stage of the case. (see e.g. page 10 where the judge evaluates the "plausibility" of alleged facts, and page 12 where he says "I am not convinced" about the plaintiff's theory, even though in a MTD this is not a determination he is supposed to make pre-discovery).
Moreover, even if plaintiff's language was "unclear", the appropriate procedure is to require them to amend their claim and dismiss Deviant if the plaintiff does not amend, not to dismiss a defendant and give the plaintiff leave to amend their claims.
With respect to Midjourney, the Plaintiffs failed to plead sufficient factual allegations to support their claim, so that dismissal was appropriate. (Pre-discovery, it's okay for the alleged/pleaded "facts" to be wrong, you just need to allege sufficient "facts" that you have a legal basis for a court case. Note that "facts" in the MTD context doesn't mean real world facts, it is a legal term of art that actually refers to an allegation of a fact that will later be determined to be true or false at the actual legal proceeding on the merits.)
I'm not sure Stability will agree to a licensing fee, since part of the rationale for the last version of SD was to remove the infringing images from their training sets going forward.
The lawsuit is moving forward, but only on copyrighted work. This is (not yet) a story.
https://www.copyright.gov/help/faq/faq-general.html
(Makes me wonder if, back in the day, every song that was downloaded and then pursued by the RIAA was registered...)
Imagine you have a blob of seemingly random data. Nothing in the data contains anything recognizable as illegal or in violation of copyright.
Now imagine that the right input suddenly turns the data into illegal or infringing material, after a transformation operation. And not just a single unique input such as a password which clearly represents a mapping function between two sets of data.
But imagine if there were seemingly infinite possible inputs, each of which transformed the data into a different infringing blob of data. If these inputs exactly represented the novel, copyrightable or illegal aspects, but the blob itself was inert.
What should be illegal here? The blob, which by itself is free of any questionable bits of data, or the inputs which transform it into something tangible? Both? Neither?
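A toy version of this thought experiment can be sketched with XOR, the same trick as a one-time pad. This is an assumption-laden illustration only: a trained model is not a random blob, and the function names and placeholder strings below are made up for the demo. It shows the extreme case where the blob is inert and all of the meaningful information lives in the input.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# The "blob": 32 random bytes containing nothing recognizable.
blob = secrets.token_bytes(32)

def input_for(target: bytes) -> bytes:
    """Compute the input that turns the blob into `target` (space-padded to blob length)."""
    padded = target.ljust(len(blob), b" ")
    return xor_bytes(blob, padded)

def transform(data: bytes, user_input: bytes) -> bytes:
    """The 'transformation operation': XOR the blob with the input."""
    return xor_bytes(data, user_input)

# Two different inputs turn the same inert blob into two different outputs.
input_a = input_for(b"first protected work")
input_b = input_for(b"second protected work")

output_a = transform(blob, input_a).rstrip()
output_b = transform(blob, input_b).rstrip()
print(output_a, output_b)
```

In this toy, any 32-byte output is reachable with the right input, so whatever is objectionable is carried by the input and the act of combining, not by the blob on its own.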
Well, it has never been illegal to draw or paint something representing CSAM, for example. And it has never been illegal to draw or paint Mickey Mouse in your own home.
What's often illegal is publishing said data. Ignoring the free speech debate around artificially produced CSAM, publishing it is already illegal in many territories. It is also illegal to violate copyright in many countries when publishing information.
What's interesting is that it is not illegal to trace a drawing and hang it up on your wall, instead of buying the real drawing from its rights-holder. It's also not illegal to reproduce a tracing done by a friend. But the recording and film industries have been more successful in convincing us that it should be illegal to do the same for a song or film. That you should not be able to "trace" the data at home, and that you should not be able to share it with me, that I should not be able to trace over your tracing and bring home a copy for myself.
I can understand, and support a copyright system which regulates the publishing of copyrighted material. Even copyleft paradigms lean on regulation for enforcement. But the film and music industry actively try to restrict individual freedoms in the name of corporate profits, while still screwing over their clients and employees with respect to profit-sharing.
Back to the point: That blob should never be illegal. The activation functions should never be illegal. That is a basic extension of free speech. But publishing, that is a different story, and we already have laws offering such protections both with respect to illegally-produced or copyrighted content. Any attempt to regulate what kind of model I am allowed to run at home is a massive infringement on my rights as an individual, and is borne either out of gross ignorance of current copyright law from the same people crying, "But think of the copyrights!", or direct, insidious corporate greed.
You can adjust this thought experiment so that instead of dealing with a magic blob, we are dealing with a program that makes it really easy to produce illegal or copyrighted works after a bit of human interaction. Is there a claim here now? Are we basing the law on how much human involvement was needed to create the output? We've faced similar arguments around technological leaps such as the printing press or mechanical loom. Did we, as a society, reject these advances in technology in order to protect loom workers and scribes?
Bottom line. You can pry my models out of my cold, dead or handcuffed hands. Times like these really shine a light on who is complicit in the system, and who suffers from it.
If you are in the creative industry, you need to understand how things are going to change. As an engineer with decades of investment into my craft, I also have to face the rude awakening that is ahead in my own industry as automation creates a gap between highly-skilled professionals and newcomers. Being a paid software engineer might become as hard of work as becoming a famous professional artist. Lots of connections, insane specialization and a lifetime devoted to the craft. A lot of people in school for engineering right now might struggle to find employment in 20 years or less if they cannot cross this gap in time. Artists aren't the only tribe experiencing a huge industry shake-up over a technology that will one day be so ubiquitous that it's inside of your toaster.
R
The set of Real numbers contains every positive whole number. This is already the magical blob.
Eg the decimal number 65101114114111111110 is "Aerroon" in ASCII.
Edit3: real numbers are better than natural numbers or whole numbers for this. They have zero and they solve the "0005" problem.
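The "Aerroon" number above is easy to check. A minimal sketch (the function name is just for illustration): concatenate the decimal ASCII code of each character.

```python
def to_decimal_string(text: str) -> str:
    """Concatenate the decimal character code of each character in `text`."""
    return "".join(str(ord(ch)) for ch in text)

number = to_decimal_string("Aerroon")
print(number)  # prints 65101114114111111110
```

Going the other way is ambiguous without extra rules, since the codes have different widths ("65101" could be read as 65, 101 or as 6, 5, 10, 1), so the decoder's conventions are part of the "input" too.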
Until this point, an artist who has developed their own personal, recognizable style could be somewhat confident that it is difficult for someone else to generate a new piece of art exactly mimicking their style. That is to say, it was never impossible (there have certainly always been other artists out there who are capable of taking artwork and creating something new in that style), but there were some barriers to getting there, including that those artists aren’t easily and instantaneously accessible to every human being on the planet, that they generally don’t work for free, and that they would need some time to produce their work. The combination of these factors resulted in a system wherein, for the most part, if you really wanted to create something in the style of a specific artist, you would need to commission them, thereby supporting their ability to live and continue creating art. And/or they sold merchandise with their art, or collections, etc.
Now, on the other hand, it is incredibly easy to go to an image generator and have it generate art in the style of a specific (sufficiently well-established) artist quickly, easily, and freely. The barriers have, overnight, gone from being reasonably protective to pretty much nonexistent. As a result, artists are asking themselves how they can continue to live and create art. This is something a sufficiently well-established professional artist used to be able to do before generative AI came into the picture, because other than the odd copycat (which again took time and effort and an actual human with the right ability), they were the only ones who could produce images in their own styles, and this ability was thus a valuable resource that people paid for. If anyone can now produce identical images independently and for free, then this ability may no longer be a resource other people will pay for.
Part of what these court cases are trying to determine is exactly whether any copyright does apply to generated images. You wrote that “publishing, that is a different story, and we already have laws offering such protections both with respect to illegally-produced or copyrighted content”, but those laws are exactly what’s being tested here: artists (and organizations like Getty) are seeing what they claim are AI-generated copies of their copyrighted works in use out in the world (so these have been “published” by some definition — they are not only being printed out and hung in people’s garages for them and their friends to look at in private), and are suing to stop that.
But aside from that, I think there is a real philosophical discussion here. If you’ve trained as an artist your entire life, have worked hard to develop a unique style, and are one of the relatively few artists who have been successful doing so — should a company be able to wait until you became popular, then just take all of your work, and use it to train a model that can produce works exactly in your style easily and without any effort, which it can then provide to people freely or for a subscription?
This also isn’t as much about the output, as about how the output was obtained. If the model did not actually ingest your images, but someone wrote a prompt that involved a super-detailed description of what made your style unique, going into color palettes, line thicknesses, art styles, influences, etc etc, and you would have to get all of that right in order to generate something that looked like your art, then I think most folks would be generally ok with that. But when (1) your prompt can just be “give me art that looks like soulofmischief made it” and it’ll give you just that, and (2) you know that your art was used to train the model in order for it to be able to do that, then there is a question of whether fair use laws should be adjusted to prohibit this behavior and protect your ability to live off of your work.
I also think that regardless of the outcome of these lawsuits, no one is really coming for your own models and your ability to tinker in your garage. It may not be legal today to duplicate a copyrighted image and hang it in your office, but no one will ever know (or care enough to do anything about it) if you do. Similarly, even if this use is ruled to be infringement, nothing will practically stop you from building your own large model that includes any copyrighted images you want, for your own personal use, in your own garage. But if you then turn around and try to profit off of that model, or if you want someone else to produce a model (thus stepping more into the publishing realm), that’s where a line may be drawn. I personally think that’d be fair.
Finally, zooming all the way out, I believe that it should be possible to make a living as an artist, and I think when we have discussions like these, we should keep reminding ourselves to think about how our technical or legal arguments affect that outcome.
I’m impressed that their legal team was incompetent enough that they didn’t bring this up as an issue before filing the lawsuit.
After GDPR and the cookie pop-ups, my expectations for things coming out of the EU are quite low. Every company I have worked at has a different and often conflicting interpretation of GDPR, some places use it to play politics, and the governments of individual EU countries are not doing their part to clarify how things should be interpreted. It's a dumpster fire IMO.
"In his dismissal of infringement claims, Orrick wrote that plaintiffs’ theory is “unclear” as to whether there are copies of training images stored in Stable Diffusion that are utilized by DeviantArt and Midjourney. He pointed to the defense’s arguments that it’s impossible for billions of images “to be compressed into an active program,” like Stable Diffusion."
Perhaps future litigation will be more successful if they treat the model as a black box. Could an argument be made that a person's intellectual property was used to train the model without compensation and _that_ is the illegal act? From there one would only have to demonstrate that the output from the model is similar to a person's body of work.
What? I thought everything was copyrighted by default under the Berne Convention?
That's the reason for the existence of CC0 [0], after all. Their FAQ says: "Copyright and other laws throughout the world automatically extend copyright protection to works of authorship and databases, whether the author or creator wants those rights or not."
[0] https://wiki.creativecommons.org/wiki/CC0_FAQ#What_is_CC0.3F
The HN discussion back when this lawsuit was first announced was correctly pessimistic: the top comment was "Where are the copies?". https://news.ycombinator.com/item?id=34377910
"The contentious issue of whether AI art generators violent copyright — since they are by and large trained on human artists’ work, in many cases without their direct affirmative consent, compensation, or even knowledge — has taken a step forward to being settled in the U.S. today."
Is it human-generated? Violent copyright?
That is the key.
Not only that, but I really wish we could just redo copyright to be more flexible while ultimately empowering the creator, with conclusive licenses for others to use the work (in AI, other creative work, streaming, etc.) and the creator paid either monthly or per generated image/song.