- Automatic1111 outpainting works well but you need to enable the outpainting script. I would recommend Outpainting MK2. What the author did was just resize with fill which doesn't do any diffusion on the outpainted sections.
- There are much better resizing workflows; at a minimum I would recommend using the "SD Upscale" script. However, you can get great results by resizing the image to high-res (4-8k) using lanczos, then using inpainting to manually diffuse the image at a much higher resolution with prompt control. In this case "SD Upscale" is fine, but the inpaint-based upscale works well with complex compositions.
- When training I would typically recommend keeping the background. This allows for a more versatile finetuned model.
- You can get a lot more control over the final output by using ControlNet. This is especially great if you have illustration skills, but it is also great for generating variations in a different style while keeping the composition and details. In this case you could have taken a portrait photo of the subject and used ControlNet to adjust the style (without any finetuning required).

Diffuse an 8k image? Isn't it going to take much, much more VRAM though?
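The lanczos-then-inpaint upscale suggested earlier is just a plain resample before any diffusion happens; a minimal sketch of the sizing arithmetic, assuming Pillow's `Image.resize` would do the actual resample (the 4096px long edge is illustrative):

```python
def lanczos_target_size(width, height, target_long_edge=4096):
    """Compute the output size for a plain Lanczos upscale to ~4k.

    The resample itself would be e.g. Pillow's
    img.resize(size, resample=Image.LANCZOS) -- no diffusion is involved,
    which is why the result still benefits from inpainting passes afterwards.
    """
    scale = target_long_edge / max(width, height)
    return round(width * scale), round(height * scale)

# A typical 768x512 SD render upscaled to a 4096px long edge:
print(lanczos_target_size(768, 512))  # (4096, 2731)
```

The interesting part happens after this step, when you inpaint sections of the 4k image with prompt control.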
https://i.imgur.com/zOMarKc.jpg
Here's an example using various techniques I've gathered from those 4chan threads. (yes I know it's 4chan but just ignore the idiots and ask for catboxes, you'll learn much faster than anywhere else, at least that was the case for me after exhausting the resources on github/reddit/various discords)
You are upsampling, then inpainting the sections that need it. So if you took your 8K image and inpainted a section at 1024x1024, that works well with normal VRAM usage. In Auto1111, you need to select "Inpaint area: Only masked" to do that.
https://github.com/zero01101/openOutpaint
https://github.com/BlinkDL/Hua
Both use automatic1111 API for the work.
There's a great deal of pushback against AI art from the wider online art community at the moment, a lot of which is motivated by a sense of unfairness: if you're not going to put in the time and effort, why do you deserve to create such high quality imagery?
(I do not share this opinion myself, but it's something I've seen a lot)
This is another great counter-example showing how much work it takes to get the best, deliberate results out of these tools.
This is not something I've seen even once in any sort of criticism of "AI art", and elsewhere on the internet I'm largely in an anti-AI-art bubble.
Most legitimate pushback I've seen has been more about the non-consensual training of models. Many artists don't want their work to be sucked up into the "AI Borg Model" and then regurgitated by someone else, removing the artist's consent, credit, and compensation.
Using stock art is just further appropriation, which is silly considering the intent and licensing of stock artwork is clearly intended by all parties to turn works into commodities for commercial exploitation.
The old ways are best, the new ways are bad and take away the soul from the creation process and the resulting works. Also unconvincing, considering that most of the people saying that are using radically different, digitized, heavily time-optimized art workflows compared to the norm of the industry even 30 years ago.
Not that I don't see the problems. The potential for job losses, because optimized workflows require less work and therefore fewer workers, is an actual risk, but one that happens regardless of copyright enforcement against AI models. The problems commercialized AI art workflows cause may even be exacerbated by enforcing copyright on training data, by handing a monopoly on all higher-quality generative AI models to already entrenched multinational intellectual-property rightsholders. I think a lot of artists forget that copyright isn't as much for them as it is for the Disneys of the world.
In order to copy another artist's style reliably, you have to make a LoRA yourself. That involves a lot of manual work, and it can't really be automated if you want good results.
Artists can opt out of future SD base models (which doesn't matter), but they can't opt out of someone making a LoRA of their work (which actually works).
I've actually seen this a lot.
In my view, it's not coming from professional artists working in the field. Their concern is more that people are ripping off their style, or that AI is making their efforts unnecessary (e.g. lots of people who made a living by copying the style of particular anime & cartoons for fans, no longer have a purpose since AI can do that given enough source material).
Non-professional artists, on the other hand, are still learning and have put a lot of time into their craft and it hasn't paid off yet. They seem to be annoyed that other people are getting results (via AI), without actually having to learn the mechanics of art.
AI basically lets your generic art history major produce lots and lots of pieces, because they can describe artwork well enough and know where to find good samples for the AI. The only thing stopping them was mere mechanical inability, not knowledge of the art space.
Is this part actually coming from artists? What's the suggested amount (be it upper quadrillion dollars per second or $0.25/use)?
I think compensation as a condition is only implicitly assumed: that financial gain is artists' motive and that they actually live off that income. Rather, I see a lot of vocal opposition to AI image generators from people who aren't drawing for profit at all.
So, is the money going to solve it, or is it a wrong assumption, or is it that it will have to be settled by lump sums?
Look at the pushback to Adobe’s model.
“Non consent of model input” is just a tool they’re using in the hopes of destroying the tech. Plenty of companies have datasets of these same people’s work where the T&C permits training.
The narrative will switch once you can no longer use the “stealing/consent” argument. They won’t suddenly become fine with this tech just because the dataset consented.
Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.
Even if they did have a more complex workflow most of them are still based on copyrighted training data, so there will be many lawsuits.
Then why don’t they illustrate it instead, and save themselves some time?
Indeed. Another criticism I can definitely somewhat see the idea behind is that the barrier to entry is very different from, for example, drawing. To draw, you need a pen and paper, and you can basically start. To start with Stable Diffusion et al, you need either A) paid access to a service, B) money to purchase moderately powerful hardware, or C) money to rent moderately powerful hardware. One way or another, if you want to practice AI-generated art, you need more money than what a pen and paper cost.
I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.
That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak. I really don't see what's difficult about that case. I think the internet assumes a bit too quickly that it's a difficult question and a grey area when maybe it just isn't.
It's noteworthy that Adobe did things differently than the others, and the way they did things goes in the direction I'm describing here. Maybe it's just confirmation bias.
> That’s because you can’t find an artist for a generated picture other than the ones in the training set.
First, that’s clearly not true when you are using ControlNet with the input being human generated, or even img2img with a human generated image, but second and more importantly…
> If you can’t find a new artist, then the picture belongs to the old ones, so to speak.
That’s not how copyright law works. The clearest example (not particularly germane to the computer generation case, but clearly illustrative of the fact that “can’t find another artist” is far from dispositive) is Fair Use noncommercial timeshifting of an existing work: it is extremely clear there is no artist but that of the original work, and yet it is not copyright infringement.
> I really don't see what's difficult with that case.
You’ve basically invented a rule of thumb out of thin air, and observed that it would not be a difficult case if your rule of thumb was how copyright law works.
Your observation seems correct to that extent, the problem is that it has nothing to do with copyright law.
> I think the internet assumes a bit too quickly that it's a difficult question and a grey area when maybe it just isn't.
IP law experts have said that the Fair Use argument is hard to resolve.
Assuming the lawsuits currently ongoing aren’t settled, we’ll know when they are resolved what the answer is.
“you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak”
I don't think that's valid on its own as a way to completely discount considering how directly the data is being used. As an extreme example, what if I averaged all the colours in the training data together and used the resulting colour as the seed for some randomly generated fractal? You could apply the same argument (there is no artist except the original ones in the training set), and yet I don't think any reasonable person would say the result obviously belongs to every single copyright owner from the training set.
Normally (outside the specific context of AI-generated art), the relation is not "work¹ → past author" but "work → large amount of past experience". (¹"work": in the sense of product, output, etc.)
If the generative AI is badly programmed, it will copy the style of Smith. If properly programmed, it will "take into account" the style of Smith. There is a difference between learning and copying. Your tool can copy - if you do it properly, it can learn.
All artists work in a way "post consideration of a finite number of past artists in their training set".
It doesn't belong to the "old ones"; it is at best a derivative work. And even writing a prompt, as trivial as it might seem, makes you an artist. There are modern artists exhibiting random shit as art, and you may or may not like it, but they are legally artists, and it is their work.
The question is about fair use. That is, are you allowed to use pictures in the dataset without permission? It is a tricky question. On one extreme, you won't be able to do anything without infringing some kind of copyright. Used the same color as I did? I will sue you. On the other extreme, you essentially abolish intellectual property. Copying another artist's style in your own work is usually fair use, and that's essentially what generative AIs do, so I guess that's how it will go, but it will most likely depend on how judges and legislators see the thing, and different countries will probably have different ideas.
We have some countries where it is explicitly legal to train AI models on copyrighted data without consent, and precedent in the US that makes this a plausible outcome there as well.
Could you explain what portion of copyright law you believe would cover this argument? I'm not a lawyer, but have a passing familiarity with US copyright law, and in it, at least, I do not know of anything that would support the idea you're proposing here. How would you even assign copyright to the "old" artists? How are you going to determine what percentage of any given generation was influenced by artists X, Y, Z?
Agreed. An AI model trained on an artist's work without permission is IP infringement and this should be widely understood. Unfortunately, because the technology is new people do not understand this. When Photoshop was new, there was a similar misunderstanding. People could take an artist's work, run it through Photoshop, and then not compensate the artist. It took some time for that to sort out.
It's a static mapping, so surely it should be possible, you'd think, but NN frameworks aren't designed that way. That is blocking it from happening (and also allowing the "AI is just learning, humans do the same" fallacy).
Nonetheless, it would be odd, and a weak argument, to point criticism at not spending adequate «time and effort» (as if it made sense to renounce tools and work through unnecessary fatigue and wasted time). More proper criticism could be in the direction of "you can produce pleasing graphics but you may not know what you are doing".
This said, I'd say that Stable Diffusion is a milestone of a tool, incredible to have (though difficult to control). I'd also say that the results of the latest Midjourney (though quite resistant to control) are at "speechless" level. (Noting in case some had not yet checked.)
I don't get this. If one "can produce pleasing graphics," how does that not equal knowing what they're doing? I only see this as being true in the sense of "Sure, you can get places quickly in a car, but you don't really know how it works."
This isn't high quality imagery. Don't get me wrong, the tech is cool and I love the work that's gone into making this picture. But this isn't something I would ever hang on my wall. There's probably a market for it, but I get the strong impression it's the "live, laugh, love" market: the people who buy pictures for their wall in the supermarket. The kind of people who pay individual artists money to paint bespoke images of their pet are not going to frame AI art. I don't think the artists need to worry.
I’ve done pictures of my wife in the style of other photographers, Soviet-style propaganda posters, 50s pinups, Alphonse Mucha, and much more.
I’m a professional photographer and have tons of great pictures of our dog - the kind of stuff people pay for. My wife’s lock screen on her phone is something I generated instead.
Non-AI-aided art, like manually developed film, will trend towards a niche.
Well, yeah, but that doesn't change the OP commenter's point that it still takes a lot of work to get high quality art.
> I don’t think the artists need to worry.
I disagree here, but only on the basis of what type of art it is. Stock art/photography and a lot of media design work are likely at risk, because we can now create "good enough" art at the click of a button for almost no cost. I agree that the "hang on the wall level good" artists aren't at risk just yet, but between the more filler-type art and the, uh...
Well, "anime/furry" commissioners are definitely at risk right now for anything except the highest quality artists, and there is a MASSIVE community behind this. In fact they have done a lot of the innovation for Stable Diffusion, including optimizations and the A1111 webui, and have trained many custom models for their art, having already had pre-tagged datasets of tens of thousands of images...
It's interesting to ask people who are concerned about the training data what they think of Adobe Firefly, which is strictly trained on correctly licensed data.
I'm under the impression that DALL-E itself used licensed data as well.
I find some people are comfortable with that, but others will switch to different concerns - which indicates to me that they're actually more offended by the idea of AI-generated art than the specific implementation details of how it was trained.
There is an irony, however, that many of the AI art haters tend to draw fanart of IP they don't own. And if Fair Use protections are weakened, their livelihood would be hurt far more than those of AI artists.
The Copilot case/lawsuit is IMO stronger because the associated code output is a) provably verbatim and b) often has explicit licensing, and therefore intent, on its usage.
It's kinda like using ffmpeg or vapoursynth for video editing instead of a video editing GUI.
That being said the training parameter/data tuning is definitely an art, as is the prompting.
I turned my dog into a robot awhile back using the img2img feature of Stable Diffusion and the results were pretty amazing![1]
Say you generate a picture with midjourney - who is/are the closest artist(s) you can find for that picture?
Not the AI, not the prompter; so the closest artists you can find for that picture are the ones who made the pictures in the training set. So generating a picture is outright copyright infringement. Nothing to do with unfairness in the sense of "artists get outcompeted". Artists don't get outcompeted - they get stolen from.
Why is it you discount the creative input of the user? Are they not doing work by guiding the agent? Don’t their choices of prompt, input image, and the refinement of subsequent generated images represent a creative process?
I previously made coloring pages for my daughter of our dog as an astronaut, wild west sheriff, etc. They're the first pages she ever "colored," which was pretty special for us. Currently I'm working on making her into every type of Pokemon, just for fun.
StableTuner to fine-tune the model - I can't recall the name of the model I trained on top of, but it was one of the top "broad" 1.5-based models on Civitai. Automatic1111 to do the actual generating. I used an anime line art LoRA (at a low weight) along with an offset noise LoRA for the coloring book pages, as otherwise SD makes images perfectly exposed. For something like that you obviously want a lot more white than black.
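For reference, combining LoRAs at reduced weights in Automatic1111 is done inline in the prompt with the `<lora:name:weight>` syntax; a hedged sketch of what the coloring-page prompt might look like (these LoRA filenames are made up, substitute whatever is in your models/Lora folder):

```text
a coloring book page of a dog as an astronaut, clean line art,
white background <lora:anime_lineart_style:0.4> <lora:epi_noiseoffset:0.6>
```

The low 0.4 weight keeps the line-art LoRA from overpowering the base model, while the offset-noise LoRA lets the output escape the "perfectly exposed" midtone bias.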
EveryDream2 would be another good tuning solution. Unfortunately that end of things is far from easy. There are a lot of parameters to change and it's all a bit of a mess. I had an almost impossible time doing it with pictures of my niece, my wife is hit or miss, her sister worked really well for some reason, and our dog was also pretty easy.
There's a huge difference in performance (generating an image takes 10 minutes rather than 10 seconds, and training a model would take forever), but with some Python knowledge and a lot of patience it can be done.
Apple's Intel Macbooks are infamous for their insufficient cooling design for the CPUs they chose, which won't help maintain a high clock speed for extended durations. You may want to find a way to help cool the laptop down to give the chip a chance to boost more, and to prevent prolonged high temperatures from wearing down the hardware quicker.
Seems like lots of work went into that and I hope the author enjoyed the process and enjoys the final result.
It wasn't actually particularly hard - I used a Colab notebook on the free tier to fine-tune the model, and even got chatGPT to write some of the prompts.
Here's the colab notebook, in case anyone is interested: https://github.com/TheLastBen/fast-stable-diffusion
I've trained a few smaller models using their Dreambooth notebook, but I think for 4000 training steps, an A100 will usually take 30-40min. I believe replicate also uses A100s for their dreambooth training jobs.
I've found the most important part is spending a good amount of time on the prompts, although I'm not sure if having the person embodied in an environment, and describing the objects around them, helps give the model a "sense of scale". For example, if I just train "wincy" in fast Dreambooth, "wincy" will be the only token it knows; with no other info in the prompts, it didn't know what in the image was "wincy" (me). I accidentally did this when training my wife (no prompts at all) and she got really mad at how ugly the results were ("you made me ugly!", haha).
Have you tried it with and without your dog in an environment, then describing the environment your dog is in for the training data?
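The captioning idea described above, anchoring the rare token to the subject while describing the environment, can be sketched as sidecar caption files. (The token "wincy" comes from the comment; the filenames are examples, and trainers differ in exactly how they read captions, so treat this as one common convention rather than what every Dreambooth notebook expects.)

```python
import pathlib
import tempfile

# Caption sidecars: same basename as each training image, .txt extension.
# "wincy" is the rare token; describing the environment helps the trainer
# associate the token with the subject rather than the whole scene.
captions = {
    "img_001.jpg": "photo of wincy standing in a park, trees in the background",
    "img_002.jpg": "photo of wincy sitting on a couch next to a lamp",
}

data_dir = pathlib.Path(tempfile.mkdtemp())
for image_name, caption in captions.items():
    (data_dir / image_name).with_suffix(".txt").write_text(caption)

print(sorted(p.name for p in data_dir.glob("*.txt")))
```

The contrast is with training on images alone, where the token soaks up everything in the frame, which matches the "no prompts at all" failure mode described above.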
dreamlook.ai
Upload your pictures, we train the model in a few minutes, then you can download your trained checkpoint. $1/model, first one for free.
For app builders, we provide a solid API that scales to 1000s of runs per day without breaking a sweat.
People have been sending me the cute pics the AI generates of their pups. I think this is arguably the best thing so far in this latest wave of AI releases!
This would have been much better standalone.
https://fortune.com/2023/02/23/no-copyright-images-made-ai-a...
From this point of view, Adobe Firefly is obviously ahead:
"The current Firefly generative AI model is trained on a dataset of Adobe Stock, along with openly licensed work and public domain content where copyright has expired.
As Firefly evolves, Adobe is exploring ways for creators to be able to train the machine learning model with their own assets, so they can generate content that matches their unique style, branding, and design language without the influence of other creators’ content. Adobe will continue to listen to and work with the creative community to address future developments to the Firefly training models."
So the only way forward to have an ownership of your product is to train your own models over your own data.
Humans are much worse at telling dogs apart than at telling other humans apart (except perhaps the owner of the particular dog).
So for all we know, the AI didn't generate a portrait of this particular dog but instead a generic picture of this breed of dog.
But this is totally true. I found that maybe 30% of the images I generated did not look like my dog at all. However, the rest do a good job of capturing his eyes and the facial expressions he actually makes. I thought that the image I chose to work from captured the look of his eyes super well.
But yeah, nobody but me would really appreciate that.
My point is that it is difficult to judge (for us) that the returned photos are actually similar to the subject.
She’s pretty unique looking and it comes through even with heavy styling.
Tomorrow's release should contain both mask blur and inpainting ControlNet, which might help these use cases.
I did the actual work back on March 11th, so I was likely on an older build; but I was seeing issues where inpainting was just replacing my selection/mask with a white background. I had the inpainting model loaded, but couldn't figure it out.
I'm planning to continue playing with Draw Things locally, and exploring the inpainting stuff. For such an iterative process I feel like a local client would make for the best experience.
That said, you probably used the paintbrush rather than the eraser? There would be more help on the Discord server, though! https://discord.gg/5gcBeGU58f
I mean you cannot outpaint in the img2img tab; load the image in the inpaint tab and possibly use the inpainting model.
People can debate if it’s actually good that people can create art without being artists, but again, I think it’s great that the author had the freedom to create what they had in mind without much outside influence. This has been a goal for computers in general for so long, and it seems like we’re actually arriving with some mediums.
Glad to see I’m not alone on this. I think the end result would have turned out much better if the author had simply adhered to the Huichol art palette, which I’m convinced they were aiming for at the beginning. That color scheme works for a reason.
Big props to the folks at replicate.com for making solid infrastructure for ML.
The app also has a REST endpoint so anybody can create an app using it. Lots of clients create niche websites catering to different use cases. There is a kind of gold rush going on in this area.
AUTOMATIC1111
https://github.com/AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion
https://github.com/Stability-AI/StableDiffusion
?
Yes, They're Real and They're Spectacular.
For example:
- There's one that creates gaming avatars based on your picture
- There's one that makes professional headshots for your CV
Even Microsoft's own product (Microsoft Designer) for using AI to create posters & flyers is most useful when you start with an image of your own creation, then use the AI to change the style of the image or integrate it into a template that it dreams up.
Any references where the same has been tried on humans?
https://www.bing.com/images/create/a-bright-day-on-the-stree...
https://www.bing.com/images/create/christmas-on-board-a-spac...
https://www.bing.com/images/create/in-the-streets-of-atlanti...
EDIT: four downvotes and zero answers how to run it on a Linux machine…
as well as doing all that nn/ml stuff instead of just trying to learn a bit of how to make an artwork themselves, how to draw something, even by tracing over a photo, like doing a 'how to do a colorful vector dog painting' search and going off of that.
like, this end result doesn't even look far off from what a 'colorful vector dog portrait' tutorial would yield. it just involves tons and tons of questionably sourced artwork and violated copyrights. (i know techbros are very confused about copyrights, but stuff like licenses and copyrights actually do have their meanings, limitations, and liabilities)
specifically picking stablediffusion, probably the most blatantly stolen-artwork-based model (given how open and clear it is about what data has been used for it, and how you can't just squirm out with 'i didn't know the terms of use of their data' like with other, more closed-off services), that's just another great touch as well.
Hiring an artist removes control; it's the artist's art, not yours, that ends up on the wall. That's the reason I avoid hiring artists when I want some art made.
With that said, I have found that at least for now using AI for art is just too much work and too little control. I want my art on the wall, not whatever the AI model outputs. So, I'm in your camp of "just learn to paint the darn stuff". I find it's a lot more fun.
I find the complaint about copyright so strange in this case. Copyright has a purpose, and stopping some random person from creating an image only they will see and use is not that purpose. In this case it's just spiteful. Ultimately, if you think he's infringing your copyright you should sue him, but I don't think you'd win.
there's a great alternative to "not paying/refusing to pay" (but using and stealing stuff anyway): just not using other people's stuff. not using stuff that's built on copyright/license violations. not using artwork you don't own, that you don't have the rights, licenses, or permissions to use. (yes, simply 'taking something and making a model from it' would be a violation.) one could just not do a shitty thing, and then they wouldn't have to jump through hoops to find a justification for the shitty thing they did.
they could do a step-by-step art tutorial, and wouldn't have to pay anybody, nor use tools that rely on stolen artwork. but nope.
highly ironic how they made this thing, and promptly showed it off to thousands of people on the internet, immediately invalidating your example
they also promote (just by choosing and mentioning all of these things) those services, like Replicate, that monetize the use of stolen artwork (by selling compute, directly coupled with nn models), and ultimately profit from it (solely, without "giving back to artists whose art they perused" or anything).
they could make art in a way that wouldn't participate in tech art theft racket, but they didn't. and they didn't just participate in it, but promote it and perpetuate it.
This sort of image generation (especially extrapolating the probable upgrades & improvements of just the next few years) displaces artists without providing an upgrade path. It shouldn't take much empathy to understand how frustrating and scary that must be for them.
I think pxoe's expression of frustration is totally reasonable. The engineers who made this stuff could have focused on using AI to enable new possibilities, instead of undercutting existing possibilities (to create new markets instead of overtaking existing ones). They could have used 100% consensual training data, but instead felt entitled to exploit a loophole/ambiguity in the social contract under which artists have been sharing their work on the internet.
A more appropriate analogy would be, this sounds a lot like social movements complaining about capitalist co-opting of their symbols, e.g. the use of "communist" rhetoric to build oligarchies (or the sale of "save the earth" mugs made from oil-derived plastics, etc.). Even that isn't a perfect analogy, though, as the co-opted output wasn't itself the displaced work-product, although it does a better job of capturing the emotional side of it, the sense of betrayal. Ultimately, I don't think it's productive to reduce the reaction to the decisions behind Stable Diffusion etc. to a single analogy, and it shouldn't be so hard to say "sorry, you're right, this is bad for you and good for me and you have every right to express your frustration over that irreversible decision".