- Automatic1111 outpainting works well but you need to enable the outpainting script. I would recommend Outpainting MK2. What the author did was just resize with fill which doesn't do any diffusion on the outpainted sections.
- There are much better resizing workflows; at a minimum I would recommend using the "SD Upscale" script. However, you can get great results by resizing the image to high-res (4-8k) using lanczos, then using inpainting to manually diffuse the image at a much higher resolution with prompt control. In this case "SD Upscale" is fine, but the inpaint-based upscale works well with complex compositions.
- When training I would typically recommend keeping the background. This allows for a more versatile finetuned model.
- You can get a lot more control over the final output by using ControlNet. This is especially great if you have illustration skills, but it is also great for generating variations in a different style while keeping the composition and details. In this case you could have taken a portrait photo of the subject and used ControlNet to adjust the style (without any finetuning required).

Diffuse an 8k image? Isn't it going to take much, much more VRAM though?
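The lanczos-then-inpaint upscale suggested earlier is just a plain resample before any diffusion happens; a minimal sketch of the sizing arithmetic, assuming Pillow's `Image.resize` would do the actual resample (the 4096px long edge is illustrative):

```python
def lanczos_target_size(width, height, target_long_edge=4096):
    """Compute the output size for a plain Lanczos upscale to ~4k.

    The resample itself would be e.g. Pillow's
    img.resize(size, resample=Image.LANCZOS) -- no diffusion is involved,
    which is why the result still benefits from inpainting passes afterwards.
    """
    scale = target_long_edge / max(width, height)
    return round(width * scale), round(height * scale)

# A typical 768x512 SD render upscaled to a 4096px long edge:
print(lanczos_target_size(768, 512))  # (4096, 2731)
```

The interesting part happens after this step, when you inpaint sections of the 4k image with prompt control.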
https://i.imgur.com/zOMarKc.jpg
Here's an example using various techniques I've gathered from those 4chan threads. (yes I know it's 4chan but just ignore the idiots and ask for catboxes, you'll learn much faster than anywhere else, at least that was the case for me after exhausting the resources on github/reddit/various discords)
You are upsampling, then inpainting the sections that need it. So if you took your 8K image and inpainted a section at 1024x1024, that works well with normal VRAM usage. In Auto1111, you need to select "Inpaint area: Only masked" to do that.
https://github.com/zero01101/openOutpaint
https://github.com/BlinkDL/Hua
Both use automatic1111 API for the work.
There's a great deal of pushback against AI art from the wider online art community at the moment, a lot of which is motivated by a sense of unfairness: if you're not going to put in the time and effort, why do you deserve to create such high quality imagery?
(I do not share this opinion myself, but it's something I've seen a lot)
This is another great counter-example showing how much work it takes to get the best, deliberate results out of these tools.
This is not something I've seen even once in any sort of criticism of "AI art", and elsewhere on the internet I'm largely in an anti-AI-art bubble.
Most legitimate pushback I've seen has been more about the non-consensual training of models. Many artists don't want their work to be sucked up into the "AI Borg Model" and then regurgitated by someone else, removing the artist's consent, credit, and compensation.
Using stock art is just further appropriation, which is silly considering the intent and licensing of stock artwork is clearly intended by all parties to turn works into commodities for commercial exploitation.
The old ways are best, the new ways are bad and take away the soul from the creation process and the resulting works. Also unconvincing, considering that most of the people saying that are using radically different, digitized, heavily time-optimized art workflows compared to the norm of the industry even 30 years ago.
Not that I don't see the problems. The potential for job losses, because optimized workflows require less work and therefore fewer workers, is an actual risk, but one that happens regardless of copyright enforcement against AI models. The problems commercialized AI art workflows cause may even be exacerbated by enforcing copyright on training data, by handing a monopoly on all higher-quality generative AI models to already entrenched multinational intellectual-property rightsholders. I think a lot of artists forget that copyright isn't as much for them as it is for the Disneys of the world.
In order to copy another artist's style reliably, you have to make a LoRA yourself. That involves a lot of manual work, and it can't really be automated if you want good results.
Artists can opt out of future SD base models (which doesn't matter), but they can't opt out of someone making a LoRA of their work (which actually works).
I've actually seen this a lot.
In my view, it's not coming from professional artists working in the field. Their concern is more that people are ripping off their style, or that AI is making their efforts unnecessary (e.g. lots of people who made a living by copying the style of particular anime & cartoons for fans, no longer have a purpose since AI can do that given enough source material).
Non-professional artists, on the other hand, are still learning and have put a lot of time into their craft and it hasn't paid off yet. They seem to be annoyed that other people are getting results (via AI), without actually having to learn the mechanics of art.
AI basically lets your generic art history major produce lots and lots of pieces, because they can describe artwork well enough and know where to find good samples for the AI. The only thing stopping them was mere mechanical inability, not knowledge of the art space.
Is this part actually coming from artists? What's the suggested amount (be it upper quadrillion dollars per second or $0.25/use)?
I think compensation as a condition is only implicitly assumed: that financial gain is artists' motive and that they actually live off that income. Rather, I see a lot of vocal opposition to AI image generators from people who aren't drawing for profit at all.
So, is the money going to solve it, or is it a wrong assumption, or is it that it will have to be settled by lump sums?
Look at the pushback to Adobe’s model.
“Non consent of model input” is just a tool they’re using in the hopes of destroying the tech. Plenty of companies have datasets of these same people’s work where the T&C permits training.
The narrative will switch once you can no longer use the “stealing/consent” argument. They won’t suddenly become fine with this tech just because the dataset consented.
Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.
Even if they did have a more complex workflow most of them are still based on copyrighted training data, so there will be many lawsuits.
Then why don’t they illustrate it instead, and save themselves some time?
Indeed. Another criticism I can definitely somewhat see the idea behind is that the barrier to entry is very different from, for example, drawing. To draw, you need a pen and paper, and you can basically start. To start with Stable Diffusion et al, you need either A) paid access to a service, B) money to purchase moderately powerful hardware, or C) money to rent moderately powerful hardware. One way or another, if you want to practice AI-generated art, you need more money than what a pen and paper cost.
I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.
That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak. I really don't see what's difficult about that case. I think the internet assumes a bit too quickly that it's a difficult question and a grey area when maybe it just isn't.
It's noteworthy that Adobe did things differently than the others, and the way they did things goes in the direction I'm describing here. Maybe it's just confirmation bias.
> That’s because you can’t find an artist for a generated picture other than the ones in the training set.
First, that’s clearly not true when you are using ControlNet with the input being human generated, or even img2img with a human generated image, but second and more importantly…
> If you can’t find a new artist, then the picture belongs to the old ones, so to speak.
That’s not how copyright law works. The clearest example (not particularly germane to the computer generation case, but clearly illustrative of the fact that “can’t find another artist” is far from dispositive) is Fair Use noncommercial timeshifting of an existing work: it is extremely clear there is no artist but that of the original work, and yet it is not copyright infringement.
> I really don't see what's difficult with that case.
You’ve basically invented a rule of thumb out of thin air, and observed that it would not be a difficult case if your rule of thumb was how copyright law works.
Your observation seems correct to that extent, the problem is that it has nothing to do with copyright law.
> I think the internet assumes a bit too quickly that it's a difficult question and a grey area when maybe it just isn't.
IP law experts have said that the Fair Use argument is hard to resolve.
Assuming the lawsuits currently ongoing aren’t settled, we’ll know when they are resolved what the answer is.
“you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak”
I don't think that's valid on its own as a way to completely discount considering how directly the data is being used. As an extreme example, what if I averaged all the colours in the training data together and used the resulting colour as the seed for some randomly generated fractal? You could apply the same argument (there is no artist except the original ones in the training set), and yet I don't think any reasonable person would say the result obviously belongs to every single copyright owner from the training set.
Normally (outside the specific context of AI-generated art), the relation is not "work¹ → past author" but "work → large amount of past experience". (¹"work": in the sense of product, output, etc.)
If the generative AI is badly programmed, it will copy the style of Smith. If properly programmed, it will "take into account" the style of Smith. There is a difference between learning and copying. Your tool can copy - if you do it properly, it can learn.
All artists work in a way "post consideration of a finite number of past artists in their training set".
It doesn't belong to the "old ones"; it is at best a derivative work. And even writing a prompt, as trivial as it might seem, makes you an artist. There are modern artists exhibiting random shit as art, and you may or may not like it, but they are legally artists, and it is their work.
The question is about fair use. That is, are you allowed to use pictures in the dataset without permission? It is a tricky question. On one extreme, you won't be able to do anything without infringing some kind of copyright. Used the same color as I did? I will sue you. On the other extreme, you essentially abolish intellectual property. Copying another artist's style in your own work is usually fair use, and that's essentially what generative AIs do, so I guess that's how it will go, but it will most likely depend on how judges and legislators see the thing, and different countries will probably have different ideas.
We have some countries where it is explicitly legal to train AI models on copyrighted data without consent, and precedent in the US that makes this a plausible outcome there as well.
Could you explain what portion of copyright law you believe would cover this argument? I'm not a lawyer, but have a passing familiarity with US copyright law, and in it, at least, I do not know of anything that would support the idea you're proposing here. How would you even assign copyright to the "old" artists? How are you going to determine what percentage of any given generation was influenced by artists X, Y, Z?
Agreed. An AI model trained on an artist's work without permission is IP infringement and this should be widely understood. Unfortunately, because the technology is new people do not understand this. When Photoshop was new, there was a similar misunderstanding. People could take an artist's work, run it through Photoshop, and then not compensate the artist. It took some time for that to sort out.
It's a static mapping, so surely it should be possible, you'd think, but NN frameworks aren't designed that way. That is blocking it from happening (and also allowing the "AI is just learning, humans do the same" fallacy).
Nonetheless, it would be odd, and a weak argument, to point criticism at not spending adequate «time and effort» (as if it made sense to renounce tools and work through unnecessary fatigue and wasted time). More proper criticism could be in the direction of "you can produce pleasing graphics but you may not know what you are doing".
This said, I'd say that Stable Diffusion is a milestone of a tool, incredible to have (though difficult to control). I'd also say that the results of the latest Midjourney (though quite resistant to control) are at "speechless" level. (Noting in case some had not yet checked.)
I don't get this. If one "can produce pleasing graphics," how does that not equal knowing what they're doing? I only see this as being true in the sense of "Sure, you can get places quickly in a car, but you don't really know how it works."
This isn't high quality imagery. Don't get me wrong, the tech is cool and I love the work that's gone into making this picture. But this isn't something I would ever hang on my wall. There's probably a market for it, but I get the strong impression it's the "live, laugh, love" market: the people who buy pictures for their wall in the supermarket. The kind of people who pay individual artists money to paint bespoke images of their pet are not going to frame AI art. I don't think the artists need to worry.
I’ve done pictures of my wife in the style of other photographers, Soviet-style propaganda posters, 50s pinups, Alphonse Mucha, and much more.
I’m a professional photographer and have tons of great pictures of our dog - the kind of stuff people pay for. My wife’s lock screen on her phone is something I generated instead.
Non-AI-aided art, like manually developed film, will trend towards a niche.
Well, yeah, but that doesn't change the OP commenter's point that it still takes a lot of work to get high quality art.
> I don’t think the artists need to worry.
I disagree here, but only on the basis of what type of art it is. Stock art/photography and a lot of media design work are likely at risk, because we can now create "good enough" art at the click of a button for almost no cost. I agree that the "hang on the wall level good" artists aren't at risk just yet, but between the more filler-type art and the, uh...
Well, "anime/furry" commissioners are definitely at risk right now for anything except the highest quality artists, and there is a MASSIVE community behind this. In fact they have done a lot of the innovation for Stable Diffusion, including optimizations and the A1111 webui, and have trained many custom models for their art, having already had pre-tagged datasets of tens of thousands of images...
It's interesting to ask people who are concerned about the training data what they think of Adobe Firefly, which is strictly trained on correctly licensed data.
I'm under the impression that DALL-E itself used licensed data as well.
I find some people are comfortable with that, but others will switch to different concerns - which indicates to me that they're actually more offended by the idea of AI-generated art than the specific implementation details of how it was trained.
There is an irony, however, that many of the AI art haters tend to draw fanart of IP they don't own. And if Fair Use protections are weakened, their livelihood would be hurt far more than those of AI artists.
The Copilot case/lawsuit is IMO stronger because the associated code output is a) provably verbatim and b) often has explicit licensing, and therefore intent, on its usage.
It's kinda like using ffmpeg or vapoursynth for video editing instead of a video editing GUI.
That being said the training parameter/data tuning is definitely an art, as is the prompting.
I turned my dog into a robot awhile back using the img2img feature of Stable Diffusion and the results were pretty amazing![1]
Say you generate a picture with midjourney - who is/are the closest artist(s) you can find for that picture?
Not the AI, not the prompter; so the closest artists you can find for that picture are the ones who made the pictures in the training set. So generating a picture is outright copyright infringement. Nothing to do with unfairness in the sense of "artists get outcompeted". Artists don't get outcompeted - they get stolen from.
Why is it you discount the creative input of the user? Are they not doing work by guiding the agent? Don’t their choices of prompt, input image, and the refinement of subsequent generated images represent a creative process?
I previously made coloring pages for my daughter of our dog as an astronaut, wild west sheriff, etc. They're the first pages she ever "colored," which was pretty special for us. Currently I'm working on making her into every type of Pokemon, just for fun.
StableTuner to fine-tune the model - I can't recall the name of the model I trained on top of, but it was one of the top "broad" 1.5-based models on Civitai. Automatic1111 to do the actual generating. I used an anime line art LoRA (at a low weight) along with an offset noise LoRA for the coloring book pages, as otherwise SD makes images perfectly exposed. For something like that you obviously want a lot more white than black.
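For reference, combining LoRAs at reduced weights in Automatic1111 is done inline in the prompt with the `<lora:name:weight>` syntax; a hedged sketch of what the coloring-page prompt might look like (these LoRA filenames are made up, substitute whatever is in your models/Lora folder):

```text
a coloring book page of a dog as an astronaut, clean line art,
white background <lora:anime_lineart_style:0.4> <lora:epi_noiseoffset:0.6>
```

The low 0.4 weight keeps the line-art LoRA from overpowering the base model, while the offset-noise LoRA lets the output escape the "perfectly exposed" midtone bias.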
EveryDream2 would be another good tuning solution. Unfortunately that end of things is far from easy. There are a lot of parameters to change and it's all a bit of a mess. I had an almost impossible time doing it with pictures of my niece, my wife is hit or miss, her sister worked really well for some reason, and our dog was also pretty easy.
There's a huge difference in performance (generating an image takes 10 minutes rather than 10 seconds, and training a model would take forever), but with some Python knowledge and a lot of patience it can be done.
Apple's Intel Macbooks are infamous for their insufficient cooling design for the CPUs they chose, which won't help maintain a high clock speed for extended durations. You may want to find a way to help cool the laptop down to give the chip a chance to boost more, and to prevent prolonged high temperatures from wearing down the hardware quicker.
Seems like lots of work went into that and I hope the author enjoyed the process and enjoys the final result.
It wasn't actually particularly hard - I used a Colab notebook on the free tier to fine-tune the model, and even got chatGPT to write some of the prompts.
Here's the colab notebook, in case anyone is interested: https://github.com/TheLastBen/fast-stable-diffusion
I've trained a few smaller models using their Dreambooth notebook, but I think for 4000 training steps, an A100 will usually take 30-40min. I believe replicate also uses A100s for their dreambooth training jobs.
I've found the most important part is spending a good amount of time on the prompts, although I'm not sure if having the person embodied in an environment, and describing the objects around them, helps give the model a "sense of scale". For example, if I just train "wincy" in fast Dreambooth, "wincy" will be the only token it knows; with no other info in the prompts, it didn't know what in the image was "wincy" (me). I accidentally did this when training my wife (no prompts at all) and she got really mad at how ugly the results were ("you made me ugly!", haha).
Have you tried it with and without your dog in an environment, then describing the environment your dog is in for the training data?
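The captioning idea described above, anchoring the rare token to the subject while describing the environment, can be sketched as sidecar caption files. (The token "wincy" comes from the comment; the filenames are examples, and trainers differ in exactly how they read captions, so treat this as one common convention rather than what every Dreambooth notebook expects.)

```python
import pathlib
import tempfile

# Caption sidecars: same basename as each training image, .txt extension.
# "wincy" is the rare token; describing the environment helps the trainer
# associate the token with the subject rather than the whole scene.
captions = {
    "img_001.jpg": "photo of wincy standing in a park, trees in the background",
    "img_002.jpg": "photo of wincy sitting on a couch next to a lamp",
}

data_dir = pathlib.Path(tempfile.mkdtemp())
for image_name, caption in captions.items():
    (data_dir / image_name).with_suffix(".txt").write_text(caption)

print(sorted(p.name for p in data_dir.glob("*.txt")))
```

The contrast is with training on images alone, where the token soaks up everything in the frame, which matches the "no prompts at all" failure mode described above.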
dreamlook.ai
Upload your pictures, we train the model in a few minutes, then you can download your trained checkpoint. $1/model, first one for free.
For app builders, we provide a solid API that scales to 1000s of runs per day without breaking a sweat.
People have been sending me the cute pics the AI generates of their pups. I think this is arguably the best thing so far in this latest wave of AI releases!
This would have been much better standalone.
https://fortune.com/2023/02/23/no-copyright-images-made-ai-a...
From this point of view, Adobe Firefly is obviously ahead:
"The current Firefly generative AI model is trained on a dataset of Adobe Stock, along with openly licensed work and public domain content where copyright has expired.
As Firefly evolves, Adobe is exploring ways for creators to be able to train the machine learning model with their own assets, so they can generate content that matches their unique style, branding, and design language without the influence of other creators’ content. Adobe will continue to listen to and work with the creative community to address future developments to the Firefly training models."
So the only way forward to have an ownership of your product is to train your own models over your own data.
Humans are much worse at telling dogs apart than at telling other humans apart (except perhaps the owner of the particular dog).
So for all we know, the AI didn't generate a portrait of this particular dog but instead a generic picture of this breed of dog.
But this is totally true. I found that maybe 30% of the images I generated did not look like my dog at all. However, the rest do a good job of capturing his eyes and the facial expressions he actually makes. I thought that the image I chose to work from captured the look of his eyes super well.
But yeah, nobody but me would really appreciate that.
My point is that it is difficult to judge (for us) that the returned photos are actually similar to the subject.
She’s pretty unique looking and it comes through even with heavy styling.
Tomorrow's release should contain both mask blur and inpainting ControlNet, which might help these use cases.
I did the actual work back on March 11th, so I was likely on an older build; but I was seeing issues where inpainting was just replacing my selection/mask with a white background. I had the inpainting model loaded, but couldn't figure it out.
I'm planning to continue playing with Draw Things locally, and exploring the inpainting stuff. For such an iterative process I feel like a local client would make for the best experience.
That said, you probably used the paintbrush rather than the eraser? There would be more help on the Discord server, though! https://discord.gg/5gcBeGU58f
I mean you cannot outpaint in the img2img tab; load the image in the inpaint tab and possibly use the inpainting model.
People can debate if it’s actually good that people can create art without being artists, but again, I think it’s great that the author had the freedom to create what they had in mind without much outside influence. This has been a goal for computers in general for so long, and it seems like we’re actually arriving with some mediums.
Glad to see I’m not alone on this. I think the end result would have turned out much better if the author had simply adhered to the Huichol art palette, which I’m convinced they were aiming for at the beginning. That color scheme works for a reason.
Big props to the folks at replicate.com for making solid infrastructure for ML.
The app also has a REST endpoint so anybody can create an app using it. Lots of clients create niche websites catering to different use cases. There is a kind of gold rush going on in this area.
AUTOMATIC1111
https://github.com/AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion
https://github.com/Stability-AI/StableDiffusion
?
Yes, They're Real and They're Spectacular.
For example:
- There's one that creates gaming avatars based on your picture
- There's one that makes professional headshots for your CV
Even Microsoft's own product (Microsoft Designer) for using AI to create posters & flyers is most useful when you start with an image of your own creation, then use the AI to change the style of the image or integrate it into a template that it dreams up.
Any references where the same has been tried on humans?
https://www.bing.com/images/create/a-bright-day-on-the-stree...
https://www.bing.com/images/create/christmas-on-board-a-spac...
https://www.bing.com/images/create/in-the-streets-of-atlanti...
EDIT: four downvotes and zero answers how to run it on a Linux machine…
as well as doing all that nn/ml stuff instead of just trying to learn a bit of how to make an artwork themselves, how to draw something, even by tracing over a photo, like doing a 'how to do a colorful vector dog painting' search and going off of that.
like, this end result doesn't even look far off from what a 'colorful vector dog portrait' tutorial would yield. it just involves tons and tons of questionably sourced artwork and violated copyrights. (i know techbros are very confused about copyrights, but stuff like licenses and copyrights actually do have their meanings, limitations, and liabilities)
specifically picking stablediffusion, probably the most blatantly stolen-artwork-based model (given how open and clear it is about what data has been used for it, and how you can't just squirm out with 'i didn't know the terms of use of their data' like with other, more closed-off services), that's just another great touch as well.
Hiring an artist removes control; it's the artist's art, not yours, that ends up on the wall. That's the reason I avoid hiring artists when I want some art made.
With that said, I have found that at least for now using AI for art is just too much work and too little control. I want my art on the wall, not whatever the AI model outputs. So, I'm in your camp of "just learn to paint the darn stuff". I find it's a lot more fun.
I find the complaint about copyright so strange in this case. Copyright has a purpose, and stopping some random person from creating an image only they will see and use is not that purpose. In this case it's just spiteful. Ultimately, if you think he's infringing your copyright you should sue him, but I don't think you'd win.
there's a great alternative to "not paying/refusing to pay" (but using and stealing stuff anyway): just not using other people's stuff. not using stuff that's built on copyright/license violations. not using artwork you don't own, that you don't have the rights, licenses, or permissions to use. (yes, simply 'taking something and making a model from it' would be a violation.) one could just not do a shitty thing, and then they wouldn't have to jump through hoops to find a justification for the shitty thing they did.
they could do a step-by-step art tutorial, and wouldn't have to pay anybody, nor use tools that rely on stolen artwork. but nope.
highly ironic how they made this thing, and promptly showed it off to thousands of people on the internet, immediately invalidating your example
they also promote (just by choosing and mentioning all of these things) those services, like Replicate, that monetize the use of stolen artwork (by selling compute, directly coupled with nn models), and ultimately profit from it (solely, without "giving back to artists whose art they perused" or anything).
they could make art in a way that wouldn't participate in tech art theft racket, but they didn't. and they didn't just participate in it, but promote it and perpetuate it.
This sort of image generation (especially extrapolating the probable upgrades & improvements of just the next few years) displaces artists without providing an upgrade path. It shouldn't take much empathy to understand how frustrating and scary that must be for them.
I think pxoe's expression of frustration is totally reasonable. The engineers who made this stuff could have focused on using AI to enable new possibilities, instead of undercutting existing possibilities (to create new markets instead of overtaking existing ones). They could have used 100% consensual training data, but instead felt entitled to exploit a loophole/ambiguity in the social contract under which artists have been sharing their work on the internet.
A more appropriate analogy would be, this sounds a lot like social movements complaining about capitalist co-opting of their symbols, e.g. the use of "communist" rhetoric to build oligarchies (or the sale of "save the earth" mugs made from oil-derived plastics, etc.). Even that isn't a perfect analogy, though, as the co-opted output wasn't itself the displaced work-product, although it does a better job of capturing the emotional side of it, the sense of betrayal. Ultimately, I don't think it's productive to reduce the reaction to the decisions behind Stable Diffusion etc. to a single analogy, and it shouldn't be so hard to say "sorry, you're right, this is bad for you and good for me and you have every right to express your frustration over that irreversible decision".