1st prompt: https://i.postimg.cc/T3nZ9bQy/1st.png
2nd prompt: https://i.postimg.cc/XNFm3dSs/2nd.png
3rd prompt: https://i.postimg.cc/c1bCyqWR/3rd.png
These use some of the popular prompts you can find on sites like PromptHero, which show amazing examples.
It’s been a serious expectation-vs.-reality disappointment for me, so I just pay the MidJourney or DALL-E fees.
In a nutshell:
1. Use a good checkpoint. Vanilla Stable Diffusion is relatively bad. There are plenty of good ones on civitai. Here's mine: https://civitai.com/models/94176
2. Use a good negative prompt with good textual inversions (e.g. "ng_deepnegative_v1_75t", "verybadimagenegative_v1.3", etc.; you can download those from civitai too). Even if you have a good checkpoint, this is essential for good results.
3. Use a better sampling method than the default one (e.g. I like "DPM++ SDE Karras").
There are more tricks to get even better output (e.g. ControlNet is amazing), but these are the basics.
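As a rough sketch of those three basics in practice, assuming the AUTOMATIC1111 web UI is running locally with its `--api` flag and the checkpoint and embeddings are already installed: in that UI a textual inversion is triggered by writing its name in the prompt, and the sampler is chosen by name. The URL, prompt, step count, CFG scale, and checkpoint filename below are placeholders, not settings from the comment above.

```python
from __future__ import annotations

import base64
import json
import urllib.request

# Assumed local address of an AUTOMATIC1111 web UI started with --api.
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def build_txt2img_payload(prompt: str, checkpoint: str | None = None) -> dict:
    """Build a txt2img request body that applies the three basics."""
    payload = {
        "prompt": prompt,
        # Basic 2: embeddings are used by naming them in the negative prompt
        # (the files must sit in the UI's embeddings folder).
        "negative_prompt": "ng_deepnegative_v1_75t, verybadimagenegative_v1.3",
        # Basic 3: a sampler other than the default.
        "sampler_name": "DPM++ SDE Karras",
        "steps": 25,       # placeholder value
        "cfg_scale": 7,    # placeholder value
        "width": 512,
        "height": 512,
    }
    if checkpoint is not None:
        # Basic 1: switch to a fine-tuned checkpoint for this request only.
        payload["override_settings"] = {"sd_model_checkpoint": checkpoint}
    return payload


def txt2img(payload: dict, url: str = API_URL) -> list[bytes]:
    """POST the payload and decode the base64-encoded PNGs in the response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [base64.b64decode(img) for img in body["images"]]


if __name__ == "__main__":
    # "my_checkpoint.safetensors" is a hypothetical filename.
    p = build_txt2img_payload("a haunted house at dusk, photo",
                              checkpoint="my_checkpoint.safetensors")
    print(json.dumps(p, indent=2))
    # Once the web UI is running:
    # for i, png in enumerate(txt2img(p)):
    #     with open(f"out_{i}.png", "wb") as f:
    #         f.write(png)
```

Building the payload separately from sending it makes it easy to inspect or tweak the settings before spending GPU time on them.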
You can fine-tune it on your own material, or choose one of the hundreds of public fine-tuned models. You can guide it precisely with a sketch, or with a pose extracted from a photo, using ControlNet or other methods. You can influence the colors. You can explicitly separate parts of the prompt so their tokens don't leak into each other. You can use it as a photobashing tool via a plugin for popular image editing software. Tools like ComfyUI enable extremely complicated pipelines as well. Etc., etc., etc.
For all the promise of control and customization SD boasts, Midjourney beats it hands down in sheer quality. There's a reason something like 99% of AI art comic creators stick with Midjourney despite the control handicap.
But yes, SD can be a bit of a pain to use. Think of it like this: SD = Linux, Midjourney = Windows/macOS. SD is more powerful and user-controllable, but that also means it has a steeper learning curve.
We could just as easily say "hosting your own email can be set up in a few minutes if you know what you're doing". I could do that, but I couldn't get local SD to generate comparable images if my life depended on it.
screenshot of the options interface: https://stash.cass.xyz/drawthings-1687292611.png
Here, I've uploaded it to civitai: https://civitai.com/models/94176
There are plenty of other good models too though.
Which is fair enough: when you are a (relatively) small company competing with the likes of Google and Meta, you really need to focus.
It's like each of these has a hidden giant pile of negative prompts, or additional positive prompts, that greatly narrow down the range of output. There are contexts where the Dall-E 'spoopy haunted house ooooo!' imagery would be exactly right… like 'show me halloweeny stock art'.
That haunted house prompt didn't explicitly SAY 'oh, also make it look like it's a photo out of a movie and make it look fantastic'. But something in the more 'competitive' AIs knew to go for that. So if you wanted the spoopy, cheesy 'collective unconscious' imagery, would you have to force the more sophisticated AIs to work against their hidden requirements?
Mind you, if you added 'halloween postcard from out of a cheesy old store' and suddenly the other ones were doing that vibe six times better, I'd immediately concede they were in fact that much smarter. I've seen that before, too, with different Stable Diffusion models. I'm just saying that the consistency of output in the 'smarter' ones can also represent a thumb on the scale.
They've got to compete by looking sophisticated, so the 'Greg Rutkowskification' effect will kick in: you show off by picking a flashy style to depict rather than going for something equally valid, but less commercial.
Can't wait to have something like StableDiffusion but for LLMs.
If Stable Diffusion hadn't launched, DALL-E 2 would still have been valuable.
If I create a Mickey Mouse using Photoshop, would Adobe be liable for it?
Regarding image generation in Photoshop I can confirm two things:
- It is excellent for inpainting and outpainting, with a few exceptions*
- It remains poor for generating a brand new image
*Photoshop's generative fill is very good at extending landscapes: it will match lighting and, according to the release video, can be smart enough to work out what a reflection should contain even if that is not in the image (in their launch demo, a reflection pool captured the underside of a vehicle).
Where generative fill falls apart: inserting new objects that are not well defined causes problems. Asking for something like a VW Beetle will produce a good result because it is well defined, but asking for something like "boat", "dragon", or even "pirate's chest" will produce a range of images that do not necessarily fit the scene. This is probably because the source imagery for such objects is vague and open to many different representations.
A note about Firefly: anything likely to produce a spherical-looking shape tends to be blocked, probably because it resembles certain human anatomy. This is a problem for small touch-ups such as fixing fingers.
A special note about Photoshop versus other systems: Photoshop has the added problem of needing to match the resolution of the source material. Currently it does this by combining upscaling with resizing, which means that if you extend an area with high detail, that detail cannot be maintained and the new area is softer/blurrier than the original sections. It also means that if you extend directly from the border of an image, a feathered edge becomes visible and must be corrected by hand.
I currently test the following AI generators, feel free to ask me about any of these: StableDiffusion (Automatic and InvokeAI), OpenAI's Dall-E 2, MidJourney, Stability AI's DreamStudio, and Adobe Firefly.
update: I've edited the post to include these results as well
Generating a “word bubble” is going to look terrible in every major diffusion model. Coherent words and writing in image models are still highly specialised.
Most image AI tools are terrible with words.
I am curious, what images did you try generating with midjourney?
And the inevitable booby cheesy rendered forest fairy.
I don't think they're terrible at all. They absolutely can make original art with decent production values.
They can't write text yet, but I'm sure that's coming soon.
Seems clear to me that Midjourney has by far the best "vibes" understanding. Most models get the items right but not the lighting. Firefly seems focused on realism which makes sense for a photography audience.
https://twitter.com/fanahova/status/1639325389955952640?s=46...
If you want to play around with OpenJourney (or any other fine-tuned StableDiffusion model), I made my own UI with a free tier at https://happyaccidents.ai/.
It supports all open-source fine-tuned models and LoRAs, and I recently added ControlNet.
The flaw with these comparisons is that you really shouldn't use the same prompt with different generators. If you want the best results, you have to play with the prompts and iterate a lot to explore the latent space and find what you're looking for. The first, super long prompt looks like it's tuned for Stable Diffusion, for instance. Different generators also have different syntax (e.g. with Stable Diffusion you can surround a phrase with parens to give it extra emphasis).
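On the emphasis syntax: in the AUTOMATIC1111 web UI, `(word)` multiplies attention by 1.1, `[word]` divides it by 1.1, nesting compounds the factors, and `(word:1.5)` sets the factor explicitly. Below is a simplified sketch of splitting such a prompt into (text, weight) chunks. It loosely follows the approach of that UI's parser but skips backslash escapes and other details, so treat it as an illustration rather than that project's code. Midjourney, for example, uses its own `::` weighting instead, so the same parens mean nothing special there.

```python
from __future__ import annotations

import re

# Tokens: "(" or "[", an explicit ":weight)" closer, ")" or "]",
# a run of plain text, or a stray colon.
TOKEN = re.compile(r"\(|\[|:\s*([+-]?[\d.]+)\s*\)|\)|\]|[^()\[\]:]+|:")


def parse_emphasis(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks using paren/bracket emphasis."""
    chunks: list[list] = []        # mutable [text, weight] pairs
    round_stack: list[int] = []    # chunk index where each "(" opened
    square_stack: list[int] = []   # chunk index where each "[" opened

    def multiply_from(start: int, factor: float) -> None:
        # Scale every chunk added since the matching opening bracket.
        for chunk in chunks[start:]:
            chunk[1] *= factor

    for m in TOKEN.finditer(prompt):
        tok, weight = m.group(0), m.group(1)
        if tok == "(":
            round_stack.append(len(chunks))
        elif tok == "[":
            square_stack.append(len(chunks))
        elif weight is not None and round_stack:
            multiply_from(round_stack.pop(), float(weight))  # "(text:1.5)"
        elif tok == ")" and round_stack:
            multiply_from(round_stack.pop(), 1.1)
        elif tok == "]" and square_stack:
            multiply_from(square_stack.pop(), 1 / 1.1)
        else:
            chunks.append([tok, 1.0])  # plain text, or an unmatched closer

    # Unclosed brackets still apply their default factor.
    for start in round_stack:
        multiply_from(start, 1.1)
    for start in square_stack:
        multiply_from(start, 1 / 1.1)

    # Merge neighbouring chunks that ended up with the same weight.
    merged: list[tuple[str, float]] = []
    for text, w in chunks:
        if merged and merged[-1][1] == w:
            merged[-1] = (merged[-1][0] + text, w)
        else:
            merged.append((text, w))
    return merged


if __name__ == "__main__":
    for text, weight in parse_emphasis("a (red:1.5) ((house)) at [night]"):
        print(f"{text!r}: {weight:.3f}")
```

With that sample prompt, "red" gets 1.5, "house" about 1.21 (1.1 × 1.1), and "night" about 0.909 (1 / 1.1), while the surrounding text stays at 1.0.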
Generally, this model is much better than DALL-E 2, and it beats Firefly in some areas (I didn't try Midjourney or Stable Diffusion). Firefly usually produces photos with significantly fewer visual mistakes (like the wrong number of fingers or messed-up faces) than Bing's DALL-E. But the latter usually understands prompts much better and more often produces something that matches the prompt well. Firefly also doesn't "know" a lot of pop culture or history, e.g. Marilyn Monroe, or what Coca-Cola is.
I used the stable-diffusion-xl-beta-v2-2-2 model, copy-pasted the prompts from the blog post, and generated one shot for each prompt. I chose style presets that closely matched each prompt (added as suffixes in the image filenames).
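For anyone wanting to reproduce this kind of one-shot run, here is a minimal sketch against Stability AI's v1 REST text-to-image endpoint, using the engine ID mentioned above. It assumes you supply your own API key; the `cfg_scale` and `steps` values and the filename scheme are illustrative guesses, not the commenter's settings (the comment doesn't say how the suffixes were formatted), and that beta engine may no longer be offered.

```python
from __future__ import annotations

import base64
import json
import re
import urllib.request

ENGINE_ID = "stable-diffusion-xl-beta-v2-2-2"
API_HOST = "https://api.stability.ai"


def build_request(prompt: str, style_preset: str) -> dict:
    """Request body for a single image per prompt, with a style preset."""
    return {
        "text_prompts": [{"text": prompt, "weight": 1.0}],
        "style_preset": style_preset,  # e.g. "photographic", "cinematic"
        "samples": 1,                  # one shot per prompt
        "cfg_scale": 7,                # placeholder value
        "steps": 30,                   # placeholder value
    }


def output_filename(prompt: str, style_preset: str, max_len: int = 40) -> str:
    """Hypothetical naming scheme: prompt slug with the preset as a suffix."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    slug = slug[:max_len].rstrip("-")
    return f"{slug}_{style_preset}.png"


def generate(prompt: str, style_preset: str, api_key: str) -> str:
    """Send one request, save the first returned image, return its filename."""
    req = urllib.request.Request(
        f"{API_HOST}/v1/generation/{ENGINE_ID}/text-to-image",
        data=json.dumps(build_request(prompt, style_preset)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    name = output_filename(prompt, style_preset)
    with open(name, "wb") as f:
        f.write(base64.b64decode(body["artifacts"][0]["base64"]))
    return name


if __name__ == "__main__":
    print(output_filename("A haunted house on a hill, photo", "photographic"))
    # With a real key (e.g. read from an environment variable of your choice):
    # generate("A haunted house on a hill, photo", "photographic", api_key)
```

Keeping the preset in the filename makes it easy to compare which preset produced which image when browsing a folder of results.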
Literally all of the examples have floor to ceiling windows across the entire length of the wall…