Excited to be able to share the release of `InvokeAI 2.0 - A Stable Diffusion Toolkit`, an open source project that aims to provide both enthusiasts and professionals a suite of robust image creation tools. Optimized for efficiency, InvokeAI needs only ~3.5GB of VRAM to generate a 512x768 image (and less for smaller images), and is compatible with Windows/Linux/Mac (M1 & M2).
InvokeAI was one of the earliest forks of the core CompVis repo (formerly lstein/stable-diffusion), and recently evolved into a full-fledged, community-driven, open source Stable Diffusion toolkit titled InvokeAI. The new version of the tool introduces an entirely new WebUI front-end with a Desktop mode, and an optimized back-end server that can be interacted with via CLI or extended with your own fork.
This version of the app improves in-app workflows leveraging GFPGAN and Codeformer for face restoration, and RealESRGAN upscaling. Additionally, the CLI also supports a large variety of features:
- Inpainting
- Outpainting
- Prompt Unconditioning
- Textual Inversion
- Improved Quality for Hi-Resolution Images (Embiggen, Hi-res Fixes, etc.)
- And more...
Planned future updates include UI-driven outpainting/inpainting, robust Cross Attention support, and an advanced node workflow for automating and sharing your workflows with the community.
We're excited by the release, and about the future of democratizing the ability to create. Check out the repo (https://github.com/invoke-ai/InvokeAI) to get started, and join us on Discord (https://discord.gg/ZmtBAhwWhy)!
please let me know/send PRs if i missed anything, it's been a couple months so i'm overdue for a round of cleanup/reorganizing
Currently expanding it a lot with some fun features:
- multi-gpu support (first UI that would support that I think)
- no-click installer (installs everything when you start it up) that works on Windows, Linux and macOS
- A cloud version where you can "rent" access to the UI + very powerful GPU instances without having to run anything locally yourself.
Been waiting to submit it all to HN as a Show HN but have to wait a bit for everything to get into place first :)
And in communities, you can probably add https://stablehorde.net
We've got a GRPC server that's backwards compatible with the official grpc.stability.ai but adds advanced features, a Flutter based infinite canvas web client in heavy development, a Discord server and a Krita & Photoshop plugin (all except the last are open source under various licenses).
The outpainting the server supports is (IMHO) the best I've seen, and recently got a tiny glimpse of attention when combined with the Photoshop plugin - https://twitter.com/NicolayMausz/status/1577767106384433156
am reworking my readme now with all the updates
Here is a video of some of its features: https://www.youtube.com/watch?v=XLOhizAnSfQ&t=1s
OG dreambooth is proprietary Google code. what everyone's using is a third party replication of it using SD
In a way it reminds me of people who make unofficial remakes of games but get cease and desists if they show gameplay while in development. The correct move is to fully develop the game and release it, then if you get C&Ds, too late, the game is already available to download.
The slowdown has numerous causes. They got legal threats, death threats, and threats from some congresswoman to have them banned by the NSA (1).
Stability.ai workers (except for one) have a clause that they can open-source anything they're working on. They do and supposedly will open-source everything because they want to build an ecosystem, not do a cash grab in the model of DALL-E.
Also they don't have one central place for all their projects and will scale from 100 to 250 employees in the following year so things should speed up.
1) https://eshoo.house.gov/media/press-releases/eshoo-urges-nsa...
Note that this NSA is the National Security Advisor, not the National Security Agency.
> No actually dev decision. Generative models are complex to release responsibly and team still working on release guidelines as they get much better, 1.5 is only a marginal FID improvement.
When these companies and OS projects create "responsible" or restricted AIs they are at the same time creating a demand for AIs that have no limitations and eventually open source or even commercial AIs will respond to this demand.
I hope while they still play this "responsible" game, they are at least using the time to figure out how we can live with this kind of advanced AI in a future where everything is fake/false by default.
Single source “massive models” may be more difficult to get out, but Emad said they’re working on licensing a ton of content to train future models. Even then, anyone can train new models now - The output from Dreambooth and Textual Inversion is already impressive, and seems like just the beginning.
Going to be an interesting road ahead.
Are there known ways the training for these models could be distributed/decomposed? E.g. SETI-style distribution of a homogenous centrally-defined task, or — much more exciting — recombination of several different models / sets of weights? (I'm just throwing words around here without really understanding them.)
I'm imagining a world in which one group of enthusiasts could work together to train a model on all images on Wikipedia, another group could work on training a model that understands hands really well, and then later yet another group could combine the work of the other two without doing all that training from scratch.
Is that even remotely plausible?
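For what it's worth, the simplest form of "recombining sets of weights" is plain linear interpolation between two checkpoints. Below is a minimal sketch of that idea only, assuming both models share exactly the same architecture and parameter names (in practice, fine-tunes of the same base model). The function name and the dict-of-lists layout are made up for illustration, and nothing here guarantees the merged model keeps the skills of both parents.

```python
def merge_weights(a, b, alpha=0.5):
    """Linearly interpolate two models' weights: (1 - alpha) * a + alpha * b.

    `a` and `b` are hypothetical stand-ins for checkpoints: dicts mapping a
    parameter name to a flat list of floats.
    """
    if a.keys() != b.keys():
        raise ValueError("models must have identical parameter names")
    merged = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise weighted average of the two parameter vectors.
        merged[name] = [(1 - alpha) * x + alpha * y
                        for x, y in zip(a[name], b[name])]
    return merged
```

With `alpha=0.0` you get model `a` back unchanged, with `alpha=1.0` you get model `b`, and values in between blend the two.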
Sort of hinted at it upthread, but would be interesting if this eventually brings competition to the GPU compute space (AMD, Intel?).
I think Invoke is competitive for now, but its biggest advantage is an improved UX, and a large community with an ambitious roadmap focused more on enthusiasts/pros.
I’d give it a whirl and see where you end up preferring to do your SD projects :)
- better inpainting
- a gallery to easily compare generations and easily regenerate images with small modifications
- responsive design --> works great on mobile, swipe left/right to switch between pictures in same generation and up/down to switch to another generation to compare
Here is the repo: https://github.com/leszekhanusz/diffusion-ui
Also, if you don't have the hardware, you can get images for free using the Stable Horde (https://stablehorde.net), a cluster of backends provided for free by volunteers.
You can test it here: https://diffusionui.com/b/stable_horde
The company whose code was stolen works closely with the man behind SD and the decision was made to merely ban him from the community instead of torpedo-ing the repo via DMCA.
The bee's knees would be being able to use automatic as a CLI or with some programmatic interface, because it is more feature rich. But I haven't seen anything that allows me to do that yet, so I'm stuck with its clunky UI or with using invoke-ai.
* Every single one of these seems to be a web UI, when this is desktop software that needs a desktop computer or workstation to run. Have we all collectively forgotten how to program PyGTK?
* Model files always go in the code repo. Have we forgotten how home folders work or what their purpose is? At the very least this one instructs you to make a shortcut/symlink if you don't want to copy the ckpt file yet again
* On that note, everything is autodownloaded to wherever the hell the programmer wants (once again, usually in the code repo itself). I must have four or five different copies of ESRGAN, and I spent a bunch of time monkeying around with automatic1111's fork trying to get it to correctly see everything when I ripped out the models folder and symlinked one in from a different place on my hard drive.
To the authors: can you all please get together and standardize some of this stuff? Models should go in users' home folders, or at a customizable location, and NOT within the scope of stuff that can be touched by git pull. (Putting them there causes git to freak out in many circumstances)
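To make the ask concrete, here is a minimal sketch of how a tool could look up models outside its own checkout. The `SD_MODEL_DIR` variable and the `stable-diffusion/models` subfolder are names invented for this example, not a convention any of these projects has agreed on. The fallback follows the XDG base directory convention, which is a Linux habit, so Windows and macOS would want their own defaults.

```python
import os
from pathlib import Path

# Hypothetical names, for illustration only.
ENV_VAR = "SD_MODEL_DIR"
APP_SUBDIR = Path("stable-diffusion") / "models"


def resolve_model_dir(env=None, home=None):
    """Return the directory models should live in, outside any git checkout."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. An explicit user override always wins.
    override = env.get(ENV_VAR)
    if override:
        return Path(override).expanduser()

    # 2. Otherwise use XDG_DATA_HOME, defaulting to ~/.local/share.
    xdg_data = env.get("XDG_DATA_HOME")
    base = Path(xdg_data) if xdg_data else home / ".local" / "share"
    return base / APP_SUBDIR


def find_model(name, env=None, home=None):
    """Return the path to a checkpoint in the shared directory, or None."""
    candidate = resolve_model_dir(env, home) / name
    return candidate if candidate.is_file() else None
```

Every UI that resolved models this way would share one copy of each checkpoint, and `git pull` would never see them.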
The breakneck pace of innovation here is awesome, but it feels like all gas no brakes on the usability front.
In the Bad Old Days(tm) you ran an install script which generates a desktop icon and you click that to run it. Meanwhile with this, on Windows, one has to open an anaconda prompt, activate the anaconda venv (or whatever it is), then manually invoke the whole thing with 'python scripts/invoke.py --web'. And if there's a one-click install script included (which invoke doesn't have, but I am not knocking it for this!), half the time they seem to try and pull down the entire world all over again (a la sd-webui).
Like I get this need to make it easy to use, but it's like c'mon, there's existing convention for all these things. Folks, please follow it!
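As a rough idea of how small the missing launcher could be, here is a sketch that wraps the manual steps described above. The environment name `invokeai` is a placeholder I made up; the script path and `--web` flag are the ones quoted in this comment. It leans on `conda run -n ENV ...`, which runs a command inside an environment without a separate activate step.

```python
import shutil
import subprocess
import sys

CONDA_ENV = "invokeai"  # hypothetical environment name, adjust to yours
SCRIPT = "scripts/invoke.py"


def build_command(extra_args=(), conda_env=CONDA_ENV):
    """Build the argv that starts the web UI inside the conda environment."""
    return ["conda", "run", "-n", conda_env,
            "python", SCRIPT, "--web", *extra_args]


def main(argv=None):
    """Launch the web UI, passing any extra arguments straight through."""
    argv = sys.argv[1:] if argv is None else argv
    if shutil.which("conda") is None:
        print("conda was not found on PATH", file=sys.stderr)
        return 1
    return subprocess.call(build_command(argv))
```

Saved as something like `run.py` in the repo root with a call to `main()` at the bottom, this is what a desktop shortcut could point at. Depending on the conda version, output from the wrapped process may be buffered rather than streamed live.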
If I had a wishlist, or the wherewithal to fork my own version, it would have:
* an actual GUI made with an actual windowing toolkit. I don't know why the hell everyone is so afraid of GTK, but I would use that. pyGTK is pretty simple IME, you can even read the C++ docs and it all maps over really nice to python. It doesn't need to be pretty!
* configurable model locations, preferably in an agreed-upon standardized hierarchy
* a standardized way of embedding prompt data into the PNG, a la automatic1111
* an uncomplicated but not overly optimistic setup process. An install.py and run.py, both with sensible defaults so that you don't need any command-line switches to run it except for special circumstances, and if it wants to autodownload updates then CHECK WITH ME FIRST! And preferably one that doesn't try to move my entire world (here's looking at you, sd-webui). And it will load the venv/conda environment for me.
And yes, for all the "put your money where your mouth is", I've been thinking about forking. But I don't know if I have the time or energy to keep up with all the developments in this space. But hey you never know...
Once the pace of innovation slows down I'm sure we'll see more effort from people with traditional software engineering experience come in to clean things up.
- 1 Click Install & Run is in the works to make this easier to install. We agree.
- Model locations are a valid point - We're working on being able to hot swap models mid-session, so I'll bring this up into the convo.
- Invoke has aligned on its own metadata structure for the ability to easily pull those parameters into future invocations. We're not worried about compatibility with Automatic.
No need to fork - Just join us on discord and complain loudly until we make things better. :)
The devs recently added img2img support too.
I tried to install the more complete versions but I just ended up in a wormhole of python and conda errors.
It's probably worth following that.
https://github.com/bes-dev/stable_diffusion.openvino
Thanks!
For example, it's not possible to run SD on my 2 year old 16-inch Intel MacBook Pro. This is because PyTorch doesn't have support for the slightly older AMD GPU on board. There's a newer framework called ROCm for AMD cards that allows them to work with recent versions of PyTorch.
Given all that, the requirement to have an Nvidia card is entirely acceptable, and for the most part a technical requirement.
Ironically, CPU support would be faster for me (in terms of throughput, at least) because I have on the order of a thousand zen cores but only a couple CUDA compatible GPUs with enough RAM to run SD.
The project I linked to does it, so it's clearly possible. I didn't ask for speed, I only ask to be able to run it at all.
I’m using Ubuntu (to follow what AMD has for ROCm) and building the entirety of (gfx803 patched) ROCm from source.
It works with some forks but not others.
You can try it out:
Care to shine some light on it? Using something like runpod/vast.ai would be my guess?
I updated the link you mentioned with some of this info!
I couldn't get it to work following https://invoke-ai.github.io/InvokeAI/installation/INSTALL_WI...
Similar to https://github.com/cmdr2/stable-diffusion-ui/releases/tag/v2...
I'd be happy to submit a PR to your project, if you're interested in using it. I actually got it working with your project a few weeks ago, so I know it works with your repo.
I've opened a github issue as well, so we can talk there if you'd like: https://github.com/invoke-ai/InvokeAI/issues/1042
File an issue if it’s not working for you; it’s working fine for me.
(See for example https://github.com/invoke-ai/InvokeAI/issues/1021 ; if you had a previous install, delete it entirely)
I feel like there's just so much to improve though. Maybe SD is the definitive proof that one single feature can trickle down into many others just by adding good UI on top of it.
I primarily used the InvokeAI release because I found it was easy to get going with on Linux and then it was simple enough to hack around with.
Also the first tool I've ever used where I've ridden on the ragged edge of what my 3070 is okay with. I've had graphical glitches due to occupying all the video memory (KDE doesn't like it). I've had to quit apps to make it work.
Thanks for making a useful thing of all this Stable Diffusion stuff. I've enjoyed it.
So how convincing are these solutions in the worst case is what i am asking.
Vs DiffusionBee which just works
Maybe the two projects can merge?