On a small scale, you have to professionalize ComfyUI’s development. My PR to make it installable and to give it a plugin ecosystem that makes sense should not be sitting unmerged (https://github.com/comfyanonymous/ComfyUI/pull/298).
On a medium scale, CLIP is holding you back. I would eagerly buy a 48GB card to fit a model that uses T5 for conditioning and can be LoRA-trained with gradient checkpointing at batch size 1. I want PixArt-α or DeepFloyd IF with the SDXL dataset and training. I get that I can achieve a lot with SDXL on 24GB, including, just barely, a fine-tune, and I understand the engineering decisions behind it, but it’s too weak at following prompts.
On a large scale, I’m willing to spend a little money up front. Under those conditions you can be far more innovative; you don’t have to make everything for $0. Shane Carruth didn’t make Primer for $0. I’m sure you’ve seen it, so you know how astoundingly good it is. But he still spent something: only slightly more than the price of an RTX 6000 Ada.
Innovators have budgets. It’s still worth releasing the most powerful model possible, even if it needs expensive hardware. That’s why everyone is talking about Mixtral, and it’s even more true of visual art.