Seriously though, there are some minor hand issues and a rare missing body part. Adding "Correct anatomy, no missing body parts." to the prompt seems to mostly fix it. Still pretty good for an early 0.1 announcement.
It follows full sentences pretty well, although this one: "A photo of a table. On the table there's a green box on the right, a red ball on the left. There's a yellow cone on the box." keeps putting the cone on the table.
Not trained on naked bodies though - generates blob monsters instead.
Try "ramen without egg" or "ramen with no egg" and it will show ramen WITH egg.
Or "man without striped shirt" will give "man WITH striped shirt".
Set the seed to 0 and the prompt to "man in a loud shirt" - you get flowers. Set the negative prompt to "floral shirt" - no flowers.
Sentence processors can definitely understand negation (any non-trivial LLM can), but it would be a waste of time to train that into the image generators vs. making other things better.
> That’s what negative prompt is for.
This is what I mean by it "not understanding negations". You need a whole separate prompt just to say you want e.g. "ramen without egg", instead of just saying it in a single prompt that it understands.
AIs are able to understand negations; just ask an LLM a question. Text-to-image models are the ones that struggle the most with this, since they usually do not have a very nuanced understanding of text.
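For anyone wondering why a negative prompt works when "without egg" in the main prompt doesn't, here's a rough sketch, assuming the sampler uses classifier-free guidance the way Stable Diffusion-style pipelines usually do. The negative prompt's prediction stands in for the unconditional one, so every denoising step gets pushed away from it. `guided_noise` and the toy numbers are made up for illustration; real pipelines do this on tensors inside the denoising loop.

```python
def guided_noise(cond, neg, scale):
    """Classifier-free guidance: start at `neg` and move toward `cond`.

    cond  -- noise prediction conditioned on the prompt
    neg   -- noise prediction conditioned on the negative prompt
             (an empty prompt when no negative prompt is given)
    scale -- guidance scale; 1.0 means "just use cond"
    """
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


# Toy 2-D "noise predictions" (hypothetical values, not from a real model).
cond = [1.0, 0.5]   # conditioned on "ramen"
neg = [0.25, 0.5]   # conditioned on the negative prompt "egg"

out = guided_noise(cond, neg, scale=7.5)
print(out)
```

The first component, where the two predictions differ, gets amplified away from the negative prompt; the second, where they agree, is left alone. The word "without" in the main prompt gets no such special treatment: it is just another token for the text encoder.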
> "a cat that is half orange tabby and half black, split down the middle. Holding a martini glass with a ball of yarn in it. He has a monocle on his left eye, and a blue top hat, art nouveau style "
Plus an image that somewhat resembles that prompt. The cat has a human-like hand with a chopped-off thumb and 6 fingers in total, differently colored eyes, a branch in front of its face, and the ball of yarn is somehow floating in mid-air.
- prompt adherence is really good
- it's somewhere between SD15 and SDXL at creating pictures of text
- aesthetic quality is good, but leaves something to be desired
Gonna play more with it in ComfyUI.
It’s a difficult prompt. Nobody gets the grouping of black keys right. Maybe someday?
Here is your model, complainers.
I'm not really sure why you'd be so insistent on that, as opposed to just fine-tuning the "totally not open source, but instead just open weights" models.
But go ahead, I guess.
Now we can get back to talking about capabilities, usage, and results, as opposed to arguing about the definition of words.