I am specifically referring to the flamingo example: "DALL·E 2 can make realistic edits to existing images from a natural language caption."
You provide the background image and a text prompt, and it doodles on top of the image you provided, as per their demonstration. I wasn't referring to the other examples further down the page where it conjures up a brand-new image from scratch based on your image input.
It is great that you can tell it to add a flamingo and have it fit nicely into the background you provide, thanks to the well-tuned style transfer. That part is cool. And it is impressive that sometimes the flamingo it adds is reflected in the water. But sometimes it isn't, and that choice isn't up to you, it's up to the model. You can't tell it to add a reflection as a discrete step.
Look more carefully. This is more akin to a clipart finder, except that when the clipart doesn't exist, it synthesizes new clipart starting from the most similar thing in its training set to what it guesses you want.
It doesn't add the element the way an artist would, and you can't control it at all. I don't know how to express this any better.
This isn't unimpressive or useless, but it's not quite as mind-blowing on second glance.