Well, the original DALL-E also worked this way. The reason the open-source models use CLIP-guided search is that OpenAI didn't release DALL-E itself, only a companion model called CLIP, which they used to rank DALL-E's outputs by quality. It turns out CLIP can be adapted to produce images too if you use it to steer a GAN.
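To make the two uses of CLIP concrete, here's a toy sketch of both: reranking a batch of samples by a score, and hill-climbing a latent vector to raise that score. Everything here is a stand-in; `generate` and `clip_score` are hypothetical placeholders for a real GAN generator and real CLIP cosine similarity, and real CLIP+GAN pipelines do gradient ascent through CLIP rather than random search.

```python
import numpy as np

# Toy stand-ins: in a real pipeline, generate() would be a GAN
# (e.g. VQGAN) and clip_score() would be cosine similarity between
# CLIP's image and text embeddings. All names here are hypothetical.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))           # pretend generator weights
text_embedding = rng.normal(size=8)   # pretend CLIP text embedding

def generate(z):
    """Stand-in for a GAN generator: latent vector -> image embedding."""
    return np.tanh(W @ z)

def clip_score(image_emb, text_emb):
    """Stand-in for CLIP similarity: cosine between two embeddings."""
    return image_emb @ text_emb / (
        np.linalg.norm(image_emb) * np.linalg.norm(text_emb))

# 1) Reranking: sample many candidates, keep the one the scorer likes
# best. This mirrors how CLIP was used to sort DALL-E's raw samples.
candidates = [generate(rng.normal(size=8)) for _ in range(32)]
best = max(candidates, key=lambda img: clip_score(img, text_embedding))

# 2) CLIP-guided search: repeatedly perturb the latent and keep any
# change that raises the score. Real systems backpropagate through
# CLIP instead of random hill climbing, but the loop is the same idea.
z0 = rng.normal(size=8)
z = z0.copy()
score = clip_score(generate(z), text_embedding)
for _ in range(200):
    z_try = z + 0.1 * rng.normal(size=8)
    s = clip_score(generate(z_try), text_embedding)
    if s > score:
        z, score = z_try, s
```

The only difference between the two modes is where the scorer sits: outside the loop (pick the best finished sample) or inside it (drive the sample toward the text).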
There are DALL-E-style models available now from other groups that you can use directly (DALL-E Mini or ruDALL-E), but their vocabulary is smaller and they can't do faces for privacy reasons.