undefined | Better HN

0 pointslondons_explore4y ago0 comments

Figure A.4 in the linked paper is a good high level overview of this model. Shame it was hidden away on page 19 in the appendix!

Each box you see there has a section in the paper explaining it in more detail.

0 comments

1 comments · 1 top-level

hn_throwaway_994y ago

Uhh, yeah, I'm going to need much more of an ELI5 than that! Looking at Figure A.4, I understand (again, at a very high-level) the first step of "Frozen Text Encoder", and I have a decent understanding of the upsampling techniques used in the last 2 diffusion model steps, but the middle "Text-to-Image Diffusion Model" step that magically outputs a 64x64 pixel image of an actual golden retriever wearing an actual blue checkered beret and red-dotted turtleneck is where I go "WTF??".

2 more replies

j / k navigate · click thread line to collapse