0 points · derac · 2y ago · 0 comments

It's one model with text/audio/image input and output.

Very exciting, would love to read more about how the architecture of the image generation works. Is it still a diffusion model that has been integrated with a transformer somehow, or an entirely new architecture that is not diffusion-based?