> DALL-E 1 works thanks to the autoregressive model (also no GAN)
It uses an autoregressive model to predict codes for a pretrained VQGAN, doesn't it?
Doesn't Stable Diffusion's autoencoder also use an adversarial loss? Otherwise, wouldn't it suffer from the blurring that a pure MSE reconstruction loss is well known for?
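
To make the blurring point concrete, here is a toy sketch (not anyone's actual training code). It assumes a single hypothetical pixel that is black half the time and white the other half, and asks which constant prediction minimizes MSE. The names `mse`, `targets`, and `candidates` are made up for illustration.

```python
def mse(prediction, targets):
    """Mean squared error of one constant prediction against all targets."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)

# Hypothetical toy data: one pixel that is black (0.0) or white (1.0) equally often.
targets = [0.0, 1.0] * 50

# Candidate predictions from 0.0 to 1.0 in steps of 0.1.
candidates = [i / 10 for i in range(11)]

# The MSE-optimal prediction is the mean of the targets, i.e. mid-gray.
best = min(candidates, key=lambda c: mse(c, targets))
print(best)  # 0.5
```

Neither plausible answer (0.0 or 1.0) wins; the average does, which is the usual intuition for why MSE alone gives washed-out reconstructions and why a perceptual or adversarial term gets added on top.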