> The rule in ML
There are definitely attempts to revive things (in the general sense, not just GANs), but most successes appear to come from large labs dedicating equal computational resources to the older models, often while changing the names. This can make things more confusing and make the field appear to be changing faster than it is, but once you can see this, you'll have an easier time keeping up (so, being new, watch out for this). I'll give some examples that are easier to read[0-2] (i.e. they don't need expert knowledge to understand the nuances).
As an insider (ML researcher), my complaint isn't so much that we have a large proportion of people chasing one specific avenue; it is that we gatekeep newer methods. I think you can see a similar effect on HN when new models are proposed. They are trashed for not beating existing models (this is true even beyond ML!). There will always be reasons to critique works and I don't want to discourage criticism, but I do want to discourage dismissal. It hinders progress, because progress is made in small steps, not leaps and bounds. I think this can get confusing for someone entering the field (I'm sorry if I've misjudged, I'm inferring from the comments).
> On the FFHQ point
This is an excellent question that unfortunately I don't know the answer to. I think you'll find this work helpful[3]; it has the largest human study. But despite the paper's title, StyleNAT performed best specifically on FFHQ. What I would say is that there are good arguments to make that diffusion models are better at representing a diversity of images (making them well suited for things like art generation), but theoretically GANs are approximating your full density distribution. There are some talks by Goodfellow discussing this but I don't recall which ones.
> I didnt know they wrote custom kernels
As you've probably found, the StyleGAN code is not the easiest to read lol. Since you're using PyTorch, you can find the kernels here[4]. I encourage you to look at these, especially if you've never seen CUDA code before, because the biggest takeaway will be how easy it is to add a custom kernel, and given the earlier comment you'll see the utility ^__^
I'm not going to discourage you from trying Triton, but I'll note that PyTorch's compile (torch.compile) goes a long way. Definitely _start_ there (see TensorRT).
> What does it mean to have a backbone? Does it just mean the underlying architecture used in the method?
Exactly! So in the case of a diffusion model it is the UNet (the neural network part; specifically, this part estimates the parameters of the probability distribution. If this doesn't make sense now, it will later. If you are struggling to understand diffusion models after spending some time reading the papers, come back to this comment). You'll also find the term "backbone" used in application-based models such as Semantic Segmentation, Object Detection, Pose Estimation, and much more. In those cases the backbones are typically pretrained, so treat the choice of backbone as a hyper-parameter.
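To make "backbone as a hyper-parameter" a bit more concrete, here is a toy sketch in plain Python. Everything in it is made up for illustration (`tiny_backbone`, `wide_backbone`, `Classifier`, and the 0.5 threshold are hypothetical, not from any library or paper). The only point is the shape of the design: the task-specific head stays fixed while the feature extractor underneath is swapped like any other setting.

```python
from typing import Callable, List

Vector = List[float]
Backbone = Callable[[Vector], Vector]  # maps raw input -> feature vector


def tiny_backbone(x: Vector) -> Vector:
    """Hypothetical feature extractor: mean and range of the input."""
    return [sum(x) / len(x), max(x) - min(x)]


def wide_backbone(x: Vector) -> Vector:
    """Hypothetical alternative extractor: mean, max, min, variance."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return [mean, max(x), min(x), variance]


class Classifier:
    """A fixed task head on top of an interchangeable backbone."""

    def __init__(self, backbone: Backbone, threshold: float = 0.5):
        self.backbone = backbone      # the swappable part (a hyper-parameter)
        self.threshold = threshold    # illustrative value, not tuned

    def predict(self, x: Vector) -> int:
        features = self.backbone(x)
        # The head only reads the first feature (the mean),
        # which both toy backbones provide.
        return int(features[0] > self.threshold)


if __name__ == "__main__":
    x = [0.2, 0.9, 0.7]
    for backbone in (tiny_backbone, wide_backbone):
        print(backbone.__name__, Classifier(backbone).predict(x))
```

In a real segmentation or detection model the backbone would be a pretrained network rather than a three-line function, but the swap works the same way.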
> Also, on the decoder only vs encoder-decoder point
I'm going to say something frustrating. In short: yes. If we get a bit more nuanced: no. If we get really nuanced: yes. I know this isn't a great answer, but it can be really difficult to understand. On the surface, yes, because you need to encode the variable and your model needs to transform a space starting at R^N and ending in R^N, while a (I have to stress, colloquial[5]) GAN transforms an R^M space to R^N where M << N. With more nuance you can argue it is the backbone. But to be detailed, you'll find that there are fundamental factors placing computational bounds on the theoretical performance of these architectures. To get there you'll need to carefully study Goodfellow's original paper (and some follow-ups expanding on the analysis) as well as the "original" diffusion paper by Sohl-Dickstein[6] (quotes because this is debatable, but the claim has reasonable merit), and you should become familiar with Aapo Hyvärinen's work[7]. The last is by far the hardest part and the confusion is normal. I know quite a few well-renowned and intelligent people who struggled (personally I went through the "this is hard", "this is easy", "ah fuck, I actually don't understand anything" cycle for a bit). But that's just a signal that you're progressing :)
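If the R^N vs. R^M part is hard to picture, here is a toy sketch of just the "on the surface" shapes. It is not a real model: the random linear maps stand in for networks, and the sizes N = 64 and M = 4 are arbitrary values I picked for illustration. It only shows that a diffusion-style denoiser takes and returns something of the data dimension, while a (colloquial) GAN generator starts from a much smaller latent.

```python
import random
from typing import Callable, List

Vector = List[float]


def random_linear_map(in_dim: int, out_dim: int, seed: int) -> Callable[[Vector], Vector]:
    """Return a fixed random linear map R^in_dim -> R^out_dim (stand-in for a network)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(x: Vector) -> Vector:
        assert len(x) == in_dim, "input has the wrong dimension"
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return apply


N = 64  # data dimension (think of a flattened 8x8 grayscale image); illustrative
M = 4   # latent dimension, M << N; illustrative

denoiser = random_linear_map(N, N, seed=0)   # diffusion-style: R^N -> R^N
generator = random_linear_map(M, N, seed=1)  # colloquial GAN generator: R^M -> R^N

rng = random.Random(42)
noisy_image = [rng.gauss(0.0, 1.0) for _ in range(N)]
latent = [rng.gauss(0.0, 1.0) for _ in range(M)]

print(len(denoiser(noisy_image)))  # 64
print(len(generator(latent)))      # 64
```

Both outputs live in R^N; the difference is only where the input lives, which is the surface-level "yes" above. The more nuanced answers are not captured by this sketch.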
> Thanks for the detailed comment, you've given me a lot to think about.
Great! I hope this can help provide direction (I read your other post). The best and worst thing about machine learning is that there's so much depth. It can be both intimidating and easy to miss. But if you're passionate about learning (as it appears) you'll find the knowledge gained is highly rewarding, if unfortunately hard to gain.
And I apologize for being so verbose. It is a bad habit.
[0] Diffusion Models Beat GANs on Image Synthesis (https://arxiv.org/abs/2105.05233) The two authors are rockstars. Given your blog post, I think you'll enjoy a lot of their work, which includes diffusion.
[1] ResNet strikes back: An improved training procedure in timm (https://arxiv.org/abs/2110.00476) Again, all three are great people to follow. You won't see many papers by Wightman, but you'll see his work with (now) Hugging Face. Notably he's one of the most important players in ViTs.
[2] A ConvNet for the 2020s (https://arxiv.org/abs/2201.03545) and ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders (https://arxiv.org/abs/2301.00808)
[3] Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models (https://arxiv.org/abs/2306.04675)
[4] https://github.com/NVlabs/stylegan2-ada-pytorch/tree/main/to...
[5] I'm sorry, I still have difficulties explaining this, especially simply. There are a lot of points here. But one is easy to understand and is what I mentioned before: GAN is a training method, not an architecture. A bit more nuance will be found by reading this far underappreciated work: https://arxiv.org/abs/1912.03263. The last point I want to mention is to never forget that "generative" is a general term and these models are good for generating __data__. Images are data, but to think this is the only type of data a GAN (or any model; I literally mean any[3]) can generate is naive. All of this gets harder to explain, and I don't have the skill to do so in a simple manner and am afraid it will just come off as a rant.
[6] Deep Unsupervised Learning using Nonequilibrium Thermodynamics (https://arxiv.org/abs/1503.03585)
[7] "Score matching" and "Noise-contrastive estimation" will be the most beneficial https://www.cs.helsinki.fi/u/ahyvarin/papers/