I think this serves at least as a clear demonstration of how advanced the current state of AI is. I had played with GPT-3, and that was very impressive, but I couldn't even dream that something as good as DALL-E 2 was already possible.
Big pretrained models are now good enough that we can pipe them together in really cool ways, and our representations of text and images seem to capture what we “mean.”