> Our new text-to-image model, DALL·E 3, can translate nuanced requests into extremely detailed and accurate images
Stable Diffusion (XL) already has many prompt expansion implementations. Some are straight-up Llama finetunes, others are small models built just for this purpose, and I think there was at least one research paper that hooks an LLM more directly into existing diffusion models.
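For anyone who hasn't seen prompt expansion, here's a rough sketch of the idea, not any particular project's code. The names `expand_prompt`, `DEFAULT_MODIFIERS`, and the `rewrite` hook are all made up for illustration; in a real setup `rewrite` would be a call into whatever LLM or small finetuned model you use, and the modifier list would be learned rather than hardcoded.

```python
from typing import Callable, Optional, Sequence

# Hypothetical style modifiers, hardcoded here purely for illustration.
DEFAULT_MODIFIERS = ("highly detailed", "sharp focus", "dramatic lighting")


def expand_prompt(
    prompt: str,
    rewrite: Optional[Callable[[str], str]] = None,
    modifiers: Sequence[str] = DEFAULT_MODIFIERS,
) -> str:
    """Turn a short user prompt into a longer, more descriptive one.

    `rewrite` is an optional hook standing in for an LLM call
    (assumption: it takes a prompt string and returns a rewritten one).
    """
    base = prompt.strip()
    if rewrite is not None:
        # Let the language model (or any callable) elaborate on the prompt.
        base = rewrite(base).strip()

    # Append only the modifiers the prompt does not already contain.
    lowered = base.lower()
    extras = [m for m in modifiers if m.lower() not in lowered]
    return ", ".join([base, *extras])


if __name__ == "__main__":
    print(expand_prompt("a cat"))
    print(expand_prompt("a cat", rewrite=lambda p: p + " sitting on a windowsill"))
```

The expanded string is what you would then hand to the diffusion model in place of the user's original prompt.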
...Ironically, the problem with SD is accessibility. There are hundreds of little barebones paid APIs/websites that don't implement the augmentations that make SD powerful. And there are a handful of janky, understaffed, and tragically underfunded community projects that implement subsets of these reasonably well, but you need to be a Python dev with a GPU to figure out how to install and bugfix them.