Arguably, a model created by training on a corpus of data is a derivative work of that corpus.
Let's say I take a collection of images and use a program to compress them. When decompressed, the images are close to, but not exactly the same as, the originals. Despite being in a different format, and despite not being exactly the same as the originals, the compressed images are still under copyright, and that copyright is still held by whoever held it before.
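To make "close to, but not exactly the same" concrete, here is a toy sketch of lossy compression. It is only an illustration: the `compress` and `decompress` functions, the bucket width `step`, and the sample pixel values are all made up for this example, and real image codecs are far more sophisticated.

```python
def compress(pixels, step=16):
    """Lossy 'compression': keep only which bucket of width `step` each value falls in."""
    return [p // step for p in pixels]


def decompress(codes, step=16):
    """Reconstruct each value as the midpoint of its bucket."""
    return [c * step + step // 2 for c in codes]


# Hypothetical 8-bit grayscale pixel values standing in for an image.
original = [0, 37, 128, 200, 255]
restored = decompress(compress(original))

# Largest per-pixel difference between the original and the round trip.
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(restored)
print(max_error)
```

The round trip loses information, so `restored` differs from `original`, but every value stays within half a bucket of where it started. That is the sense in which the output is a slightly degraded copy rather than a new work.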
If I take the collection of images from earlier and train a diffusion model on it, I'm essentially just compressing it in a different way. With the right prompt, you can get out something very similar to what you put in.