They're certainly capable of remixing things they've seen, and adding in randomness will add novelty. Whether that counts as "creativity" is something people can debate :-)
I think the reason the image generators have caught on better in some ways is that they don't need to be accurate. We're not asking them to understand anything, just to produce images based on text prompts (which is amazing stuff all by itself).