Just like all examples of generative "AI" I've seen, there's always some bit of uncanny valley vibe present. In the audio examples, there's always this weird distortion, as if really poorly compressed sources were used as training data. The sounds are muddled together, and rarely do I hear clean musical voices. It's just a smear of sounds coming together that our brains try really hard to resolve into an "oh, that's a _____". While the samples in TFA are probably the closest I've heard to date, the issue is still present.
I guess the thing that strikes me as so odd about the generative thing is all of the press releases presenting these things like they're a final product, when judging by the quality of the results it's clearly pre-release beta at best, and more likely alpha code. If a non-AI product were released in such a clearly unfinished state, it would be panned to no end for not working.