It's a standardised sample, already correlated to text, close to the microphone, for one thing. You're just making it easier for them.
I mean, I suppose you can use "like and subscribe", "without further ado", and "let's get started" as standardised samples if you want to catch a YouTuber.
But AFAIK my voice isn't on the internet anywhere. Quite a lot of people's voices aren't.
There are a number of ways this information can be connected back, with varying precision, to the person who recorded it.
And we should have learned from the Cambridge Analytica scandal that data is used in ways we do not expect. For example, what if you don't care to reproduce someone's voice, but you do care to extract age/gender/racial background/sexual orientation from it?