Would you rather train text-to-speech technology on conversations your television eavesdropped on, or on a collaboratively built open database of voluntarily submitted content?
This is the perfect way out of precisely the problem you're pointing at: if the data is open, it doesn't need to be harvested en masse to make this work, and you can run the text-to-speech engine entirely offline on the device.