It's still crap. A few years ago the Belgian lab TCTS[1] had a speech engine that added tongue flicks, breathing and lip noises (I don't know the technical name for these) and used multiple forms of each sound to generate words, i.e. it matched up the correct version of each letter to say "match" rather than forcing one form to work everywhere.
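To make the "multiple forms" idea concrete, here is a rough Python sketch of picking a variant of each letter by its neighbours. This is only my illustration, not how the TCTS engine actually worked: the INVENTORY table, the variant names and the scoring rule are all made up for the example.

```python
# Hypothetical inventory (an assumption for illustration): each letter maps to
# recorded variants, keyed by the (previous, next) letters they were recorded
# between. "#" marks a word edge.
INVENTORY = {
    "m": {("#", "a"): "m_initial", ("a", "#"): "m_final"},
    "a": {("m", "t"): "a_short"},
    "t": {("a", "c"): "t_unreleased", ("#", "a"): "t_aspirated"},
    "c": {("t", "h"): "c_in_tch"},
    "h": {("c", "#"): "h_in_tch"},
}


def score(wanted, have):
    """Count how many of the two context slots (previous, next) match."""
    return sum(w == h for w, h in zip(wanted, have))


def pick_variants(word):
    """For each letter, choose the variant whose recorded context fits best."""
    padded = "#" + word + "#"
    chosen = []
    for i in range(1, len(padded) - 1):
        letter = padded[i]
        context = (padded[i - 1], padded[i + 1])
        variants = INVENTORY[letter]
        # Pick the recorded context with the most matching neighbours.
        best = max(variants, key=lambda ctx: score(context, ctx))
        chosen.append(variants[best])
    return chosen


print(pick_variants("match"))
```

The "t" in "match" gets the variant recorded between "a" and "c", while the "t" at the start of "tam" would get the word-initial one. A single-form engine would use the same "t" in both places.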
This is still short trousers for text to speech. But then text to speech isn't a big selling feature for computers - yet. Think mini, data-specific Watsons on your iPhone. Pretty soon we'll have HAL 9000s in our pockets.
[1] http://tcts.fpms.ac.be/