Yeah, I know what you meant. You use a tool like the CMU pronunciation dictionary[1] to turn words into phonemes, and then you use a model similar to Pink Trombone to turn the phoneme string into sound, including the transitions between adjacent phones (which, it turns out, actually matter more than the phones themselves for making the speech understandable). This is, roughly, how traditional TTS works.
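To make the first half concrete, here's a rough sketch of the lookup step plus the list of phone-to-phone transitions a synthesizer would then have to render. The two dictionary entries are hand-typed in CMUdict's plain-text layout rather than loaded from the real file, and the function names and the "SIL" silence marker are made up for illustration. The part that actually produces sound is left out entirely.

```python
# Tiny hand-typed sample in CMUdict's "WORD  PH1 PH2 ..." layout (ARPAbet symbols,
# digits mark vowel stress). A real run would read the full dictionary file instead.
SAMPLE_DICT = """\
HELLO  HH AH0 L OW1
WORLD  W ER1 L D
"""


def load_dict(text):
    """Parse dictionary text into {WORD: [phone, ...]}."""
    entries = {}
    for line in text.splitlines():
        # Skip blank lines and ';;;' comment lines.
        if not line.strip() or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        entries[word] = phones
    return entries


def to_phonemes(sentence, entries):
    """Look up each word and concatenate the phones into one string."""
    phones = []
    for word in sentence.upper().split():
        phones.extend(entries[word])  # raises KeyError for unknown words
    return phones


def transitions(phones):
    """Pair each phone with its successor, padded with silence at both ends.

    'SIL' is a hypothetical silence marker, not something defined by CMUdict.
    """
    padded = ["SIL"] + phones + ["SIL"]
    return list(zip(padded, padded[1:]))


PRON = load_dict(SAMPLE_DICT)
phones = to_phonemes("hello world", PRON)
print(phones)
print(transitions(phones))
```

Each pair from transitions() is one glide the sound model has to get right, which is where most of the intelligibility comes from.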
[1] http://www.speech.cs.cmu.edu/cgi-bin/cmudict