I think you have it exactly backwards - the iPhone 4S/Siri speech-to-text and natural language processing are done in the CLOUD. The text-to-speech is done on the phone itself. The Voice Command feature on my (non-Siri, of course) iPhone 4 runs COMPLETELY on the phone itself, and would do TtS of my contact list, artist names, etc.
The article says, "The iPhone 4S really sends raw audio data". At least for Siri, TtS occurs in the cloud - not sure where the text processing > API step occurs, though.
Sending raw data and _receiving_ raw data are NOT the same thing. It's been clear that the iPhone 4S sends raw data to the cloud, based on people sniffing the network shortly after release.