I've been toying with something similar (mostly on paper so far): custom GStreamer elements for voice recognition and synthesis, a Wit.ai-like intent resolver for command and control, and ChatScript-like pattern matching for conversational dialogue. My goal is to put it all behind a WebRTC and SIP gateway, so that I end up with a low-latency personal assistant that runs on my own private server and that I can reach from virtually any device (even an old landline telephone). That's the dream, anyway. At the moment I'm stuck on the voice synthesizer...
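
To make the command-versus-conversation split a bit more concrete, here's a rough Python sketch of how recognized text might be routed: try the intent resolver first, and fall back to dialogue pattern matching if nothing matches. This is only an illustration. It uses plain regular expressions as a stand-in, not Wit.ai's API or ChatScript's rule syntax, and every name in it (`INTENTS`, `CHAT_RULES`, `resolve_intent`, `chat_reply`, `handle_utterance`) as well as the example phrases are made up for the example.

```python
import re

# Hypothetical command intents: (intent name, pattern with named slots).
INTENTS = [
    ("set_timer", re.compile(r"\bset (?:a )?timer for (?P<minutes>\d+) minutes?\b", re.I)),
    ("lights", re.compile(r"\bturn (?P<state>on|off) the lights\b", re.I)),
]

# Hypothetical conversational rules: (pattern, reply template).
CHAT_RULES = [
    (re.compile(r"\bhow are you\b", re.I), "Doing fine, thanks for asking."),
    (re.compile(r"\bmy name is (?P<name>\w+)", re.I), "Nice to meet you, {name}."),
]


def resolve_intent(text):
    """Return the first matching intent and its slots, or None."""
    for name, pattern in INTENTS:
        match = pattern.search(text)
        if match:
            return {"intent": name, "slots": match.groupdict()}
    return None


def chat_reply(text):
    """Return a reply from the first matching chat rule, or a default."""
    for pattern, template in CHAT_RULES:
        match = pattern.search(text)
        if match:
            return template.format(**match.groupdict())
    return "Sorry, I didn't catch that."


def handle_utterance(text):
    """Route recognized text: commands take priority, chat is the fallback."""
    command = resolve_intent(text)
    if command is not None:
        return ("command", command)
    return ("chat", chat_reply(text))
```

In the full pipeline, the input to `handle_utterance` would come from the speech recognizer and the chat reply would go to the synthesizer; both ends are left out here.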