Are you handling the speech-to-text, translation, and voice synthesis as separate steps, or is it more of an end-to-end model? Curious how you deal with things like pacing and intonation, which don't always carry over between languages.