
0 points · laszbalo · 3mo ago · 0 comments

Not sure about macOS or Windows, but on Linux, Firefox uses speech-dispatcher: speech-dispatcher runs as a server and Firefox acts as its client. Speech-dispatcher then delegates the text to whichever TTS backend is configured. It basically runs a shell command, either sending the text to a TTS HTTP server with curl or piping it to the standard input of a TTS binary.
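To make the delegation concrete, here is a minimal Python sketch of the pattern, not speech-dispatcher's actual implementation. It assumes every backend can be modeled as a command that reads the text to speak on its standard input. An HTTP backend wraps `curl --data-binary @-`, which sends stdin as the POST body, and a local backend is a TTS binary invoked directly. The names `Backend`, `http_backend`, `binary_backend`, `speak`, and the localhost URL are hypothetical.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    """A TTS backend modeled as a command that reads text on stdin (hypothetical)."""
    name: str
    argv: List[str]


def http_backend(name: str, url: str) -> Backend:
    # curl reads the POST body from stdin when given --data-binary @-
    return Backend(name, ["curl", "--silent", "--data-binary", "@-", url])


def binary_backend(name: str, argv: List[str]) -> Backend:
    # A local TTS program that consumes text from its standard input
    return Backend(name, list(argv))


def speak(text: str, backend: Backend) -> bytes:
    """Pipe `text` into the backend command and return what it writes to stdout.

    A TTS binary that plays audio itself may print nothing; for an HTTP
    server, stdout would be the response body (e.g. audio data).
    """
    result = subprocess.run(
        backend.argv,
        input=text.encode("utf-8"),
        capture_output=True,
        check=True,  # raise CalledProcessError if the backend exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical backends; these two commands are only built, not run.
    remote = http_backend("local-server", "http://localhost:5002/api/tts")
    local = binary_backend("espeak-ng", ["espeak-ng", "--stdin"])
    print(remote.argv)
    print(local.argv)
    # `cat` stands in for a TTS binary so the pipe can be shown anywhere.
    print(speak("Hello from the client", binary_backend("cat", ["cat"])))
```

The client only ever hands over text; which command receives it is entirely the server's decision.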

Speech-dispatcher commonly uses espeak-ng, which sounds robotic but is reportedly better for visually impaired users because it stays intelligible at high speech rates. That lets them hear UI labels more quickly. Sighted users, on the other hand, generally want natural-sounding voices, because we use TTS the way we would listen to a podcast or a bedtime story.
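The speed argument is easy to put in numbers. espeak-ng takes its speaking rate in approximate words per minute through `-s` (the default is 175). The sketch below builds such a command line and estimates how long a passage takes at a given nominal rate. The 450 wpm comparison, the sample text, and the helper names `espeak_argv` and `estimated_seconds` are illustrative assumptions, and real durations drift from the nominal rate because of pauses and word length.

```python
from typing import List


def espeak_argv(wpm: int) -> List[str]:
    # -s sets the speaking rate in (approximate) words per minute;
    # --stdin makes espeak-ng read the text to speak from standard input.
    return ["espeak-ng", "-s", str(wpm), "--stdin"]


def estimated_seconds(text: str, wpm: int) -> float:
    # Nominal duration: word count divided by rate, converted to seconds.
    words = len(text.split())
    return words / wpm * 60


if __name__ == "__main__":
    labels = "Open menu " * 10  # 20 words, e.g. a run of short UI labels
    for rate in (175, 450):
        print(rate, espeak_argv(rate), round(estimated_seconds(labels, rate), 1))
```

At the default rate the 20 words take roughly 6.9 seconds; at 450 wpm, roughly 2.7. A voice that stays intelligible at the faster setting saves that difference on every label.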

With this system, users are in full control and can swap TTS models easily. If a browser shipped its own model and, two weeks later, a smaller, newer, or better one appeared, that work would become obsolete very quickly.
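The swap-ability argument rests on the client never naming a model. Here is a minimal Python sketch, assuming a hypothetical JSON config that maps backend names to commands (speech-dispatcher's real configuration files use their own format). The client calls the hypothetical `read_aloud` the same way whichever entry is active, so replacing a model means editing one config value rather than shipping a new client. `cat` and `tr` stand in for TTS binaries so the sketch runs anywhere.

```python
import json
import subprocess
from typing import List

# Hypothetical dispatcher config: backend name -> command reading text on stdin.
CONFIG = """
{
  "active": "old-model",
  "backends": {
    "old-model": ["cat"],
    "new-model": ["tr", "a-z", "A-Z"]
  }
}
"""


def active_command(config_text: str) -> List[str]:
    # The dispatcher, not the client, decides which backend handles the text.
    config = json.loads(config_text)
    return config["backends"][config["active"]]


def read_aloud(text: str, config_text: str) -> bytes:
    # Client side: hands over text only, with no knowledge of the model.
    argv = active_command(config_text)
    result = subprocess.run(argv, input=text.encode("utf-8"),
                            capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(read_aloud("hello", CONFIG))
    # Swapping models is a one-value config change; read_aloud is untouched.
    swapped = CONFIG.replace('"active": "old-model"', '"active": "new-model"')
    print(read_aloud("hello", swapped))
```

The first call passes the text through unchanged and the second upper-cases it, showing that a different backend handled the same request without any change on the client side.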


Barbing · 3mo ago

Fascinating. Might be part of why I’ve seen some folks have such love for old voices like Fred.
