I'm also using Voxtral TTS to try to replace OpenAI. It "works", but I've had problems with volume levels being radically different from one audio chunk to the next. It doesn't seem to "understand the full text" the way OpenAI's voice models do, so OpenAI's output can be more expressive, while Voxtral sometimes sounds robotic. Some Voxtral TTS output also has music in the background, which suggests their training corpus isn't that clean. Try generating a personalized news podcast, and the intro may occasionally sound like it has the BBC News music underneath...
As for not focusing on AI, there's an interview on the Big Technology Podcast from 2 months ago where the Mistral CEO says their main focus is on helping companies fine-tune models for internal use, rather than on being a general model builder.