If the application is still using the deprecated Microsoft Speech API (SAPI), the synthesis is done locally, but that API hasn't received updates in about a decade and the output is considerably lower quality than what people expect to hear today.
Firefox on Windows is one such application that still uses SAPI. I don't know what it uses on other operating systems. On Android, I imagine it uses whatever the built-in OS TTS API is, which likely goes through Google Cloud.
But anything that sounds at all natural, from any of the OS or browser vendors, is going through some cloud TTS API now.
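
If you want to check this for a given browser, the Web Speech API exposes a `localService` flag on each `SpeechSynthesisVoice`, which is the browser's own report of whether a voice is synthesized on-device or by a remote service. Here's a rough sketch that splits a voice list on that flag. The `VoiceInfo` interface and `splitByLocality` are names I made up for illustration, and the flag says nothing about how good the voice sounds.

```typescript
// Hypothetical minimal shape: only the SpeechSynthesisVoice fields read here.
interface VoiceInfo {
  name: string;
  lang: string;
  localService: boolean; // true if the browser reports on-device synthesis
}

// Hypothetical helper: group voice names by whether they are reported as local.
function splitByLocality(
  voices: readonly VoiceInfo[]
): { local: string[]; remote: string[] } {
  const local: string[] = [];
  const remote: string[] = [];
  for (const voice of voices) {
    if (voice.localService) {
      local.push(voice.name);
    } else {
      remote.push(voice.name);
    }
  }
  return { local, remote };
}

// In a browser you would pass the real list, for example:
//   const { local, remote } = splitByLocality(speechSynthesis.getVoices());
```

Note that `speechSynthesis.getVoices()` can return an empty list until the `voiceschanged` event has fired, so in practice you may need to wait for that event before calling it.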