Also, I feel like shipping to the cloud and back has been shown to be just as fast as on device transcription in a lot of scenarios. Doing it on device is primarily a benefit for privacy and offline, not necessarily latency. (Although, increasingly powerful smartphone hardware is starting to give the latency edge to local processing.)
Siri's dictation has had such terrible accuracy for me (an American English speaker without a particularly strong regional accent) and everyone else I know for so many years that it is just a joke in my family. Google and Microsoft have much higher accuracy in their models. The bar is so low for Siri that I automatically wonder how much Whisper is beating Siri in accuracy... because I assume it has to be better than that.
I really wish there was an easy demo for Whisper that I could try out.