* Regarding the VPN: It inevitably adds an extra hop. But given a VPN with sufficient bandwidth, low latency, and enough processing resources to decrypt/encrypt at wire speed, couldn't performance be sufficient for voice? (I assume that either costs extra or is something the user has to set up at a good hosting provider.) IIRC, from long ago, voice needs ~80 kbps.
* Regarding cellular data: Cellular connections are very widely used for voice, of course, and voice already runs over cellular data connections (e.g., VoLTE). On one hand I'd have the same doubts you do; on the other, it seems to work. Aren't there already VoIP apps?
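
For what it's worth, here is a rough back-of-the-envelope check of the ~80 kbps figure from the VPN point above. It assumes an uncompressed G.711 call (64 kbps codec) sent as 20 ms packets with IPv4 + UDP + RTP headers. The 60-byte per-packet tunnel overhead is purely an illustrative assumption on my part, not a measurement of any particular VPN, and the function name is made up for this sketch.

```python
def voip_bandwidth_bps(payload_bytes=160, packets_per_second=50,
                       header_bytes=40, tunnel_overhead_bytes=0):
    """Estimate one-way IP-layer bandwidth of a voice stream, in bits per second.

    Defaults assume G.711 with 20 ms packetization:
      - 160 payload bytes per packet (64 kbps * 0.020 s / 8)
      - 50 packets per second
      - 40 header bytes (20 IPv4 + 8 UDP + 12 RTP)
    tunnel_overhead_bytes is an ASSUMED per-packet cost of VPN encapsulation.
    """
    packet_bytes = payload_bytes + header_bytes + tunnel_overhead_bytes
    return packet_bytes * packets_per_second * 8


# Plain call, no VPN.
plain = voip_bandwidth_bps()

# Same call through a tunnel; 60 bytes per packet is an assumed overhead.
tunneled = voip_bandwidth_bps(tunnel_overhead_bytes=60)

print(f"plain:    {plain / 1000:.0f} kbps")
print(f"tunneled: {tunneled / 1000:.0f} kbps")
```

Under those assumptions the tunnel costs a few tens of kbps per direction, so raw bandwidth is unlikely to be the bottleneck; latency and jitter from the extra hop are probably what matter more.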