By non-verbal, do you mean ambient sound? Dogs barking, a child yelling, a garbage truck garbage trucking? I don't know. If they can do voice, then it might be possible to do ambient sounds too, if there is a separate net trained on a library of ambient sounds and tuned so a sound isn't identical every time it plays. It's like tiled graphics, where there are algorithms that remove the unnatural sameness from one tile to the next.
This could have interesting implications for the Foley artists of the 21st century.
How likely is it that such tech would help lower-budget companies that want to implement voice communication in their software, say for video games or similar?
Hmm, now this has me wondering what implications this has for voice acting as well.
EDIT: We can call the ambient sound symbols sent over the wire "Soundmojis" or "amojis" or "audiomojis".
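
To make the idea a bit more concrete, here is a rough Python sketch of how I imagine it could work. Everything in it is made up for illustration: the symbol table, the one-byte wire format, and the `encode`/`decode` names are all my assumptions, not anything from a real codec. The sender ships a tiny symbol instead of audio, and the receiver jitters pitch and volume a little so the same sound never plays back exactly the same way twice.

```python
import random
from dataclasses import dataclass

# Hypothetical symbol table: these names and IDs are invented for illustration.
SOUNDMOJI_IDS = {"dog_bark": 1, "child_yell": 2, "garbage_truck": 3}
SOUNDMOJI_NAMES = {v: k for k, v in SOUNDMOJI_IDS.items()}


def encode(name: str) -> bytes:
    """Sender side: one ambient event becomes a single byte on the wire."""
    return bytes([SOUNDMOJI_IDS[name]])


@dataclass
class Playback:
    name: str     # which ambient sound to synthesize or play
    pitch: float  # playback-rate multiplier
    gain: float   # volume multiplier


def decode(packet: bytes, rng: random.Random) -> Playback:
    """Receiver side: look up the symbol and jitter pitch and gain slightly,
    so repeated plays of the same symbol do not sound identical."""
    name = SOUNDMOJI_NAMES[packet[0]]
    return Playback(
        name,
        pitch=rng.uniform(0.9, 1.1),
        gain=rng.uniform(0.8, 1.0),
    )


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(decode(encode("dog_bark"), rng))
```

A real version would presumably hand those parameters to a trained net or a sample library rather than just printing them, but the wire cost stays at one symbol per sound either way.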