I work on the code for a competitor of Sonos, and the answer is that it can be hard to do, depending on your requirements.
> ...(from) a video, or a game
So then you need low latency, like less than 10ms? So that lip-sync works, and the game is playable?
Do you need it distributed across different endpoints, also with low latency?
Does it need to run using unreliable WiFi connections, and not kill all audio just because one endpoint is under-performing?
These are all hard, hard enough that doing it well (and keeping it proprietary) makes companies like Sonos big.
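A rough sketch of why the low-latency and flaky-WiFi requirements fight each other. Everything here is made up for illustration (the `Endpoint` fields, `schedule_packet`, the numbers), and it assumes each endpoint's clock offset has already been measured somehow. It is not how Sonos or anyone else actually does it.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    clock_offset_ms: float   # endpoint clock minus sender clock (assumed known)
    network_delay_ms: float  # one-way WiFi delay for this particular packet


def schedule_packet(send_time_ms, buffer_ms, endpoints):
    """Map each endpoint to its local play time, or None if the packet is late."""
    # One shared deadline on the sender's clock: this is the latency budget.
    play_at = send_time_ms + buffer_ms
    plan = {}
    for ep in endpoints:
        arrival = send_time_ms + ep.network_delay_ms
        if arrival > play_at:
            # Late endpoint drops this packet; the others are not held up.
            plan[ep.name] = None
        else:
            # Convert the shared deadline into this endpoint's local clock.
            plan[ep.name] = play_at + ep.clock_offset_ms
    return plan


endpoints = [
    Endpoint("kitchen", clock_offset_ms=3.0, network_delay_ms=4.0),
    Endpoint("lounge", clock_offset_ms=-2.0, network_delay_ms=25.0),
]
plan = schedule_packet(send_time_ms=1000.0, buffer_ms=10.0, endpoints=endpoints)
print(plan)  # {'kitchen': 1013.0, 'lounge': None}
```

With a 10 ms budget the lounge speaker (25 ms of WiFi delay) misses the deadline and goes silent for that packet while the kitchen keeps playing. Raise `buffer_ms` to 30 and both play in sync, but now you have blown the lip-sync budget.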
OTOH, streaming an MP3 from one endpoint to another is trivial.
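For comparison, the trivial case is more or less a copy loop. A minimal sketch, assuming the socket is already connected and the receiver just feeds whatever arrives to a decoder (`stream_bytes` is a hypothetical helper, not any product's API):

```python
import socket


def stream_bytes(source, sock: socket.socket, chunk_size: int = 4096) -> int:
    """Copy a file-like object to a connected socket; return bytes sent."""
    total = 0
    # read() returns b"" at end of file, which ends the loop.
    while chunk := source.read(chunk_size):
        sock.sendall(chunk)
        total += len(chunk)
    return total
```

No clocks, no deadlines, no second endpoint to stay in step with: TCP retransmits whatever gets lost and the listener just hears it a bit later.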