The difficulty is in getting precise playback timing. If you're even a few dozen milliseconds off, the music will sound terrible.
Perhaps make it so every phone or computer with Spotify connected can control its own set of speakers, either cabled or Bluetooth. The user selects "Group Speakers". The app plays a fun tone and listens for the delay between the audio going out and the audio being received, which gives the playback latency. Now that the devices know their playback latency, they can communicate and play at the same time. As users do this with things like Bluetooth speakers, you collect that data and make it automatic for future speakers of that type. If it's out of sync, just have a "sync" button to do it manually. People will figure it out.
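To make the idea concrete, here is a rough sketch of the calibration step, not how Spotify or any real app does it. It assumes the app has the test tone it sent and a microphone recording as lists of samples; the function names, the 8 kHz sample rate, the made-up tone, and the 10 ms wired-speaker latency are all hypothetical. It slides the tone along the recording to find the best match (a plain cross-correlation), turns that lag into milliseconds, then delays the faster devices to match the slowest one.

```python
def estimate_delay_samples(sent, received):
    """Return the lag (in samples) at which `received` best matches `sent`."""
    best_lag, best_score = 0, float("-inf")
    max_lag = len(received) - len(sent)
    for lag in range(max_lag + 1):
        # Correlation score for this alignment of the tone and the recording.
        score = sum(s * received[lag + i] for i, s in enumerate(sent))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def start_offsets_ms(latencies_ms):
    """Extra delay per device so every device is heard at the same moment."""
    slowest = max(latencies_ms.values())
    return {name: slowest - latency for name, latency in latencies_ms.items()}


SAMPLE_RATE = 8000  # Hz (assumed for this sketch)
tone = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]  # made-up test tone

# Simulated microphone recording: silence, then the tone arriving late.
true_delay = 1200  # samples
recording = [0.0] * true_delay + tone + [0.0] * 100

delay = estimate_delay_samples(tone, recording)
latency_ms = delay * 1000 / SAMPLE_RATE

# Hypothetical second device with a known, much shorter latency.
offsets = start_offsets_ms({"bluetooth_speaker": latency_ms, "wired_speaker": 10.0})
print(latency_ms, offsets)
```

A real room adds noise and echo, so a real version would want a longer, more distinctive tone and probably several measurements averaged together. The devices would also still need a shared clock to agree on when "now" is.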
I was recently looking for the same feature (multi-room audio) and ended up buying IKEA SYMFONISK speakers (basically Sonos), which work without any issues. The only extra step is going into the Sonos app to group them; Spotify then picks them up as one device, no questions asked. So technically it should be rather easy, with no latency issues at all.
My Alexa devices are scattered about the house, so I'm not interested in stereo separation in a single room - maybe they are talking about the timing required for that.
It sucks that you have to control it with a mobile app (I did write a PC app using Tk, but that is not much better than a mobile app!). Also, I have had a lot of devices fail either completely or partially (the WiFi just went on my 7.1 receiver the other day, and the same receiver is relegated to an (awesome) stereo because of the HDMI port). The 5.1 receiver was a decontented ‘pandemic special’ with no HEOS.
But that said, the whole-home audio is great and works with my Jellyfin and Amazon Music and TuneIn, and presumably Spotify.
Works perfectly on Pis scattered around the house.