The difficulty is in getting precise playback timing. If you're even a few dozen milliseconds off, the music will sound terrible.
Perhaps make it so every phone or computer with Spotify connected can control its own set of speakers, either cabled or Bluetooth. The user selects "Group Speakers". The app plays a fun tone and listens for the delay between the audio going out and the audio being received, which gives it the playback latency. Once the devices know their playback latency, they can communicate and play at the same time. As users do this with things like Bluetooth speakers, you collect that data and make it automatic for future speakers of that type. If it's out of sync, just have a "sync" button to do it manually. People will figure it out.
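To make the idea above concrete, here is a rough sketch of the two steps: find the delay by sliding the played tone along the microphone recording (a plain cross-correlation), then have each device hand audio to its speaker early by its own latency. Everything here is hypothetical and for illustration only: the function names, the sample rate, and the device names are made up, and it assumes the devices already agree on a shared clock. It also ignores the fact that the measured delay includes the microphone's own input latency.

```python
import math


def estimate_delay_samples(played, recorded):
    """Return the lag (in samples) where `played` best lines up with `recorded`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recorded) - len(played) + 1):
        # Correlation score: large when the recording at this lag resembles the tone.
        score = sum(p * recorded[lag + i] for i, p in enumerate(played))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def latency_ms(played, recorded, sample_rate_hz):
    """Convert the best-matching lag into milliseconds."""
    return 1000.0 * estimate_delay_samples(played, recorded) / sample_rate_hz


def send_times_ms(target_ms, latencies_ms):
    """Each device sends audio early by its own latency so all speakers sound at target_ms."""
    return {name: target_ms - lat for name, lat in latencies_ms.items()}


# Hypothetical example: a short rising tone, heard 120 samples late and at half volume.
SAMPLE_RATE_HZ = 8000  # assumed sample rate, chosen to keep the numbers small
tone = [math.sin(2 * math.pi * (200 + 10 * i) * i / SAMPLE_RATE_HZ) for i in range(64)]
heard = [0.0] * 120 + [0.5 * s for s in tone] + [0.0] * 50

measured = latency_ms(tone, heard, SAMPLE_RATE_HZ)
schedule = send_times_ms(1000.0, {"cabled": measured, "bluetooth": 180.0})
```

In this toy case the cabled speaker comes out at 15 ms of latency, so it would be fed at 985 ms and the Bluetooth one at 820 ms to hit the same 1000 ms mark. A real app would need a tone that survives room noise and a much faster correlation than this loop.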
I was recently looking for the same feature (multi-room audio) and ended up buying IKEA SYMFONISK (basically Sonos) speakers, which work without an issue. The only extra step is going to the Sonos app to merge them, and Spotify picks them up as one device, no questions asked. So technically it should be rather easy, no latency issues at all.
My Alexa devices are scattered about the house, so I'm not interested in stereo separation in a single room - maybe they are talking about the timing required for that.
Works perfectly on Pis scattered around the house.