Each Jitsi browser instance tells WebRTC to encode and send multiple spatial layers of the same video stream at different qualities (simulcast), so that users (and the SFU) can automatically choose which one to receive according to available bandwidth and network congestion.
The result is a seamless experience that just works, even with 16+ people (we've been doing Jitsi conferences nearly every working day since April at our dance school; it works like a charm and is easily customizable).
Other solutions also use additional custom WASM modules for echo cancellation and noise suppression in WebRTC, and Jitsi has an excellent dropdown UI with previews for choosing audio/video sources (and sinks!).
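
To make the layer-selection idea concrete, here is a rough TypeScript sketch of how a receiver or SFU might pick one of several layers based on estimated bandwidth. This is not Jitsi's actual code: the `SimulcastLayer` type, the `pickLayer` function, the `rid` names and the bitrate numbers are all made up for illustration. On the sending side, objects of this shape are what a browser app would pass as `sendEncodings` to `RTCPeerConnection.addTransceiver()`.

```typescript
// Hypothetical description of one layer. The field names mirror WebRTC's
// RTCRtpEncodingParameters, but the values below are illustrative only.
interface SimulcastLayer {
  rid: string;                   // layer identifier
  scaleResolutionDownBy: number; // 1 = full resolution, 2 = half, ...
  maxBitrate: number;            // bits per second
}

// Three example layers: quarter, half and full resolution.
const layers: SimulcastLayer[] = [
  { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 150_000 },
  { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 500_000 },
  { rid: "f", scaleResolutionDownBy: 1, maxBitrate: 2_500_000 },
];

// Pick the highest-bitrate layer that fits the estimated bandwidth.
// If none fits, fall back to the lowest layer so video keeps flowing.
function pickLayer(
  available: SimulcastLayer[],
  estimatedBps: number
): SimulcastLayer {
  if (available.length === 0) {
    throw new Error("no layers available");
  }
  // Sort a copy from lowest to highest bitrate.
  const sorted = [...available].sort((a, b) => a.maxBitrate - b.maxBitrate);
  let chosen = sorted[0];
  for (const layer of sorted) {
    if (layer.maxBitrate <= estimatedBps) {
      chosen = layer;
    }
  }
  return chosen;
}
```

A real SFU also has to weigh things like how large the video is displayed and who is currently speaking, so treat this as the bare minimum of the idea rather than a faithful model.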