Maybe it's a comprehension issue on my end, but he seems to treat things like stun and dtls as related, compounding issues (particularly for round trip time), when they are really orthogonal.
Also, he spends too much time talking about how you can't resend packets, and reiterates that point by saying they tried really hard (at Discord?). That's where he lost the plot, imo.
The RTC in WebRTC stands for real time communication. Humans will naturally prefer the auditory experience of an occasional dropped packet over backed-up audio or audio that plays at an uneven rate. To clarify, I'm talking about human speech here.
If you can't tolerate packet loss, use a protocol based on tcp instead of udp. But you know what happens when you send audio over a poor network with tcp? There will be pauses on the receiving end as it waits for the missing packet to be retransmitted. Let's say the delay is multiple seconds. What should the receiving end do when packets start flowing again? Play the backed-up audio at its natural rate and stay seconds behind? Attempt to play it back at a higher rate to "catch up" with any other channels? People, humans, do not generally prefer that experience.
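To put rough numbers on that, here's a toy simulation in Python. It is only a sketch under my own assumptions: 20 ms frames, a 60 ms playout delay, a 30 ms one-way network delay, and a 2 second retransmit stall are all made-up values, and the function names are hypothetical. It doesn't model real tcp or any actual WebRTC jitter buffer, just the two receiver policies described above.

```python
FRAME_MS = 20  # assumed audio frame duration


def udp_style_playout(arrivals, playout_delay_ms=60):
    """Fixed playout deadline: a lost (None) or late frame is simply skipped.

    Returns (dropped_frames, latency_ms). Latency never grows.
    """
    dropped = 0
    for i, arrival in enumerate(arrivals):
        deadline = i * FRAME_MS + playout_delay_ms
        if arrival is None or arrival > deadline:
            dropped += 1
    return dropped, playout_delay_ms


def tcp_style_playout(arrivals, retransmit_ms, playout_delay_ms=60):
    """Reliable in-order delivery, played back at the natural rate.

    A lost frame (None) shows up retransmit_ms after it was sent, and no
    later frame can be delivered before it. Whenever the next frame is not
    available at its scheduled time, playback stalls and the lag grows.
    Returns the final latency in ms.
    """
    delivered_at = 0
    lag = playout_delay_ms
    for i, arrival in enumerate(arrivals):
        sent = i * FRAME_MS
        if arrival is None:
            arrival = sent + retransmit_ms
        # Head-of-line blocking: this frame waits for everything before it.
        delivered_at = max(delivered_at, arrival)
        scheduled = sent + lag
        if delivered_at > scheduled:
            lag += delivered_at - scheduled  # playback stalled this long
    return lag


# Ten frames, 30 ms one-way delay, frame 3 lost in transit (all assumed).
arrivals = [i * FRAME_MS + 30 for i in range(10)]
arrivals[3] = None

udp_result = udp_style_playout(arrivals)
tcp_result = tcp_style_playout(arrivals, retransmit_ms=2000)
print(udp_result, tcp_result)
```

With those assumed numbers, the udp-style receiver skips a single 20 ms frame and stays at 60 ms of latency, while the tcp-style receiver loses nothing but ends up 2000 ms behind for the rest of the call unless it speeds up playback to catch up.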
Forget about WebRTC for a minute and just think about tcp vs udp for voice. VoIP has been based on udp since the '90s for a reason.