Video calls work on SchildiChat on Android, but I use plain Element now, so I'd have to check there.
We're all aware that Matrix needs a TURN/STUN server to do audio and video calls, right? I think users might be able to specify their own TURN server, but if your homeserver doesn't hand clients a working one and the call can't punch through NAT on its own, you're not making video calls, period.
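For anyone wondering what the homeserver's part in this looks like, here's a rough sketch of the shared-secret scheme that coturn-style TURN servers use for time-limited credentials. As far as I know, this is roughly what happens when a homeserver is configured with a TURN shared secret, but treat it as an illustration and not as any server's actual code. The function names, the example user ID, and the example URIs are all made up for the sketch.

```python
import base64
import hashlib
import hmac
import time


def make_turn_credentials(shared_secret, user_id, ttl_seconds=86400, now=None):
    """Build time-limited TURN credentials (coturn-style shared-secret scheme).

    Hypothetical helper: the username is "<expiry timestamp>:<user id>" and
    the password is base64(HMAC-SHA1(shared_secret, username)).
    """
    if now is None:
        now = int(time.time())
    expiry = now + ttl_seconds
    username = f"{expiry}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    password = base64.b64encode(digest).decode()
    return {"username": username, "password": password, "ttl": ttl_seconds}


def to_ice_servers(creds, uris):
    """Shape the credentials the way a WebRTC client expects its iceServers list."""
    return [{"urls": uris, "username": creds["username"], "credential": creds["password"]}]


if __name__ == "__main__":
    # Assumed values, for illustration only.
    creds = make_turn_credentials("not-a-real-secret", "@alice:example.org")
    print(to_ice_servers(creds, ["turn:turn.example.org:3478?transport=udp"]))
```

The point is that the client never sees the shared secret: it only gets a username and password that expire, and the TURN server can check them by redoing the same HMAC.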
Offhand, I don't know of any functional non-centralized service that can do peer-to-peer video or voice without an intermediary 100% of the time. That crap died in the 90s.