The only description of SIP Witch with some actual content (that I could find) is here: http://www.slideshare.net/gnutelephony/harvard2010 However, it's missing some really important points, and it looks like the technical issues are just skipped:
- How do users locate each other? (It looks like user@domain is "known" somehow.) The GNU Telephony blog mentions "peer-to-peer mesh calling networks", but I can't find any actual working prototype.
- Why wouldn't I just use FreeSWITCH, which can support both NAT traversal and ZRTP?
- How do they imagine the initial connection when both sides are behind a firewall and no ports are mapped? (Skype can do that thanks to its known network of hosts.)
If they don't have some great solutions here, I'm not sure why they are writing this from scratch instead of adding those "peer-to-peer mesh calling networks" capabilities as a standardised protocol to the PBXes that are in use nowadays (Asterisk, FreeSWITCH, Yate).
FreeSWITCH is a great starting point, but point-to-point NAT traversal isn't reliable in the general case. You have to consider symmetric NAT without UPnP, and firewalls that allow outbound traffic plus responses but no incoming connections. In these cases, you are going to need a proxy with a public IP that can relay audio. That's the only reliable thing that's going to work in all cases. (Just look at the IETF's documents on NAT traversal, where they come up with all these complex ways to try to exploit limitations in certain NAT devices...)
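To make that decision concrete, here is a rough sketch in Python of how a client might pick between a direct path, hole punching, and a relay. The `Endpoint` fields and the `connection_strategy` function are hypothetical names made up for this illustration, and the rules are a deliberately conservative simplification: real clients probe the network rather than trust a two-flag model, and hole punching still needs a third party so the two sides can learn each other's public address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical two-flag model of one side of the call (an assumption).
    accepts_unsolicited_inbound: bool  # public IP, or a port mapped via UPnP/manually
    symmetric_mapping: bool            # NAT allocates a new public port per destination


def connection_strategy(a: Endpoint, b: Endpoint) -> str:
    """Pick the cheapest path that is likely to work for media."""
    # If either side can be reached directly, the other side sends first
    # and the reachable side replies to the address it observed.
    if a.accepts_unsolicited_inbound or b.accepts_unsolicited_inbound:
        return "direct"
    # With stable (non-symmetric) mappings on both sides, simultaneous
    # outbound packets can open a path, but this is best effort only.
    if not a.symmetric_mapping and not b.symmetric_mapping:
        return "hole-punch"
    # Symmetric NAT on at least one side and nobody reachable:
    # treat a public relay as the only dependable option.
    return "relay"


public_host = Endpoint(accepts_unsolicited_inbound=True, symmetric_mapping=False)
cone_nat = Endpoint(accepts_unsolicited_inbound=False, symmetric_mapping=False)
symmetric_nat = Endpoint(accepts_unsolicited_inbound=False, symmetric_mapping=True)
```

The point of the last branch is the one made above: once neither side is reachable and a mapping is unpredictable, a relay with a public IP is the fallback that covers every case.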
Actual NAT traversal in SIP/RTP is very straightforward. You just ignore the silly parts of the spec that specify IPs, and reply to the IP:port combo the request actually came from. For RTP, same deal. You know where you'll receive RTP, so as soon as you get a packet there, you reply to the IP:port it came from.
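A minimal sketch of that "reply to where it came from" trick, using plain UDP sockets on loopback rather than a real RTP stack. `LatchingEndpoint` and `demo` are hypothetical names for this illustration, and the `10.0.0.5:40000` address stands in for a private address a peer might have advertised in its SDP.

```python
import socket


class LatchingEndpoint:
    """UDP endpoint that sends to wherever the last packet really came from."""

    def __init__(self, advertised_peer=None):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(("127.0.0.1", 0))  # let the OS pick a free port
        self.sock.settimeout(2.0)
        # Address the peer claimed in signalling; used only until
        # a real packet arrives and tells us better.
        self.peer = advertised_peer

    @property
    def local_addr(self):
        return self.sock.getsockname()

    def receive(self):
        data, source = self.sock.recvfrom(2048)
        self.peer = source  # latch onto the observed source IP:port
        return data

    def send(self, payload):
        if self.peer is None:
            raise RuntimeError("no peer address known yet")
        self.sock.sendto(payload, self.peer)

    def close(self):
        self.sock.close()


def demo():
    # The server was told a private, unreachable address for the client.
    server = LatchingEndpoint(advertised_peer=("10.0.0.5", 40000))
    client = LatchingEndpoint(advertised_peer=server.local_addr)
    try:
        client.send(b"ping")          # client sends first, opening its NAT mapping
        received = server.receive()   # server learns the real source here
        server.send(b"pong")          # reply goes to the observed address
        reply = client.receive()
        return received, reply, server.peer == client.local_addr
    finally:
        server.close()
        client.close()
```

This only helps when at least one side can be reached by the first packet; if neither side can, nothing ever arrives to latch onto.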
Skype's "breakthrough" that let them win so much early on was that they used P2P to do public IP relays, since connectivity between any two points is not guaranteed. (Had MS been smart enough to proxy audio/video in their clients early on, they might have dominated this market to begin with...)
So, to overcome this with free software, they'd need a public IP relay system that protects things end-to-end. If they do that, it could be worth watching. But judging from that linked presentation, they expect SIP Witch to update the firewall rules in such cases. Good luck with that.
- They can't use end-to-end encryption, since a public relay has to add information about the public address the message came from (unless it relays the media itself).
- They can't allow random changes to addresses by intermediate nodes, since that would allow a trivial attack on the mesh infrastructure.
How will they stop a situation where someone creates lots of nodes that proxy SIP but randomise the media addresses? The media address can't be encoded at the source, since it has to come from the relay. It wouldn't be hard for a competing company to spawn thousands of nodes on EC2 and overload the network with broken "relay" nodes.
What Asterisk does is no different from what MySQL and Qt do.