Punching through NAT, and most associated state tracking filters, is very easy.
I've implemented this in a production corporate environment, as a product to be sold. There is no magic here; it is all technology that practitioners understand well.
If you actually want to have packet filtering (a firewall) then deploy a firewall instance distinct from any NAT, and with appropriate rules. However that only really helps for traffic volume reduction, the actual security gain from a f/w per se is now minimal, as most attacks are over the top: HTTP/HTTPS, POP/IMAP etc.
You can say that in general, network firewalls are not a security mechanism. They are at most a means to prevent brute-force attacks from outside of the network.
In reality, of course, the stateful firewall is doing all of the heavy lifting that NAT is getting the credit for. Tailscale does not get rid of the firewall; in fact, it has a much more comprehensive setup based on proper ACLs.
Though I’m definitely the first to admit that their tooling around ACLs could be significantly improved.
Networking has long been the toxic wasteland of security and misconfiguration. Now combine that with newer host-based networking models for containers. The Windows network stack is substantially different now due to that, and more complex. Since WireGuard became part of Linux, everyone and their brother now has a VPN, somewhere connecting to a VPS. It's probably worse than you think because you don't know what you don't know.
Firewalling is a different concept, but since you raise the issue of connectivity wrt. security, I have to say that what makes /me/ sad and anxious is seeing how internet security has always hinged on blocking packets based on the destination port.
Doing what's easy rather than what's correct, exemplified and labelled "professional solutions"...
It still needs something on the inside to talk to outside first, so the actual firewall should whitelist both outbound and inbound connections.
Then again, if you rely on the perimeter, it’s only a matter of time before someone figures out what your equivalent of a hi-vis jacket is.
It's also worth considering that exploitability of ACL code is just one factor in comparing the risk and Tailscale or similar solutions allow security conscious setups that are not possible (or at least much more difficult) otherwise. For example, the NAT and firewall traversal means you don't have to open any ports anywhere to offer a service within your Tailscale network. Done correctly, this means very little attack surface for a bad actor to gain access to that stray VM in the first place. You can also implement fairly complex ACL behavior that's effectively done on each endpoint without having to trust your network infrastructure at all, behavior that stays the same even if your laptop or other devices roam from network to network.
Not to say I believe Tailscale is bulletproof or anything, but it does offer some interesting tradeoffs, and it's not immediately obvious to me that the risk is worse than with legacy networks (arguably it's better), and you gain a lot of interesting features and convenience.
Defense in depth.
How do you suppose they gained access to the kernel and userspace just by having a network connection to the laptop?
"At a less granular level, the coordination server (key drop box) protects nodes by giving each node the public keys of only the nodes that are supposed to connect to it. Other Internet computers are unable to even request a connection, because without the right public key in the list, their encrypted packets cannot be decoded. It’s like the unauthorized machines don’t even exist. This is a very powerful protection model; it prevents virtually any kind of protocol-level attack. As a result, Tailscale is especially good at protecting legacy, non-web based services that are no longer maintained or receiving updates."
Source: https://tailscale.com/blog/how-tailscale-works#bonus-acls-an...
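To make the quoted idea concrete, here is a minimal sketch of a coordination server that hands each node only the public keys of its permitted peers. This is a toy model, not Tailscale's implementation: the function name `peers_for`, the node names, and the `(src, dst)` rule format are all assumptions made up for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical data model (not Tailscale's): node name -> public key,
# plus a set of (src, dst) pairs that are allowed to talk to each other.
def peers_for(node: str,
              public_keys: Dict[str, str],
              allowed: Set[Tuple[str, str]]) -> Dict[str, str]:
    """Return only the public keys that `node` is supposed to know about."""
    visible = {}
    for src, dst in allowed:
        # Both ends of an allowed pair need each other's key to set up a tunnel.
        if src == node and dst in public_keys:
            visible[dst] = public_keys[dst]
        elif dst == node and src in public_keys:
            visible[src] = public_keys[src]
    return visible
```

A node that appears in no rule gets an empty peer list, so it never learns any keys and, in this model, has nothing it could even attempt a handshake with.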
The tribal knowledge seems to be that you shouldn't do TCP-based hole punching because it's harder than UDP. The author acknowledges this:
> You can do NAT traversal with TCP, but it adds another layer of complexity to an already quite complex problem, and may even require kernel customizations depending on how deep you want to go.
However, I only see marginally added complexity (given the already complex UDP flows). IMO this complexity doesn't justify discarding TCP hole punching altogether. In the article you could replace raw UDP packets to initiate a connection with TCP SYN packets plus support for "simultaneous open" [0].
This is especially true if networks block UDP traffic which is also acknowledged:
> For example, we’ve observed that the UC Berkeley guest Wi-Fi blocks all outbound UDP except for DNS traffic.
My point is that many articles gloss over TCP hole punching with the excuse of being harder than UDP while I would argue that it's almost equally feasible with marginal added complexity.
[0] https://ttcplinux.sourceforge.net/documents/one/tcpstate/tcp...
Hence the added complexity of doing a simultaneous open via TCP is fairly minor. The main complication is communicating the public mapping, and coordinating the "simultaneous" punch/open. However that is generally needed for UDP anyway...
One possible added complexity with TCP is that one has to perform real connect() calls rather than fake up the TCP SYN packet. That is because some firewalls pay attention to the sequence numbers.
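As a rough illustration of the "real connect() calls" approach, here is a minimal sketch that keeps calling connect() from one fixed local port until the connection succeeds or a deadline passes. The function name `tcp_punch` and its parameters are made up for this example, and it assumes both peers already know each other's public mapping and start at roughly the same time.

```python
import socket
import time

def tcp_punch(local_port, peer, deadline, attempt_timeout=0.5):
    """Retry connect() from a fixed local port until `deadline` (monotonic time).

    Returns a connected socket, or None if no attempt succeeded in time.
    """
    while time.monotonic() < deadline:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow rebinding the same local port on every retry.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.settimeout(attempt_timeout)
        try:
            s.bind(("", local_port))   # the port the NAT mapping was learned for
            s.connect(peer)            # a real SYN with kernel-chosen sequence numbers
            s.settimeout(None)
            return s
        except OSError:                # refused, timed out, or unreachable: retry
            s.close()
            time.sleep(0.05)
    return None
```

For an actual simultaneous open, both sides would run this against each other's public address and port at the agreed time; whether the SYNs cross cleanly depends on the NATs and firewalls in between.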
Also, wouldn't it be easier for stateful firewalls to block simultaneous TCP open (intentionally or not)? With UDP, the sender's firewall must create a connection as soon as it sends off the first packet, even if that packet bounces off the other firewall: the timing doesn't have to be particularly tight. But with TCP, the firewall might plausibly wait until the handshake is complete before allowing incoming packets, and it might only allow the 3-way SYN/SYN-ACK/ACK instead of the simultaneous SYN/SYN/ACK/ACK.
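The concern can be illustrated with a toy connection-tracking model. This is purely a sketch of the two policies described above, not the behavior of any real firewall; the class `ToyConntrack` and its `strict_tcp` flag are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ToyConntrack:
    """Toy model of a stateful firewall's flow table (hypothetical, simplified)."""
    strict_tcp: bool = False                  # True: a bare SYN is never a valid reply
    flows: set = field(default_factory=set)

    def outbound(self, proto, local, remote):
        # Any outbound packet (UDP datagram or TCP SYN) records the flow.
        self.flows.add((proto, local, remote))

    def inbound_allowed(self, proto, remote, local, flags=""):
        if (proto, local, remote) not in self.flows:
            return False                      # unsolicited traffic is dropped
        if proto == "tcp" and self.strict_tcp and flags == "SYN":
            return False                      # only SYN-ACK may answer our SYN
        return True
```

With UDP, one outbound datagram is enough for later inbound packets to pass. With TCP, a lenient table lets the peer's crossing SYN through, while a strict one only accepts a SYN-ACK, which would break simultaneous open.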
TCP hole punching is very fun. The way I do it is to use multiple NTP readings to compute a "clock skew" -- how far off the system clock is from NTP. Then the initiator sets a future meeting time that is relative to NTP. It honestly gets quite accurate. It even works for TCP hole punching between sockets on the same interface which is crazy if you think about it.
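A minimal sketch of that timing idea, using the standard NTP offset formula on four timestamps per reading. The function names are assumptions, "skew" here means NTP time minus local time, and actually querying an NTP server is left out.

```python
import statistics
import time

def ntp_offset(t0, t1, t2, t3):
    """Standard NTP offset: t0 = client send, t1 = server receive,
    t2 = server send, t3 = client receive (all in seconds)."""
    return ((t1 - t0) + (t2 - t3)) / 2.0

def clock_skew(samples):
    """Median offset over several (t0, t1, t2, t3) readings, to damp outliers."""
    return statistics.median(ntp_offset(*s) for s in samples)

def seconds_until(meeting_ntp, skew, local_now=None):
    """How long to sleep locally before a meeting time expressed in NTP time."""
    if local_now is None:
        local_now = time.time()
    return max(0.0, meeting_ntp - (local_now + skew))
```

Each side computes its own skew, the initiator picks a meeting time a little in the future, and both sleep for `seconds_until(...)` before starting their connection attempts.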
The reason I wanted to support this strange local punching mode is that if the code is efficient enough to succeed at host-based punching, then it is likely fast enough to work on the LAN and the Internet too. My code is Python, and my very first attempt at this was eye-opening to say the least. Because TCP hole punching is so timing-sensitive, I was getting failures from using Python with old-school self-managed sockets. I was using threading and a poor man's event loop (based on my C socket experience)... which is ah... just not the way to do it in Python.
The only way I could get that code to work was to ensure the Python process had a high priority so other processes on the system didn't deprioritize it and introduce lag between the punching attempts. That is how time-critical the code is (with an inefficient implementation). My current implementation uses a process pool in which each process has its own event loop to manage punching. I create a list of tasks that are distributed over time. Each task simply opens a connection that is reused from the same socket. I determined this code was the best approach (in Python anyway) after testing it on every major OS.
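The "tasks distributed over time" part could look something like the sketch below. It is not the actual implementation described above: it covers a single asyncio event loop rather than a process pool, and `staggered_attempts` and its `attempt` callback are hypothetical names.

```python
import asyncio

async def staggered_attempts(attempt, start_at, count=5, spacing=0.01):
    """Run `attempt(i)` at start_at + i * spacing (event-loop time).

    Returns the first truthy result, or None if every attempt fails.
    """
    loop = asyncio.get_running_loop()

    async def one(i):
        delay = start_at + i * spacing - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)      # wait for this attempt's slot
        try:
            return await attempt(i)
        except OSError:                     # a failed connect is just a miss
            return None

    tasks = [asyncio.create_task(one(i)) for i in range(count)]
    try:
        for fut in asyncio.as_completed(tasks):
            result = await fut
            if result:
                return result
        return None
    finally:
        for t in tasks:                     # stop any attempts still pending
            t.cancel()
```

Here `attempt` would be a coroutine that tries one connection from the shared local port; spreading several attempts around the meeting time gives some tolerance for residual clock error.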
You are right that TCP and UDP hole punching are similarly difficult. The main difficulty in both is the NAT prediction step. I haven't written code yet for symmetric NAT bypass, but I am starting to see how I'd integrate it (or possibly write a new plugin for it).
This was sometimes an issue for underpowered home/SOHO routers in the mid-2000s, but most modern routers have enough memory to support decently sized connection-tracking tables.
In any case, both TCP and UDP require connection tracking; there's no inherent advantage to UDP.
GRE tunnels exist and I actually use them extensively, but they don't handle UDP hole punching, so a hub-and-spoke architecture is needed for them; there are no peer-to-peer meshes with GRE (ip fou).
Are there equivalent libraries out there which do UDP hole punching and unencrypted GRE tunnels following an encrypted handshake to confirm identity?
libp2p [4] may be what you're after if you want something geared more towards general purpose connectivity.
[1] https://datatracker.ietf.org/doc/html/rfc8445
[2] https://github.com/pion/webrtc
FWIW, libp2p also enforces transport encryption, quote:
> Encryption is an important part of communicating on the libp2p network. Every connection must be encrypted to help ensure security for everyone. As such, Connection Encryption (Crypto) is a required component of libp2p.
It's written in Python, though it's not based on using the default interface like most networking code. I wanted to be able to run services across whatever interfaces you like, which allows much more diverse and useful things to be built. It's mostly based on standard library modules. I hate C extension crap as it always breaks packages cross-platform.
This is where I wish SIP lived up to its name (Session Initiation Protocol, i.e. any session, such as a VPN one...) and wasn't such a complicated mess making it not worth the hassle. I mean it was made to be the communication side-channel used for establishing p2p rtp streams.
It's like HTTP, but it's also stateful, bidirectional, federated, and works over UDP too.
Just look at the amount of stuff (TLS over UDP included) that baresip implements to barely SIP. And it isn't even bloated; the stuff has to be there.
and for the same reason: both were initially designed to be simple...
Previous discussion:
(2022) https://news.ycombinator.com/item?id=30707711
(2020) https://news.ycombinator.com/item?id=24241105
How NAT traversal works (2020) - https://news.ycombinator.com/item?id=36969018 - Aug 2023 (106 comments)
How NAT traversal works (2020) - https://news.ycombinator.com/item?id=30707711 - March 2022 (37 comments)
How NAT Traversal Works - https://news.ycombinator.com/item?id=24241105 - Aug 2020 (28 comments)
(p.s. your links weren't clickable because lines that are indented with 2 or more spaces get formatted as code - see https://news.ycombinator.com/formatdoc)
I read a bit about this space a few weeks ago after not knowing anything about it beforehand. My impression is that IPv6 ditches all of this and NAT traversal isn't necessary anymore. So why isn't IPv6 more popular, and how do I get started with it for my home network and Tailscale VPN?
Not sure how much of a factor but human usability has always been challenging.
There is also the lack of business incentives.
This may be the only way we ever have to build p2p apps. IPv6 doesn't have enough steam since NAT and SNI routing solve most problems for most people.
And ISPs are very much not incentivized for that to change.
It's a very good theoretical article. I wonder to what extent a software engineer could use it, though. Although it does describe many things, I'm not sure there's enough detail to write algorithms from it. For example, could an engineer write an algorithm to test for different types of NATs on the basis of this article? Could they adapt their own hole punching code? I've personally read papers where simple tables were more useful than entire articles like this (as extensive as it is). Maybe still a good starting point though.
Also, the last section in the article is extremely relevant. It has the potential to bypass symmetric NATs, which are used in mobile networks. The latest research on NAT traversal uses similar techniques and claims near 100% success rates.
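On the question of testing for NAT types: one common check is to send from a single local socket to two different STUN servers and compare the public address each one reports. Below is a minimal sketch of just the comparison step. The STUN queries themselves are omitted, the server names are placeholders, and the labels follow RFC 4787 terminology rather than anything defined in the article.

```python
from typing import Dict, Tuple

Addr = Tuple[str, int]

def classify_mapping(local: Addr, observed: Dict[str, Addr]) -> str:
    """Classify NAT mapping behavior for ONE local socket.

    `observed` maps a STUN server name (placeholder) to the public
    (ip, port) that server reported seeing for this socket.
    """
    if len(observed) < 2:
        return "unknown"                         # need two vantage points
    mappings = set(observed.values())
    if mappings == {local}:
        return "no NAT"                          # public address equals local one
    if len(mappings) == 1:
        return "endpoint-independent mapping"    # the easier case to punch
    return "endpoint-dependent mapping"          # symmetric-style, harder case
```

For example, `classify_mapping(("192.168.1.5", 4000), {"stun-a": ("198.51.100.7", 61000), "stun-b": ("198.51.100.7", 61001)})` returns `"endpoint-dependent mapping"`, because the port differs per destination.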