I once was a customer of an ISP that mistakenly blocked the whole 192.0.0.0/8 net, which caused some confusion, but they fixed it after I pointed it out.
BTW your assumption "a successful ICMP ping = TCP and UDP work" is an extremely common one that I too had before I was taught otherwise.
Still, some middlebox/stateful firewall/etc. messing with 169.0.0.0/8 is plausible.
Like...
"My car won't start."
"Oh, OK, have you tried waiting for the traffic lights to go green, as designed by the Principal Road Engineer?"
At my old uni, L1 were paid students, L2 were paid staff, and L3 were the actual netops/sysadmins, so sometimes L2 would try to close something out that needed to be escalated.
In addition, they had resnet (residential network) and pronet (professional network), where the former was for student housing and the latter everything else. Resnet had more restrictions and traffic shaping such that pronet traffic was prioritized. In addition, resnet wireless had a different NAT setup, whereas resnet wired used public IPs with inbound traffic blocked. This led to all kinds of caveats, like online gaming via UPnP only working on wireless despite wired having public IPs.
All that explanation is just ritual -- it does not need to make sense.
In the linked picture [0] I have packet #436 selected; it's a retransmission of the handshake SYN/ACK with seq=0, ack=1, repeating a few times later, same as OP.
So as others suggested, likely a misconfigured BOGON rule with 169.0.0.0/8, but also matching outbound established connections rather than new/any state for some reason.
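For illustration only, here is a hypothetical iptables sketch of how such a misconfiguration could look (the chain and addresses are assumptions, not anything known about BunnyCDN's or the ISP's actual setup). The real link-local bogon is 169.254.0.0/16; a fat-fingered /8 would also catch BunnyCDN's 169.150.x.x addresses seen in this thread:

```shell
# BROKEN: drops 169.0.0.0/8 unconditionally, so even packets that belong
# to an already-ESTABLISHED flow are silently discarded.
iptables -A FORWARD -s 169.0.0.0/8 -j DROP

# What was probably intended: let established traffic through first, and
# only drop NEW connections from the actual bogon range.
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -s 169.254.0.0/16 -m conntrack --ctstate NEW -j DROP
```

With the broken variant, the SYN/ACK can still reach you if it's evaluated before the drop rule or travels a different path, which matches the "handshake half-works" symptom.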
This is how you get NOCs to help you quickly: give them not only the problem but the root cause as well. It's not that they (or I) are lazy; it's just that so many things can be a potential cause of problems, especially when you only have incomplete information to go on.
The problem really should be escalated and the nonsense answer pointed out, because if they care (and they should), they'll want to educate the person who gave that response.
We don’t pay enough
It's interesting that your side thinks the three-way handshake worked, but the remote side continues to resend the [SYN, ACK] packets, as if they've never received the final [ACK] from you.
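That pattern is easy to spot mechanically once you export the packet list. A toy triage helper (my own sketch, not anyone's actual tooling) that counts repeated SYN/ACKs from a list of (direction, flags, seq, ack) tuples, the way you might dump them out of Wireshark:

```python
def synack_retransmits(packets):
    """Count SYN/ACK packets that repeat an already-seen (direction, seq, ack)
    triple. A nonzero count suggests the remote end never saw our final ACK."""
    seen = {}
    retrans = 0
    for direction, flags, seq, ack in packets:
        if flags == {"SYN", "ACK"}:
            key = (direction, seq, ack)
            seen[key] = seen.get(key, 0) + 1
            if seen[key] > 1:
                retrans += 1
    return retrans

trace = [
    ("in",  {"SYN", "ACK"}, 0, 1),   # server's handshake reply
    ("out", {"ACK"},        1, 1),   # our final ACK (apparently lost upstream)
    ("in",  {"SYN", "ACK"}, 0, 1),   # server retransmits...
    ("in",  {"SYN", "ACK"}, 0, 1),   # ...and again, like packet #436 here
]
print(synack_retransmits(trace))     # 2
```

Seeing retransmitted SYN/ACKs while your side already considers the connection established is exactly the "my ACK is being eaten somewhere" signature.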
Had a hellish time troubleshooting a similar problem several years ago with F5 load balancers: there was a bug in the hashing implementation used to assign TCP flows to different CPUs. If you hit this bug (parts per thousand), your connection would be assigned to a CPU with no record of that flow existing, so the connection would be alive but would no longer pass packets. It would take a long time for the local TCP stack to go through its exponential retries and finally decide to drop the connection and start over.
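The essential property the hash must have is direction symmetry: both directions of a flow must land on the same CPU. A minimal sketch of the idea (purely illustrative; the function names and CRC32 choice are mine, not F5's implementation):

```python
import zlib

NCPUS = 8

def flow_cpu(ip_a, port_a, ip_b, port_b):
    # Correct: hash the two endpoints in canonical (sorted) order, so both
    # directions of a flow hash identically and find the same state table.
    ends = sorted([f"{ip_a}:{port_a}".encode(), f"{ip_b}:{port_b}".encode()])
    return zlib.crc32(ends[0] + ends[1]) % NCPUS

def buggy_flow_cpu(src_ip, src_port, dst_ip, dst_port):
    # Buggy: hashes src before dst, so for some flows the reply direction
    # hashes to a different CPU, one with no record of the flow. The
    # connection stays "up" but silently stops passing packets.
    return zlib.crc32(f"{src_ip}:{src_port}:{dst_ip}:{dst_port}".encode()) % NCPUS
```

With the buggy variant, only the unlucky fraction of 4-tuples whose two directions happen to hash to different CPUs misbehave, which is why such bugs show up at parts-per-thousand rates and survive casual testing.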
We diagnosed the same(ish) bug in first generation F5 LBs in the 90s[1]. Figured exhaustive testing for this would have been SOP by now.
[1] To be fair, almost all 1st gen LBs had at least one major "send the packet to the wrong place and the state table gets screwed up" bug.
Try reducing the MTU on the client; 1280 is a good starting point.
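A quick sanity check on those numbers: 1280 is the IPv6 minimum MTU, so it's a lower bound that almost any modern path can carry. The arithmetic relating MTU to TCP payload size (assuming plain IPv4 and TCP headers with no options):

```python
IPV4_HDR = 20   # bytes, header without options
TCP_HDR = 20    # bytes, header without options

def mss_for_mtu(mtu):
    """Largest TCP payload per packet that fits in a given MTU (IPv4)."""
    return mtu - IPV4_HDR - TCP_HDR

print(mss_for_mtu(1500))  # 1460, the usual Ethernet default MSS
print(mss_for_mtu(1280))  # 1240
```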
I'd ask OP to check whether this affects only a subset of their IPs from https://bunnycdn.com/api/system/edgeserverlist, or whether all of their IPs are affected, using `curl --resolve bunnycdn-hosted-website.com:80:some-other-ip http://bunnycdn-hosted-website.com`.
A few ideas to test this theory: 1) Find an asset on their server that is smaller than 500-1000 bytes so the entire payload will fit in a packet. Maybe a HEAD would work? 2) Clamp your MSS on this IP to something much smaller like 500 instead of the standard 1460. This should force the server to send smaller packets and will work better in practice than changing your MTU. See: https://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.cookbook.mtu-...
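Idea 2 can be done per-destination with an iptables TCPMSS rule, roughly like this (a sketch, assuming Linux iptables and using the 169.150.221.147 address from elsewhere in this thread as the flaky destination):

```shell
# Rewrite the MSS option on our outgoing SYNs to this destination, so the
# server never sends us TCP segments with more than 500 bytes of payload.
iptables -t mangle -A OUTPUT -d 169.150.221.147 -p tcp \
    --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 500
```

If small packets get through and full-size ones don't, that points at an MTU/blackhole problem rather than a state-tracking one.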
The TLS ClientHello is not that big (the client-sent FIN is at seq=518), and the server is only sending packets with seq=0. As others pointed out, this likely means that the server that received the SYNs is not receiving the final ACK and data packets.
From what I can tell, the example IP is not broadly anycast. From my test hosts in Seattle, traceroute takes me through transit to San Jose, and then either
vl201.sjc-eq10-dist-1.cdn77.com or vl202.sjc-eq10-dist-1.cdn77.com and finally
169-150-221-147.bunnyinfra.net
I'm not sure how easy it is to run a traceroute with tcp with different flags. But if the OP can run a traceroute with only the SYN flag, and again with only the ACK flag, that might be pretty interesting. I suspect this is an issue inside BunnyCDN's network where packets from this user/network with SYN go to one server host, and with ACK go to another. Maybe there's an odd router somewhere that's routing these differently, but if they both make it to Bunny, they should both work.
With

    $ traceroute --version
    Modern traceroute for Linux, version 2.1.2
    Copyright (c) 2016 Dmitry Butskoy, License: GPL v2 or any later

I can do a TCP traceroute with only SYN or only ACK set:

    $ traceroute -T -p 443 -q 1 -O syn 169.150.221.147
    $ traceroute -T -p 443 -q 1 -O ack 169.150.221.147
Wrong answer about MTU below, kept for posterity: Yeah, that would be my bet too. Especially given the "after 60 seconds things start to work" detail; I think that's the timeout for Windows to do PMTU blackhole probing (which is painfully slow; iOS and I think macOS do it much sooner, and I think even Android has gotten around to doing it in a reasonable amount of time).
I've got a test site up that might work for the OP http://pmtud.enslaves.us/
But, if it's really only happening with BunnyCDN, it's possible that most of their routes are 1500-MTU clean (or have working path MTU discovery) and only the routes to BunnyCDN aren't. Of course, a lot of popular services intentionally drop their advertised MTU and allowed outbound MTU to work around the many broken networks out there, so "services X and Y work" doesn't really mean the path is clean.
I had seen this exact issue with Fastly a few years ago.
Maybe from the server's point of view the SYN and ACK are coming from distinct addresses and this is tripping them up?
I have 2 internet connections in my home and would encounter some strange bugs whenever I used both connections at the same time. I never debugged these cases, but they always disappeared when I used just 1 connection and left the second as a backup.
Second, I see that whatever client he's using is specifying a very old TLS 1.0. If it's not MTU (which others have mentioned), then my guess would be a firewall with a policy specifying a minimum TLS version, dropping this connection on the floor.
If a TLS handshake is aborted partway through, Wireshark will label it “TLSv1”; it only retroactively relabels those 1.0-looking packets as 1.3 after a successful TLS 1.3 handshake finishes.
This makes sense because a TLS 1.3 handshake actually starts out on the wire looking like 1.0, and only upgrades to 1.3 with, IIRC, the ServerHello response to the ClientHello.
The following links document this behavior, in case you or your organization’s security team is nervous TLSv1 is actually being used:
https://superuser.com/a/1618420
https://ask.wireshark.org/question/24276/how-does-wireshark-...
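You can also convince yourself (or a nervous security team) without a packet capture at all: Python's ssl module can generate a ClientHello into a memory BIO, and you can inspect the record header directly. A minimal sketch, assuming an OpenSSL-backed Python 3 that negotiates TLS 1.3 by default:

```python
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
tls = ctx.wrap_bio(incoming, outgoing)

try:
    tls.do_handshake()          # no peer yet, so this just queues the ClientHello
except ssl.SSLWantReadError:
    pass

hello = outgoing.read()
print(hex(hello[0]))            # 0x16: a handshake record
print(hello[1:3].hex())         # 0301: the record-layer version field reads
                                # as "TLS 1.0" even on a 1.3-capable stack
```

The actual version negotiation happens inside the ClientHello body (the supported_versions extension), not in that outer record header, which is why tools that only look at the record layer report "TLSv1".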
Such redirection is often done on a specific port basis, so that trying to access different ports might produce a different result, such as a RST packet coming back from port 1234 with a different TTL than port 443.
There is so much cheating going on with Internet routing that the TTL is usually the first thing I check, to make sure things are what they claim to be.
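The usual back-of-the-envelope check: assume the sender started from one of the common initial TTLs and subtract. A quick heuristic sketch (my own, and only a rough guide, since some hosts use unusual initial TTLs):

```python
def hops_from_ttl(observed_ttl):
    """Estimate hop count from an observed TTL, assuming the sender used a
    common initial value (64 on Linux/macOS, 128 on Windows, 255 on many
    routers). Returns None for out-of-range input."""
    for initial in (64, 128, 255):
        if observed_ttl <= initial:
            return initial - observed_ttl
    return None

print(hops_from_ttl(57))    # 7 hops from a 64-initial-TTL host
print(hops_from_ttl(115))   # 13 hops from a 128-initial-TTL host
```

If the RST from port 1234 implies 3 hops while the SYN/ACK from port 443 implies 14, some middlebox close to you is answering on the server's behalf.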
https://security.berkeley.edu/services/bsecure/bsecure-remot...
That is on the same level as, e.g., the customer hotline at a phone company ("did you try turning it off and on again?"). I would have thought that Berkeley, of all universities, had higher standards than that.
It's indeed sad how more and more unis outsource all their IT. Like they've become too stupid to manage the tech they created. A friend of mine just told me how his old college is currently moving their email to Google, and are also looking to move all the web hosting somewhere else. What's next, have the whole network managed by Comcast? Pay per connected device?
[1] : http://www.growse.com/2020/01/23/adventures-with-asymmetric-...
Would suspect some of the other responses first though, but if they don't help this could be a possibility if they are using anycast.
It does feel like maybe a different server/network path getting the SYN+ACK vs the ACK, but probably in BunnyCDN's equipment --- but maybe something weird in Berkeley's (wired) network causes weird behavior for BunnyCDN? Hard to really know without pcaps from both ends, which are hard to get. Something funky in the load balancer seems like a good guess to me.
Some 10 years back I was working for a solar company doing SCADA stuff (monitoring remote power plant equipment, reporting generation metrics, handling grid interconnect stuff, etc).
We had a big room with lots of monitors that looked like a set in a Hollywood film, no doubt inspired by them. You could see all the solar installations around the world that we monitored. The monitoring crew put out a call for engineers, stat, and as I walked into the monitoring room I could see perhaps 1/10th of the power plant icons on the wall were red with "lost communication"; one plant went from green to red right in front of me.
This started a shitstorm, with all hands being summoned. Long story short: somebody decided the best way to get an external IP for one of our remote gateways was a curl call to a whatismyip.com-type service, but instead of targeting Google (or, you know, a server under our control), it hit some random ISP in Italy. The ISP must have eventually realized they were getting pinged by thousands of devices 24/7, so they decided to silently drop some percentage of incoming requests, and of course the curl call was blocking without a timeout. When the remote gateway's request was dropped, it blocked indefinitely.
I skipped a lot in between, but it was definitely a fun firefighting session. It was particularly hampered by a couple of engineers quite high up the food chain being led in the wrong direction (as to the root cause) at the beginning and fighting particularly hard against any opposing theories. It was the one time I basically got to drop the "I'm right and I bet my job on it." Fun times.
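The lesson in miniature: any blocking network read needs a deadline, because "the other side silently stopped answering" is indistinguishable from "still waiting". A self-contained sketch (local sockets only, standing in for that Italian ISP):

```python
import socket
import time

# A local listener that completes the TCP handshake (via the accept backlog)
# but never sends a byte, simulating a server that silently drops requests.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

# With no timeout, recv() below would block forever, just like that curl
# call. With one, we give up after a bounded wait and can retry or alarm.
conn = socket.create_connection(srv.getsockname(), timeout=1.0)
start = time.monotonic()
try:
    conn.recv(1)               # the "server" never answers...
except socket.timeout:
    print("gave up after", round(time.monotonic() - start, 1), "s")
finally:
    conn.close()
    srv.close()
```

The equivalent fix for the original script would have been as simple as passing `--connect-timeout` and `--max-time` to curl.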
https://devnonsense.com/posts/asymmetric-routing-around-the-...