A macOS bug that causes TCP networking to stop working after 49.7 days

(photon.codes)

171 points · RyanZhuuuu · 2mo ago · 114 comments

82 comments · 32 top-level

BitsAndObjects · 2mo ago · 8 in thread

I got tired of the AI writing before finding out whether they even attempted to contact Apple about this issue. Does anyone know?

Also, massively over-dramatised. Yes, a bug worth finding and knowing about, but it’s not a time bomb - very few users are likely to be affected by this.

Knowing the nature of OS kernels, I’m guessing that even just putting a Mac laptop to sleep would be enough to avoid this issue, as it would reset the TCP stack. That may be why some people are reporting much longer uptimes without hitting this problem, since (IIRC) uptime on Macs doesn’t reset for a sleep, only for a full reboot?

Anyway, all in all, yeah hopefully Apple fix this but it’s not something anyone needs to panic about.

bigiain · 2mo ago

> very few users are likely to be affected by this

I have a reasonably strong suspicion that I experienced this a week or two back, on a MacBook that doesn't go into sleep automatically and quite likely had 50-ish days of uptime.

It had all the symptoms described - tcp connections not working while I could still ping everywhere just fine, and all the other devices on the same network were fine. Switching WiFi networks and plugging in to ethernet didn't help. A reboot "fixed" it.

castillar76 · 2mo ago

Yep, I concur: this explains a bizarre behavior I’ve noted in my Mac laptops for ages now. I have a tendency to just suspend them without rebooting for ages, especially the work one that doesn’t leave my office as frequently. Periodically, I’d come in to find the system bizarrely frozen just as they describe: TCP stack blocked up, but everything else on it behaving normally. (Well, mostly: some apps would block starting and bounce eternally, but I suspect that’s because they’re trying to make a network call while starting up and it’s blocking.) The only fix was a reboot.

It’s not a disaster, but very annoying. At least now I can just schedule a reboot every 30 days at minimum to keep things running.

BitsAndObjects · 2mo ago

I would not be surprised if people on HN were more likely to hit this issue than Apple's average users. We're a weird bunch ;)

delusional · 2mo ago

Apparently no. They'll be fixing it themselves? It really reads like Claude run amok on the blog.

> We are actively working on a fix that is better than rebooting — a targeted workaround that addresses the frozen tcp_now without requiring a full system restart. Until then, schedule your reboots before the clock runs out.

theshrike79 · 2mo ago

I think I might've hit my head on this a few times with my Mac Mini that's on basically 24/7 and doesn't go to sleep.

Sometimes it just stops networking completely, turning the wifi adapter on/off brings it back just fine. It's also a good time to reboot =)

RyanZhuuuu (OP) · 2mo ago

yes we have reported to Apple and they have filed it in their internal system.

otterley · 2mo ago

Did you need to make this blog post 20 pages long and have AI write it? Especially in such dramatic style?

Remember the golden rule: if you can't be bothered to write it yourself, why should your audience be bothered to read it?

Aloisius · 2mo ago

Might want to update it if you used the blog post explanation because it's incorrect as justinfrankel noted below. From the post:

    tcp_now   = 4,294,960,000  (frozen at pre-overflow value)

The mistake in the blog post is that timer isn't wrapped, even though the post notes that it should be:

    timer     = 4,294,960,000 + 30,000 = 4,294,990,000 - 2^32 = 22,704

Therefore:

    TSTMP_GEQ(4294960000, 22704)
    = 4294960000 - 22704
    = 4294937296
    = 4294937296 >= 0 ?  → true! (not false)

This is a bug, of course, but it would cause sockets in TIME_WAIT state to be reaped any time tcp_gc() is called, regardless of whether 2*MSL has passed. This only happens, though, if tcp_now gets stuck after 4,294,937,296 ms from boot.

A bug similar to what the blog described can happen, however, if tcp_now gets stuck at least 30 seconds before it would have wrapped. Since tcp_now is only updated if there is TCP traffic, this can happen if there is no TCP traffic for at least 30 seconds before it would roll over (2^32 ms from boot).

It should be easy to prevent the latter from happening with some TCP traffic, though reaping TIME_WAIT connections early isn't great either.

loloquwowndueo · 2mo ago · 7 in thread

lol reminds me of the windows 95 crash bug after 49.7 days. Have we learned nothing. https://pipiscrew.github.io/posts/why-window/

aranelsurion · 2mo ago

I was just trying to remember where I last saw this magic number of days.

loloquwowndueo · 2mo ago

The article does mention a few instances found over the years, including the windows one. That’s the one I remember though because we used to joke it was not a big deal - the only way for a windows 95 computer to reach 49 days of uptime is if it’s literally not doing anything or being used in any way. Windows 95 would crash if you looked at it funny.

larodi · 2mo ago

49-7=42 it is all clear

ok123456 · 2mo ago

Quite literally "the new old thing."

auspiv · 2mo ago

probably same thing for boeing 787 jets - https://www.theregister.com/2020/04/02/boeing_787_power_cycl...

says 51 days, which would be an interesting number of (milli)seconds

otherme123 · 2mo ago

It could be an overflow related to the frequency at which the register was incremented, rather than to the register's max value. E.g. a uint16 (max 65535) incremented once every 500,000 cycles on a 32 MHz chip that was previously a 1 MHz chip and never had a problem.

znpy · 2mo ago

that's why the 49.7 days sounded familiar!

mcculley · 2mo ago · 6 in thread

> It will not be caught in development testing — who runs a test for 50 days?

You don't have to run the system for 50 days. You can simulate the environment and tick the clock faster. Many high reliability systems are tested this way.

dezgeg · 2mo ago

IIRC the jiffies time counter in the Linux kernel is initialized at boot to something like five minutes before the wraparound point, precisely to catch this kind of issue.

bobmcnamara · 2mo ago

WinCE too
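
For reference, Linux's include/linux/jiffies.h defines `INITIAL_JIFFIES` as `((unsigned long)(unsigned int) (-300*HZ))`, with a comment saying it makes the 32-bit jiffies value wrap 5 minutes after boot. The Python sketch below mirrors only that arithmetic. The helper names are hypothetical, and the HZ values are just common build-time choices.

```python
MASK32 = 0xFFFFFFFF

def initial_jiffies(hz: int) -> int:
    """Mirror of (unsigned int)(-300*HZ): 300 seconds of ticks before a 32-bit wrap."""
    return (-300 * hz) & MASK32

def seconds_until_wrap(start: int, hz: int) -> float:
    """Seconds until a 32-bit tick counter starting at `start` wraps to zero."""
    return ((1 << 32) - start) / hz

for hz in (100, 250, 1000):  # common HZ choices; HZ is a kernel build option
    start = initial_jiffies(hz)
    print(hz, hex(start), seconds_until_wrap(start, hz))
```

Whatever HZ is, the counter is five minutes away from wrapping at boot, so any wrap bug shows up in the first few minutes of every test run instead of after 49.7 days.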

hombre_fatal · 2mo ago

It uses a hardware clock, one that pauses during sleep. There is no tick.

If you wanted to see how time impacts the program, you'd prob change fns like calculate_tcp_clock to take uptime as an argument so that you could sanity check it.

mcculley · 2mo ago

Yes. I do mean designing software to make it testable.

The code that uses that value can be run in an environment where that value can be controlled.

I have written code that does this same thing and built a test harness for it.
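
Here is a minimal Python sketch of the testable design hombre_fatal and mcculley describe. The expiry logic takes the current time as a parameter instead of reading a hardware clock, so a test can start just before the 32-bit wrap and jump the clock forward instantly. `TimeWaitTable`, `enter` and `reap` are hypothetical names for illustration, not XNU code. The wrap-safe comparison assumes the `(int)(a - b) >= 0` form shown later in the thread.

```python
MASK32 = 0xFFFFFFFF

def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed 32-bit int."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

class TimeWaitTable:
    """Hypothetical TIME_WAIT bookkeeping that never reads a clock itself."""

    def __init__(self, wait_ms: int = 30_000):
        self.wait_ms = wait_ms
        self.expiry = {}  # connection id -> uint32 expiry timestamp

    def enter(self, conn_id: str, now_ms: int) -> None:
        """Record that conn_id entered TIME_WAIT at now_ms."""
        self.expiry[conn_id] = (now_ms + self.wait_ms) & MASK32

    def reap(self, now_ms: int) -> list:
        """Remove and return connections whose expiry is at or before now_ms (wrap-safe)."""
        done = [c for c, t in self.expiry.items() if to_int32(now_ms - t) >= 0]
        for c in done:
            del self.expiry[c]
        return done

# Harness: start 10 s before the 32-bit wrap and jump the "clock" directly.
start = (1 << 32) - 10_000
table = TimeWaitTable()
table.enter("conn-1", start)
assert table.reap((start + 29_999) & MASK32) == []          # 1 ms early: still waiting
assert table.reap((start + 30_000) & MASK32) == ["conn-1"]  # expires after the wrap
```

Because time is just an argument, the 50-day case becomes a two-line test that runs in microseconds.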

adamtulinius · 2mo ago

We're talking about a company that produces the hardware their OS is running on. I'm sure they can find a way to make the hardware clock run faster.

sho_hn · 2mo ago

Heck, many video games are tested this way.

justinfrankel · 2mo ago · 6 in thread

I have multiple macOS machines with 600-1000+ day uptimes that make TCP connections at least every minute or so, and they are still expiring their TIME_WAIT connections as normal.

these kernel versions:

Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_ARM64_T8101 arm64

Darwin Kernel Version 17.7.0: Wed Apr 24 21:17:24 PDT 2019; root:xnu-4570.71.45~1/RELEASE_X86_64 x86_64

so... wonder what that's about?

justinfrankel · 2mo ago

ah reading their analysis, there are errors that explain this. Particularly this:

  tcp_now   = 4,294,960,000  (frozen at pre-overflow value)
  timer     = 4,294,960,000 + 30,000 = 4,294,990,000
              (exceeds uint32 max → wraps to a small number)

timer wraps to a small number, they say

  TSTMP_GEQ(4294960000, 4294990000)

they forgot to wrap it there, it should be TSTMP_GEQ(4294960000, small_number)

  = (int)(4294960000 - 4294990000)
  = (int)(-30000)
  = -30000 >= 0 ?  → false!

wrong!

There may be a short time period where this bug occurs, and if you get enough TCP connections to TIME_WAIT in that period, they could stick around, maybe. But I think the original post is completely overreacting and was probably written by a LLM, lol.

Aloisius · 2mo ago

There does appear to be a bug, but it's not what the blog describes.

If tcp_now stops updating at <= 2^32 - 30000 milliseconds, then TSTMP_GEQ(tcp_now, timer) will always fail since timer is tcp_now + 30000 which won't wrap.

This does look like it is possible since calculate_tcp_clock() which updates tcp_now only runs when there's TCP traffic. So if at 49 days uptime you halted all TCP traffic and waited about a day, tcp_now would be stuck at the value before you halted TCP traffic.

In cases where tcp_now gets stuck at > 2^32 - 30000, it looks like TCP sockets in the TIME_WAIT will end up being closed immediately instead of waiting 30 seconds, which isn't great either.

mhjkl · 2mo ago

They didn’t need to wrap it because it’s modular arithmetic so the result after casting to int is the same regardless of wrapping behavior. 4294990000 after wrapping is 22704 and 4294960000 - 22704 = 4294937296 which is -30000 after uint to int cast
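
To check the arithmetic in this sub-thread, here is a small Python sketch. It emulates C's 32-bit unsigned subtraction followed by a cast to a signed 32-bit int. It assumes TSTMP_GEQ has the `((int)((a)-(b)) >= 0)` form implied by justinfrankel's expansion above. The helper names `to_int32` and `tstmp_geq` are illustrative, not kernel code.

```python
MASK32 = 0xFFFFFFFF

def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement signed int."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def tstmp_geq(a: int, b: int) -> bool:
    """Emulates ((int)((a) - (b)) >= 0) on uint32_t operands."""
    return to_int32(a - b) >= 0

tcp_now = 4_294_960_000                   # frozen value from the blog post
timer_unwrapped = tcp_now + 30_000        # 4_294_990_000, as written in the post
timer_wrapped = timer_unwrapped & MASK32  # 22_704 after uint32 wraparound

print(to_int32(tcp_now - timer_unwrapped))  # -30000
print(to_int32(tcp_now - timer_wrapped))    # -30000: same result, as mhjkl says
print(tstmp_geq(tcp_now, timer_wrapped))    # False while tcp_now stays frozen

# If the clock keeps advancing past the wrap, the same comparison turns true:
print(tstmp_geq((tcp_now + 30_000) & MASK32, timer_wrapped))  # True
```

Under this definition, TSTMP_GEQ(tcp_now, tcp_now + 30000) is false for any frozen tcp_now, whether or not the addition wrapped.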

comex · 2mo ago

The bug was introduced only last year in macOS 26:

https://github.com/apple-oss-distributions/xnu/blame/f6217f8...

plorkyeran · 2mo ago

> Apple Community #250867747: macOS Catalina — "New TCP connections can not establish." New connections enter SYN_SENT then immediately close. Existing connections unaffected. Only a reboot fixes it.

This is a weird thing to cite if it's a macOS 26 bug. I quite regularly go over 50 days of uptime without issues so it makes sense for it to be a new bug, and maybe they had different bugs in the past with similar symptoms.

Aloisius · 2mo ago

Interesting. The article mentions complaints on the forums running Catalina, so that must be something else.

otterley · 2mo ago · 5 in thread

Sounds like it affects every open TCP connection, not just OpenClaw. (It's pretty rare for a TCP connection to live that long, though.)

josephcsible · 2mo ago

Individual TCP connections don't need to live that long. Once a macOS system reaches 49.7 days of uptime, this bug starts affecting all TCP connections.

throw0101d · 2mo ago

> Once a macOS system reaches 49.7 days of uptime, this bug starts affecting all TCP connections.

Current `uptime` on my work MacBook (macOS 15.7.4):

    17:14  up 50 days, 22 mins, 16 users, load averages: 2.06 1.95 1.94

Am I supposed to be having issues with TCP connections right now? (I'm not.)

My personal iMac is at 279 days of uptime.

CamperBob2 · 2mo ago

Sure they do. They need to live until torn down.

They almost never do live that long, for whatever reason, but they should.

gpvos · 2mo ago

Obviously, OpenClaw is now more important than anything else.

mememememememo · 2mo ago

For OpenClaw this bug is a security feature

tjohns · 2mo ago · 4 in thread

Does anybody else find these AI-authored blog posts difficult to read? Something about the writing style and structure just feels unnatural; it's hard to put my finger on it.

At the very least, the writing takes way too long to get to a point.

dawnerd · 2mo ago

Same, AI written anything is really difficult for me to read and pretty exhausting.

gowld · 2mo ago

AI does a good job of condensing the blog post to 2 paragraphs: Mac refuses to let the tcp_now clock roll over when it exceeds the max value of its data type.
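
One way a clock can "refuse to roll over" is a monotonicity guard that discards any new reading smaller than the previous one. The thread disagrees on the exact cause, so this Python sketch is a hypothetical model, not XNU's calculate_tcp_clock. The `update_clock` guard, the 30-second TIME_WAIT interval and the one-connection-per-second workload are all assumptions. After the raw 32-bit millisecond counter wraps, the guard rejects every later reading, tcp_now stays frozen, and TIME_WAIT entries stop expiring.

```python
MASK32 = 0xFFFFFFFF
TIME_WAIT_MS = 30_000  # assumed TIME_WAIT interval for the model

def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed 32-bit int."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def tstmp_geq(a: int, b: int) -> bool:
    """Wrap-safe a >= b, like ((int)((a) - (b)) >= 0)."""
    return to_int32(a - b) >= 0

def update_clock(tcp_now: int, raw_ms: int) -> int:
    """Hypothetical buggy update: reject any reading that looks like time went backwards."""
    new = raw_ms & MASK32
    return new if new > tcp_now else tcp_now  # plain unsigned compare fails at the wrap

def simulate(start_ms: int, steps: int, step_ms: int = 1_000) -> tuple:
    """Open one TIME_WAIT entry per step; return (final tcp_now, entries still pending)."""
    tcp_now = start_ms & MASK32
    expiries = []
    for i in range(steps):
        tcp_now = update_clock(tcp_now, start_ms + i * step_ms)
        expiries.append((tcp_now + TIME_WAIT_MS) & MASK32)
        # Reap every entry whose expiry is at or before the current clock.
        expiries = [t for t in expiries if not tstmp_geq(tcp_now, t)]
    return tcp_now, len(expiries)

print(simulate(1_000_000, 120))           # (1119000, 30): steady state
print(simulate((1 << 32) - 60_000, 120))  # (4294966296, 90): clock frozen, backlog grows
```

The two runs differ only in where the clock starts. Far from the wrap, the backlog settles at 30 pending entries. Starting 60 seconds before the wrap, tcp_now freezes at 2^32 - 1000 and every new entry stays pending.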

nslsm · 2mo ago

Use AI to expand your thoughts into a long-winded post, use AI to compress the long-winded post into something that can be digested by a human.

coldtea · 2mo ago

Can it summarize it down to a non-post?

MatMercer · 2mo ago · 4 in thread

This made me remember some folks that are "I never reboot my MacOS and it's fine!". Yeah probably it is but I'll never trust any computer without periodic reboots lol.

QuantumNomad_ · 2mo ago

I’m still at where when I connect external hard drive or SSD via USB, use it and then eject it, I shut down the MacBook Pro completely before I unplug the USB cable. Just in case.

The longest uptime I have had on any of my recent laptops is probably around 90 days but that’s because that laptop was sitting in my garage with wall power connected (probably bad for the battery) and some external storage connected and I’d remote into that machine over WireGuard now and then. When I did reboot that machine it was only out of habit that I accidentally clicked on reboot via a remote graphical session.

Most of the time my remote use of the laptop in the garage would be ssh sessions, but occasionally I’d use Remote Desktop. Right after I clicked reboot in the Remote Desktop session I realized what mistake I had just made: I have WireGuard set up to start after login. So after the reboot, I was temporarily unable to get back in. As I was in another country I couldn’t just walk over to the garage. But I do have family that could, so I instructed one of them over the phone on how to log in for me so that WireGuard would automatically start back up. You’d think this would happen only once, but I probably had to send family to the garage on my behalf maybe three or four times after making the same mistake again.

For the laptops that I actually carry around and plug and unplug things to etc, normal amount of time between reboots for me is somewhere between every 1 and 3 days. Cold boot is plenty fast anyway, so shutting it down after a day of work or when ejecting an external HDD or SSD doesn’t really cost me any noticeable amount of time.

Delk · 2mo ago

> I’m still at where when I connect external hard drive or SSD via USB, use it and then eject it, I shut down the MacBook Pro completely before I unplug the cable. Just in case.

That sounds... a bit paranoid? At least on Linux (Gnome), if I click to "safely remove drive" it actually powers off the drive and stops external mechanical drives from spinning. No useful syncing is going to happen anyway once a hard drive no longer spins. A modern OS should definitely be reliable enough that it can be trusted to properly unmount a drive.

> For the laptops that I actually carry around and plug and unplug things to etc, normal amount of time between reboots for me is somewhere between every 1 and 3 days. Cold boot is plenty fast anyway, so shutting it down after a day of work or when ejecting an external HDD or SSD doesn’t really cost me any noticeable amount of time.

I personally don't reboot my laptop that often, but it's not because of a boot taking too much time. It's because I like to keep state: open applications, open files, terminal emulator sessions, windows on particular virtual desktops, etc.

exe34 · 2mo ago

    $ uptime
    22:22:45 up 3748 days 21:20, 2 users, load average: 1.42, 1.36, 1.02

It's very funny, I think it's because my laptop battery died and when I replaced it, it had to update the time from 10 years ago? I'm not sure why, as the laptop is from mid-2012.

jasonjayr · 2mo ago

> 17:27:20 up 1112 days, 10:36, 50 users, load average: 0.20, 0.19, 0.18

I thought I had a record going here with my Dell laptop, but I guess you win. After a certain point, I just decided to see how long I can make it go.

nottorp · 2mo ago · 2 in thread

Hmm?

    torp@machinename ~ % uptime
    11:43  up 59 days, 1:22, 4 users, load averages: 2.87 2.69 2.70

Sleep is disabled on that machine and it definitely had networking working fine last night.

Mac Mini M2, Sequoia.

Incidentally my laptop says 75 days uptime, but that one does go to sleep.

cthalupa · 2mo ago

> Mac Mini M2, Sequoia.

It's Tahoe specific

https://news.ycombinator.com/item?id=47670995

nottorp · 2mo ago

So besides ruining the UI they fucked up the kernel too?

WesolyKubeczek · 2mo ago · 2 in thread

In case of OpenClaw, this is a feature.

4fterd4rk · 2mo ago

When some Russians do a prompt injection and OpenClaw is threatening to send your NSFW pics to Grandma unless you give it some Bitcoin all you have to do is drag out the negotiations for 49 days!

WesolyKubeczek · 2mo ago

I’d be afraid the Grandma would send some of her own NSFW pictures right back.

jijji · 2mo ago · 2 in thread

I thought Alan Cox fixed all the TCP IP bugs in the early 1990s lol

toast0 · 2mo ago

Did Alan Cox work on tcp? I thought he was working on memory and stuff.

That's what the wiki says anyway: [1], and a publication with his name is about huge pages [2]

[1] https://wiki.freebsd.org/AlanCox

[2] https://www.usenix.org/legacy/events/osdi02/tech/full_papers...

jijji · 2mo ago

Alan Cox of course worked on the TCP/IP stack:

"His involvement with Linux began in the early 1990s when he was working on a project that required a stable networking solution. This led him to discover Linux, which was still in its infancy at the time.

Contributions to Linux Kernel

Cox's contributions to the Linux kernel are extensive and far-reaching. He is best known for his work on the Linux networking stack, which was critical in making Linux a viable option for server environments. Cox identified and addressed numerous issues in the kernel's TCP/IP implementation, enhancing its performance and reliability." [0]

"For those not familiar with the Linux kernel contributors, Alan Cox wrote large parts of the networking stack, was the maintainer of the 2.2 branch, and was commonly considered the "second in command" to Linus Torvalds at one point: http://en.wikipedia.org/wiki/Alan_Cox" [1]

"Alan started working on Version 0. There were bugs and problems he could correct. He put Linux on a machine in the Swansea University computer network, which revealed many problems in networking which he sorted out; later he rewrote the networking software." [2]

[0] https://machaddr.substack.com/p/kernel-chronicles-insights-a...

[1] https://news.ycombinator.com/item?id=8548738

[2] https://web.archive.org/web/20200923003028/https://www.swans...

netcoyote · 2mo ago · 1 in thread

This type of problem plagues all sorts of software. Having experienced this type of problem before, for Guild Wars game servers -- which run deterministic game instances that live for long periods of time -- we initialized a per-game-context variable that gets added to Windows GetTickCount() to a value such that the result was either 5 seconds before 0x7fff_ffff ticks, or 5 seconds before 0xffff_ffff ticks, so that any weird time-computation overflow errors would be likely to show up immediately.

toast0 · 2mo ago

Yep, everything that relies on overflow needs to overflow soon after start, so that it's well tested.
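
A Python sketch of the per-context offset netcoyote describes. The real source would be Windows GetTickCount(); here a plain integer stands in for it. `make_offset` and `ContextClock` are hypothetical names. The offset is chosen so that a context's first reading lands 5 seconds before either the signed (0x7fff_ffff) or the unsigned (0xffff_ffff) boundary.

```python
MASK32 = 0xFFFFFFFF
LEAD_MS = 5_000  # start this far before the chosen boundary

def make_offset(real_now: int, boundary: int) -> int:
    """Offset that maps real_now onto boundary - LEAD_MS, modulo 2**32."""
    return (boundary - LEAD_MS - real_now) & MASK32

class ContextClock:
    """Hypothetical per-context clock: raw tick count plus a fixed offset, wrapped to 32 bits."""

    def __init__(self, real_now: int, boundary: int):
        self.offset = make_offset(real_now, boundary)

    def ticks(self, real_now: int) -> int:
        return (real_now + self.offset) & MASK32

real_boot = 123_456  # stand-in for the raw GetTickCount() value at context creation
signed_ctx = ContextClock(real_boot, 0x7FFF_FFFF)    # crosses INT32_MAX after about 5 s
unsigned_ctx = ContextClock(real_boot, 0xFFFF_FFFF)  # wraps to 0 after about 5 s
for ctx in (signed_ctx, unsigned_ctx):
    print(hex(ctx.ticks(real_boot)), hex(ctx.ticks(real_boot + 6_000)))
```

Each context is seconds away from one of the two dangerous boundaries at startup, so any code that mishandles either one fails right away during normal testing.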

JensRantil · 2mo ago · 1 in thread

This reminds me of the Linux kernel scheduler bug that kicked in after 208 days: https://www.claudiokuenzler.com/blog/247/linux-virtual-serve...

bigiain · 2mo ago

And Boeing 787s

https://airguide.info/boeing-787s-must-be-turned-off-every-5...

fortran77 · 2mo ago · 1 in thread

Nobody keeps their Macs running for more than 49.7 days? We have Windows Servers here (with long-term TCP/IP connections) that are only rebooted every 6 months to apply patches.

binaryturtle · 2mo ago

Macs that no longer get reboot-requiring updates by Apple usually have long(er) uptimes. :) My record here with my primary Mac mini was a bit over a year. Only to be forced to reboot because of a power outage.

Generally it feels like sometimes you boot into a stable "session" that can go on forever, but often enough you boot into a "session" where something goes wrong quickly and you'll have to reboot after a week or two. But I do experience the same with my Raspberry Pi. :)

bawolff · 2mo ago · 1 in thread

Wasn't windows 95 famous for having an issue like this?

guywithahat · 2mo ago

Arduino too; I assume they all have to do with storing milliseconds in a uint32_t, and then getting unpredictable behavior when it rolls over
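
The 49.7-day figure falls straight out of that arithmetic: a 32-bit count of milliseconds wraps after 2^32 ms. The small helper below is hypothetical, for illustration only. It gives the wrap period for any counter width and tick length. Reading the same counter as signed halves the period. The last two lines plug in otherme123's uint16 example from earlier in the thread.

```python
SECONDS_PER_DAY = 86_400

def wrap_seconds(bits: int, tick_seconds: float) -> float:
    """Time for a `bits`-wide counter advancing once per `tick_seconds` to wrap."""
    return (2 ** bits) * tick_seconds

print(wrap_seconds(32, 0.001) / SECONDS_PER_DAY)  # ~49.71 days: uint32 milliseconds
print(wrap_seconds(31, 0.001) / SECONDS_PER_DAY)  # ~24.86 days: same counter read as int32
# otherme123's example: uint16 bumped every 500,000 cycles
print(wrap_seconds(16, 500_000 / 32e6))           # 1024.0 seconds on a 32 MHz clock
print(wrap_seconds(16, 500_000 / 1e6) / 3600)     # ~9.1 hours on a 1 MHz clock
```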

gghootch · 2mo ago

What does this have to do with OpenClaw exactly?

beanjuiceII · 2mo ago

i'm on sequoia M1 laptop with uptime 16:38 up 228 days, 21:03, 1 user, load averages: 6.14 5.93 5.64

guess i'm marked safe!

AndroTux · 2mo ago

Interesting. I think I can confirm this. Got a Tahoe system with 55 days uptime that's mostly idling:

    % netstat -an | grep TIME_WAIT | wc -l
    850

All other systems with < 49.7 days uptime report low single to double digit numbers.
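
The same check as AndroTux's `netstat -an | grep TIME_WAIT | wc -l`, written as a small Python sketch. The parser assumes netstat's usual layout, with the TCP state in the last column. The sample lines are made-up data for illustration. grep counts any line containing the string; this version counts only lines whose final field is TIME_WAIT.

```python
import subprocess

def count_time_wait(netstat_output: str) -> int:
    """Count sockets whose state column (last field) is TIME_WAIT."""
    return sum(
        1
        for line in netstat_output.splitlines()
        if (fields := line.split()) and fields[-1] == "TIME_WAIT"
    )

def current_time_wait_count() -> int:
    """Run `netstat -an` and count TIME_WAIT sockets (macOS/BSD-style output assumed)."""
    out = subprocess.run(["netstat", "-an"], capture_output=True, text=True, check=True).stdout
    return count_time_wait(out)

sample = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  192.168.1.20.52311     17.253.144.10.443      TIME_WAIT
tcp4       0      0  192.168.1.20.52310     17.253.144.10.443      ESTABLISHED
tcp4       0      0  127.0.0.1.8080         127.0.0.1.52299        TIME_WAIT
"""
print(count_time_wait(sample))  # 2
```

On a live machine, call `current_time_wait_count()`. Taking a reading before and after the 49.7-day mark, and watching whether the count keeps climbing, is the comparison AndroTux describes.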

ingmarstein · 2mo ago

Thank you for this post! I think I ran into this when running UniFi OS Server (which uses podman) on macOS 26: https://community.ui.com/questions/TCP-connection-leak/2ab61...

poppafuze · 2mo ago

https://news.ycombinator.com/item?id=41939318

cthalupa · 2mo ago

I'm pretty certain I've run into this a couple of times now since upgrading to Tahoe last year and had been wondering what the deal was. Had never thought to check the uptime and make note of it, but I basically never shut down my laptop.

dvh · 2mo ago

Exactly like arduino

apple4ever · 2mo ago

OH this explains why randomly my iMac would REFUSE to do any connections to anything. I never put together that it was because of uptime!

daveorzach · 2mo ago

If you want to see exactly when your machine will hit this, I threw together a fish shell function that calculates the precise timestamp, mostly vibe coded.

calc_tcp_overflow_time.fish: https://gist.github.com/daveorzach/64538f82a89fa24e5d134557c...

monitor_tcp_time_wait.fish: https://gist.github.com/daveorzach/0964a7a67c08c50043ff707cf...
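
For a rough estimate without fish, here is a Python sketch along the same lines. It assumes the counter is milliseconds since boot, as the thread's arithmetic does. It reads the boot time from `sysctl -n kern.boottime`, which prints something like `{ sec = 1700000000, usec = 0 } ...`. hombre_fatal notes the underlying clock pauses during sleep, so on a machine that has slept, treat the result as the earliest possible date. The function names are hypothetical.

```python
import re
import subprocess
from datetime import datetime, timedelta, timezone

WRAP = timedelta(milliseconds=2 ** 32)  # about 49.7 days

def parse_boottime(sysctl_output: str) -> datetime:
    """Extract `sec = N` from kern.boottime output as an aware UTC datetime."""
    m = re.search(r"\bsec = (\d+)", sysctl_output)  # \b skips the `usec = ...` field
    if m is None:
        raise ValueError(f"unrecognised kern.boottime output: {sysctl_output!r}")
    return datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)

def earliest_wrap(boot: datetime) -> datetime:
    """Boot time plus 2**32 ms; later in practice if the machine has slept."""
    return boot + WRAP

def read_boottime() -> datetime:
    """Query the running system (macOS/BSD) for its boot time."""
    out = subprocess.run(["sysctl", "-n", "kern.boottime"],
                         capture_output=True, text=True, check=True).stdout
    return parse_boottime(out)

sample = "{ sec = 1700000000, usec = 0 } Tue Nov 14 22:13:20 2023"
print(earliest_wrap(parse_boottime(sample)).isoformat())
```

On a Mac, `earliest_wrap(read_boottime())` gives the date for the running system.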

nalekberov · 2mo ago

I rarely restart my Mac mini, and I have never had such an issue, beyond my internet provider suddenly stopping working properly in the middle of the night.

NautilusWave · 2mo ago

How old is this bug? I can't imagine it exists on iOS or iPadOS; have those kernels really drifted that far apart though?

Philpax · 2mo ago

Ctrl+F "OpenClaw". No results. Que?

apatheticonion · 2mo ago

Ignoring the AI article contents.

God I wish Apple offered first party support for Linux on Mac computers.

throw03172019 · 2mo ago

I only have 11 days left until my machine crashes and I lose all of my tabs.

RyanZhuuuu (OP) · 2mo ago

quick update: the problem has been confirmed and resolved in the latest macOS 26.4 release (from Apple)

cute_boi · 2mo ago

too many words and too much text for a simple thing..... probably written by openclaw

revv00 · 2mo ago

Orz! A kind reminder to reboot.

awithrow · 2mo ago

A ticking time bomb? What an overly dramatic way to talk about a bug that requires a reboot. It's not even a hard crash.

j / k navigate · click thread line to collapse

114 comments

82 comments · 32 top-level

BitsAndObjects2mo ago· 8 in thread

I got tired of the AI writing before finding out if they even attempted to contact Apple about this issue? Does anyone know?

Also, massively over-dramatised. Yes, a bug worth finding and knowing about, but it’s not a time bomb - very few users are likely to be affected by this.

Anyway, all in all, yeah hopefully Apple fix this but it’s not something anyone needs to panic about.

bigiain2mo ago

> very few users are likely to be affected by this

I have a reasonably strong suspicion that I experienced this a week or two back, on a MacBook that doesn't go into sleep automatically and quite likely had 50-ish days of uptime.

castillar762mo ago

It’s not a disaster, but very annoying. At least now I can just schedule a reboot every 30 days at minimum to keep things running.

2 more replies

BitsAndObjects2mo ago

I would not be surprised if people on HN were more likely to hit this issue than Apple's average users. We're a weird bunch ;)

1 more reply

delusional2mo ago

Apparently no. They'll be fixing it themselves? It really reads like Claude run amok on the blog.

theshrike792mo ago

I think I might've hit my head on this a few times with my Mac Mini that's on basically 24/7 and doesn't go to sleep.

Sometimes it just stops networking completely, turning the wifi adapter on/off brings it back just fine. It's also a good time to reboot =)

RyanZhuuuuOP2mo ago

yes we have reported to Apple and they have filed it in their internal system.

otterley2mo ago

Did you need to make this blog post 20 pages long and have AI write it? Especially in such dramatic style?

Remember the golden rule: if you can't be bothered to write it yourself, why should your audience be bothered to read it ourselves?

Aloisius2mo ago

Might want to update it if you used the blog post explanation because it's incorrect as justinfrankel noted below. From the post:

    tcp_now   = 4,294,960,000  (frozen at pre-overflow value)

The mistake in the blog post is timer isn't wrapped, even though it notes it should be:

    timer     = 4,294,960,000 + 30,000 = 4,294,990,000 - MAX_INT = 22,704

Therefore:

    TSTMP_GEQ(4294960000, 22704)
    = 4294960000 - 22704
    = 4294937296
    = 4294937296 >= 0 ?  → true! (not false)

It's should be easy to prevent the latter from happening with some TCP traffic, though reaping TCP_WAIT connections early isn't great either.

2 more replies

loloquwowndueo2mo ago· 7 in thread

lol reminds me of the windows 95 crash bug after 49.7 days. Have we learned nothing. https://pipiscrew.github.io/posts/why-window/

aranelsurion2mo ago

I was just trying to remember where did I last see this magic number of days.

loloquwowndueo2mo ago

2 more replies

larodi2mo ago

49-7=42 it is all clear

ok1234562mo ago

Quite literally "the new old thing."

auspiv2mo ago

probably same thing for boeing 787 jets - https://www.theregister.com/2020/04/02/boeing_787_power_cycl...

says 51 days, which would be an interesting number of (milli)seconds

otherme1232mo ago

znpy2mo ago

that's why the 49.7 days sounded familiar!

mcculley2mo ago· 6 in thread

> It will not be caught in development testing — who runs a test for 50 days?

You don't have to run the system for 50 days. You can simulate the environment and tick the clock faster. Many high reliability systems are tested this way.

dezgeg2mo ago

IIRC the initial value for the jiffies time counter in Linux kernel is initialized at boot time to something like five minutes before the wraparound point, precisely to catch this kind of issues.

bobmcnamara2mo ago

WinCE too

hombre_fatal2mo ago

It uses a hardware clock, one that pauses during sleep. There is no tick.

If you wanted to see how time impacts the program, you'd prob change fns like calculate_tcp_clock to take uptime as an argument so that you could sanity check it.

mcculley2mo ago

Yes. I do mean designing software to make it testable.

The code that uses that value can be run in an environment where that value can be controlled.

I have written code that does this same thing and built a test harness for it.

adamtulinius2mo ago

We're talking about a company that produces the hardware their OS is running on. I'm sure they can find a way to make the hardware clock run faster.

1 more reply

sho_hn2mo ago

Heck, many video games are tested this way.

justinfrankel2mo ago· 6 in thread

have multiple macOS machines with 600-1000+ day uptimes, which do TCP connections every minute or so at a minimum, they are still expiring their TIME_WAIT connections as normal.

these kernel versions:

Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_ARM64_T8101 arm64

Darwin Kernel Version 17.7.0: Wed Apr 24 21:17:24 PDT 2019; root:xnu-4570.71.45~1/RELEASE_X86_64 x86_64

so... wonder what that's about?

justinfrankel2mo ago

ah reading their analysis, there are errors that explain this. Particularly this:

  tcp_now   = 4,294,960,000  (frozen at pre-overflow value)
  timer     = 4,294,960,000 + 30,000 = 4,294,990,000
              (exceeds uint32 max → wraps to a small number)

timer wraps to a small number, they say

  TSTMP_GEQ(4294960000, 4294990000)

they forgot to wrap it there, it should be TSTMP_GEQ(4294960000, small_number)

  = (int)(4294960000 - 4294990000)
  = (int)(-30000)
  = -30000 >= 0 ?  → false!

wrong!

Aloisius2mo ago

There does appear to be a bug, but it's not what the blog describes.

If tcp_now stops updating at <= 2^32 - 30000 milliseconds, then TSTMP_GEQ(tcp_now, timer) will always fail since timer is tcp_now + 30000 which won't wrap.

In cases where tcp_now gets stuck at > 2^32 - 30000, it looks like TCP sockets in the TIME_WAIT will end up being closed immediately instead of waiting 30 seconds, which isn't great either.

2 more replies

mhjkl2mo ago

comex2mo ago

The bug was introduced only last year in macOS 26:

https://github.com/apple-oss-distributions/xnu/blame/f6217f8...

plorkyeran2mo ago

Aloisius2mo ago

Interesting. The article mentions complaints on the forums running Catalina, so that must be something else.

2 more replies

otterley2mo ago· 5 in thread

Sounds like it affects every open TCP connection, not just OpenClaw. (It's pretty rare for a TCP connection to live that long, though.)

josephcsible2mo ago

Individual TCP connections don't need to live that long. Once a macOS system reaches 49.7 days of uptime, this bug starts affecting all TCP connections.

throw0101d2mo ago

> Once a macOS system reaches 49.7 days of uptime, this bug starts affecting all TCP connections.

Current `uptime` on my work MacBook (macOS 15.7.4):

    17:14  up 50 days, 22 mins, 16 users, load averages: 2.06 1.95 1.94

Am I supposed to be having issues with TCP connections right now? (I'm not.)

My personal iMac is at 279 days of uptime.

5 more replies

CamperBob22mo ago

Sure they do. They need to live until torn down.

They almost never do live that long, for whatever reason, but they should.

1 more reply

gpvos2mo ago

Obviously, OpenClaw is now more important than anything else.

mememememememo2mo ago

For OpenClaw this bug is a security feature

tjohns2mo ago· 4 in thread

Does anybody else find these AI-authored blog posts difficult to read? Something about the writing style and structure just feels unnatural, it's hard put my finger on it.

At the very least, the writing takes way too long to get to a point.

dawnerd2mo ago

Same, AI written anything is really difficult for me to read and pretty exhausting.

gowld2mo ago

AI does a good job of condensing the blog post to 2 paragraphs -- Mac refuses to let the tcp_now clock rollover when it exceeds the max value in its data type.

nslsm2mo ago

Use AI to expand your thoughts into a long-winded post, use AI to compress the long-winded post into something that can be digested by a human.

2 more replies

coldtea2mo ago

Can it summarize it down to a non-post?

1 more reply

MatMercer2mo ago· 4 in thread

This made me remember some folks that are "I never reboot my MacOS and it's fine!". Yeah probably it is but I'll never trust any computer without periodic reboots lol.

QuantumNomad_2mo ago

I’m still at where when I connect external hard drive or SSD via USB, use it and then eject it, I shut down the MacBook Pro completely before I unplug the USB cable. Just in case.

Delk2mo ago

> I’m still at where when I connect external hard drive or SSD via USB, use it and then eject it, I shut down the MacBook Pro completely before I unplug the cable. Just in case.

1 more reply

exe342mo ago

$ uptime

22:22:45 up 3748 days 21:20, 2 users, load average: 1.42, 1.36, 1.02

It's very funny, I think it's because my laptop battery died and when I replaced it, it had to update the time from 10 years ago? I'm not sure why, as the laptop is from mid-2012.

jasonjayr2mo ago

> 17:27:20 up 1112 days, 10:36, 50 users, load average: 0.20, 0.19, 0.18

I thought I had a record going here with my Dell laptop, but I guess you win. After a certain point, I just decided to see how long I can make it go.

1 more reply

nottorp2mo ago· 2 in thread

Hmm?

torp@machinename ~ % uptime 11:43 up 59 days, 1:22, 4 users, load averages: 2.87 2.69 2.70

Sleep is disabled on that machine and it definitely had networking working fine last night.

Mac Mini M2, Sequoia.

Incidentally my laptop says 75 days uptime, but that one does go to sleep.

cthalupa2mo ago

> Mac Mini M2, Sequoia.

It's Tahoe-specific.

https://news.ycombinator.com/item?id=47670995

nottorp2mo ago

So besides ruining the UI they fucked up the kernel too?

WesolyKubeczek2mo ago· 2 in thread

In the case of OpenClaw, this is a feature.

4fterd4rk2mo ago

When some Russians do a prompt injection and OpenClaw is threatening to send your NSFW pics to Grandma unless you give it some Bitcoin all you have to do is drag out the negotiations for 49 days!

WesolyKubeczek2mo ago

I’d be afraid the Grandma would send some of her own NSFW pictures right back.

jijji2mo ago· 2 in thread

I thought Alan Cox fixed all the TCP/IP bugs in the early 1990s lol

toast02mo ago

Did Alan Cox work on TCP? I thought he was working on memory and stuff.

That's what the wiki says, anyway [1], and a publication with his name on it is about huge pages [2].

[1] https://wiki.freebsd.org/AlanCox

[2] https://www.usenix.org/legacy/events/osdi02/tech/full_papers...

jijji2mo ago

Alan Cox of course worked on the TCP/IP stack:

Contributions to Linux Kernel

[0] https://machaddr.substack.com/p/kernel-chronicles-insights-a...

[1] https://news.ycombinator.com/item?id=8548738

[2] https://web.archive.org/web/20200923003028/https://www.swans...


netcoyote2mo ago· 1 in thread

toast02mo ago

Yep, everything that relies on overflow needs to overflow soon after start, so that it's well tested.
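A minimal sketch of that testing idea: let the clock's starting value be configurable and start it a few minutes short of 2**32, so wraparound handling runs within minutes of boot rather than after ~49.7 days. Linux does something similar in spirit with jiffies (INITIAL_JIFFIES starts the counter five minutes before it wraps). The `TickClock` class and `start_offset_ms` parameter are hypothetical names for illustration.

```python
# Sketch: a 32-bit millisecond clock whose start value is configurable, so a
# test (or every boot) can begin just short of the wrap. Names are hypothetical.

MASK32 = 0xFFFFFFFF

class TickClock:
    def __init__(self, start_offset_ms: int = 0):
        # 0 is the usual "count from zero" behaviour; a value just below
        # 2**32 makes the wrap happen almost immediately.
        self.now = start_offset_ms & MASK32

    def advance(self, delta_ms: int) -> None:
        self.now = (self.now + delta_ms) & MASK32

def time_after_eq(a: int, b: int) -> bool:
    """True if tick a is at or after tick b, treating the gap as signed 32-bit."""
    return ((a - b) & MASK32) < 0x80000000

# Start five minutes before the wrap.
clock = TickClock(start_offset_ms=(MASK32 + 1) - 5 * 60 * 1000)
deadline = (clock.now + 10 * 60 * 1000) & MASK32   # timer due in ten minutes

clock.advance(9 * 60 * 1000)                 # the counter has wrapped by now
print(time_after_eq(clock.now, deadline))    # False: one minute still to go
clock.advance(60 * 1000)
print(time_after_eq(clock.now, deadline))    # True: fires on time, after the wrap
```

The point of the design is that a bug like a clamped or mishandled counter would surface in the first ten minutes of any test run instead of hiding for seven weeks.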

JensRantil2mo ago· 1 in thread

This reminds me of the Linux kernel scheduler bug that kicked in after 208 days: https://www.claudiokuenzler.com/blog/247/linux-virtual-serve...
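A commonly cited explanation for the 208-day figure (not taken from the linked post, so treat it as an assumption) is that the x86 sched_clock conversion multiplied a cycle count by a scale factor carrying 10 fractional bits in 64-bit arithmetic, so the product overflows once elapsed time reaches about 2**54 nanoseconds. The arithmetic below only checks that this lines up with 208 days.

```python
# Arithmetic check only: if elapsed nanoseconds are effectively multiplied
# by 2**10 in a 64-bit register, the product overflows once
# ns * 2**10 >= 2**64, i.e. once ns >= 2**54.
NS_PER_DAY = 86_400 * 10**9

overflow_ns = 2**64 // 2**10          # == 2**54
days = overflow_ns / NS_PER_DAY
print(f"{days:.1f} days")             # 208.5 days
```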

bigiain2mo ago

And Boeing 787s

https://airguide.info/boeing-787s-must-be-turned-off-every-5...

fortran772mo ago· 1 in thread

Nobody keeps their Macs running for more than 49.7 days? We have Windows servers here (with long-term TCP/IP connections) that are only rebooted every 6 months to apply patches.

binaryturtle2mo ago

bawolff2mo ago· 1 in thread

Wasn't windows 95 famous for having an issue like this?

guywithahat2mo ago

Arduino too; I assume they all have to do with storing milliseconds in a uint32_t, and then getting unpredictable behavior when it rolls over
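That guess is easy to demonstrate. Arduino's millis() and Windows' GetTickCount() both return a 32-bit millisecond count that wraps after 2**32 ms, about 49.7 days. The sketch below contrasts a naive deadline comparison with the usual subtraction-based elapsed check; it is a Python model of uint32 arithmetic, not code from either platform, and the function names are hypothetical.

```python
# Two ways to ask "has interval_ms elapsed since start?" with a counter
# that wraps modulo 2**32.
MASK32 = 0xFFFFFFFF

def elapsed_naive(now: int, start: int, interval_ms: int) -> bool:
    # Breaks near the wrap: start + interval overflows to a small number,
    # so the comparison fires far too early.
    return now >= ((start + interval_ms) & MASK32)

def elapsed_safe(now: int, start: int, interval_ms: int) -> bool:
    # Unsigned subtraction modulo 2**32 gives the true elapsed time even if
    # the counter wrapped in between (as long as the gap is < 2**32 ms).
    return ((now - start) & MASK32) >= interval_ms

start = MASK32 - 500                            # 500 ms before the wrap
print(elapsed_naive(start + 1, start, 1000))    # True  (wrong: only 1 ms passed)
print(elapsed_safe(start + 1, start, 1000))     # False (correct)
now = (start + 1200) & MASK32                   # 1.2 s later, after the wrap
print(elapsed_safe(now, start, 1000))           # True  (correct)
```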

gghootch2mo ago

What does this have to do with OpenClaw exactly?

beanjuiceII2mo ago

I'm on a Sequoia M1 laptop with uptime: 16:38 up 228 days, 21:03, 1 user, load averages: 6.14 5.93 5.64

Guess I'm marked safe!

AndroTux2mo ago

Interesting. I think I can confirm this. Got a Tahoe system with 55 days uptime that's mostly idling:

% netstat -an | grep TIME_WAIT | wc -l

850

All other systems with < 49.7 days uptime report low single to double digit numbers.
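For anyone who wants to repeat this check and see every state rather than just TIME_WAIT, here is a small sketch that tallies TCP states from `netstat -an` output. It assumes, as in macOS netstat output, that the state is the last whitespace-separated field on lines whose protocol starts with `tcp`; other platforms may format this differently. The function names are hypothetical.

```python
# Tally TCP connection states from `netstat -an` output, similar to
# `netstat -an | grep TIME_WAIT | wc -l` but for every state at once.
import subprocess
from collections import Counter

def tcp_state_counts(netstat_output: str) -> Counter:
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # tcp4 / tcp6 / tcp46 lines carry the state in the last column.
        if fields and fields[0].startswith("tcp"):
            counts[fields[-1]] += 1
    return counts

def read_netstat() -> str:
    """Run netstat on the local machine (macOS/BSD-style output assumed)."""
    return subprocess.run(["netstat", "-an"], capture_output=True,
                          text=True, check=True).stdout
```

On a Mac, `tcp_state_counts(read_netstat())["TIME_WAIT"]` should give roughly the same number as the pipeline above; a count that keeps climbing on an idle machine past the 49.7-day mark would fit the bug described in the post.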

ingmarstein2mo ago

Thank you for this post! I think I ran into this when running UniFi OS Server (which uses podman) on macOS 26: https://community.ui.com/questions/TCP-connection-leak/2ab61...

poppafuze2mo ago

https://news.ycombinator.com/item?id=41939318

cthalupa2mo ago

dvh2mo ago

Exactly like arduino

apple4ever2mo ago

OH, this explains why my iMac would randomly REFUSE to make any connections to anything. I never put together that it was because of uptime!

daveorzach2mo ago

If you want to see exactly when your machine will hit this, I threw together a fish shell function that calculates the precise timestamp, mostly vibe coded.

calc_tcp_overflow_time.fish: https://gist.github.com/daveorzach/64538f82a89fa24e5d134557c...

monitor_tcp_time_wait.fish: https://gist.github.com/daveorzach/0964a7a67c08c50043ff707cf...
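A rough Python take on the same idea (not a port of the gists, whose contents aren't reproduced here): read the boot time from `sysctl -n kern.boottime` and add 2**32 ms. It assumes the counter starts at 0 at boot and keeps ticking through sleep; neither assumption is confirmed in the thread, and several commenters suggest sleep may change the picture. Function names are hypothetical.

```python
# Estimate when 2**32 ms will have elapsed since boot on macOS.
# ASSUMPTIONS: the counter starts at 0 at boot and advances during sleep.
import re
import subprocess
from datetime import datetime, timezone

WRAP_SECONDS = 2**32 / 1000   # about 4,294,967.3 s, roughly 49.7 days

def parse_boottime(sysctl_output: str) -> int:
    """Extract epoch seconds from output like '{ sec = 1700000000, usec = 0 } ...'."""
    match = re.search(r"\bsec\s*=\s*(\d+)", sysctl_output)
    if match is None:
        raise ValueError("unexpected kern.boottime format")
    return int(match.group(1))

def overflow_time(boot_epoch: int) -> datetime:
    """Moment (UTC) at which a ms-since-boot uint32 counter would exceed its range."""
    return datetime.fromtimestamp(boot_epoch + WRAP_SECONDS, tz=timezone.utc)

def read_boottime() -> int:
    out = subprocess.run(["sysctl", "-n", "kern.boottime"],
                         capture_output=True, text=True, check=True).stdout
    return parse_boottime(out)
```

Usage on a Mac: `print(overflow_time(read_boottime()))`.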

nalekberov2mo ago

I rarely restart my Mac mini, and I have never had such an issue, other than my internet provider suddenly no longer working properly in the middle of the night.

NautilusWave2mo ago

How old is this bug? I can't imagine it exists on iOS or iPadOS; have those kernels really drifted that far apart though?

Philpax2mo ago

Ctrl+F "OpenClaw". No results. What?

apatheticonion2mo ago

Ignoring the AI article contents.

God I wish Apple offered first party support for Linux on Mac computers.

throw031720192mo ago

I only have 11 days left until my machine crashes and I lose all of my tabs.

RyanZhuuuuOP2mo ago

quick update: the problem has been confirmed and resolved in the latest macOS 26.4 release (from Apple)

cute_boi2mo ago

Too many words and too much text for a simple thing... probably written by OpenClaw.

revv002mo ago

Orz! A kind reminder to reboot.

awithrow2mo ago

A ticking time bomb? What an overly dramatic way to talk about a bug that requires a reboot. It's not even a hard crash.
