
wahern 4y ago

> It would be like a client getting an HTTP 502 from a load-balancer and the service invisibly restarting.

If the load-balancer can't gracefully handle malloc failure, then instead of a 502 response, hundreds, thousands, or perhaps even millions of clients will simply get EOF or possibly even hang indefinitely.
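To make that concrete, here is a minimal sketch in C of what handling it gracefully might look like for a single request. Everything here is hypothetical: `handle_request`, the `alloc_fn` hook, and `failing_alloc` are names made up for illustration, not taken from any real load-balancer, and the 502 just mirrors the status discussed above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook so that failure can be simulated;
 * a real proxy would most likely call malloc() directly. */
typedef void *(*alloc_fn)(size_t);

/* Always fails, standing in for memory exhaustion. */
void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* Handle one request and return the HTTP status to send to this client. */
int handle_request(const char *payload, alloc_fn alloc)
{
    size_t len = strlen(payload) + 1;
    char *buf = alloc(len);

    if (buf == NULL) {
        /* Out of memory: fail this one request and keep the process
         * alive, so every other connection is left untouched. */
        return 502;
    }

    memcpy(buf, payload, len);
    /* ... forwarding buf to an upstream server would go here ... */
    free(buf);
    return 200;
}
```

With these definitions, `handle_request("GET /", malloc)` should normally return 200 and `handle_request("GET /", failing_alloc)` returns 502. The failure is simulated because, with overcommit enabled, a small `malloc` on Linux will often succeed even when memory is not actually available.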

The same principle applies to the kernel: if the kernel can't gracefully handle malloc failure, things can quickly become much more unpleasant than a spurious 502 somewhere.

If you've never had to worry about malloc failure, it's because other people have. Failing gracefully under pressure is difficult, so we tend to push those chores into a small number of software and hardware services. But the people writing those solutions still need the software stack beneath them to provide the ability to handle things as cleanly as possible.

Imagine writing an ACID database if a failed disk operation crashed the entire machine. Yes, you can work around it, but actual QoS would suck at scale compared to being able to isolate the fallout to the particular transaction.

Linux's overcommit absolutely has caused me and others countless hours of grief. Because overcommit blunts, obscures, and redirects memory back pressure, things tend to soft fail or time out in cascades across completely unrelated services, and you have very little ability to control it in any meaningful manner.

karmakaze 4y ago

I should have been clearer: if my service crashed and was located behind a load-balancer, the client would receive a 502 Bad Gateway response. That's based on the premise that I'm not writing the load-balancer, and that it's been well tested and handles malloc failure gracefully. So yes, it's important for the small subset of people writing things like load-balancers.
