With virtual memory and paging, it's really up to the user what's too taxing on their system. And it's not an either-or: I greatly value an application that saves its state reliably, consistently, and often. Sublime Text is fantastic, I don't even have to press Save and can just pull the plug on the machine. This mitigates the allocation failure case as well as so many other failures.
I can't recall a time when I thought, "if only this app/service had handled allocation failure better, things would be great." To me, an operation dying from lack of memory was equivalent to a nice error message, as long as the app could resume as if the operation had never been attempted. It would be like a client getting an HTTP 502 from a load-balancer and the service invisibly restarting. Maybe it's my cattle-not-pets attitude of expecting failures.
If the load-balancer can't gracefully handle malloc failure, then instead of a 502 response, hundreds, thousands, or perhaps even millions of clients will simply get EOF or possibly even hang indefinitely.
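To make "gracefully handle malloc failure" concrete, here is a minimal sketch in C. Everything in it is hypothetical: `handle_request`, the `alloc_fn` hook, and `failing_alloc` are illustrative names, not taken from any real load-balancer. The 503 status is also my choice for the example. The only point is that a failed allocation costs one request an error response instead of taking down every connection the process is holding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook, so the failure path can be exercised
 * without actually exhausting memory. */
typedef void *(*alloc_fn)(size_t);

/* Hypothetical request handler: returns an HTTP status code.
 * On allocation failure it sheds this one request and returns 503;
 * the process and all its other connections keep running. */
int handle_request(const char *body, size_t len, alloc_fn alloc)
{
    assert(body != NULL);

    char *copy = alloc(len + 1);
    if (copy == NULL) {
        /* Out of memory: fail only this request. */
        return 503;
    }

    memcpy(copy, body, len);
    copy[len] = '\0';

    /* ... a real proxy would forward `copy` to a backend here ... */

    free(copy);
    return 200;
}

/* Stand-in allocator that always fails, simulating memory exhaustion. */
void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```

Calling `handle_request("hi", 2, malloc)` takes the normal path, while passing `failing_alloc` takes the shed-the-request path. Note that this only helps if malloc actually reports failure, which is exactly what overcommit tends to prevent.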
Same principle applies to the kernel--if the kernel can't gracefully handle malloc failure, things can quickly become much more unpleasant than a spurious 502 somewhere.
If you've never had to worry about malloc failure, it's because other people have. Failing gracefully under pressure is difficult, so we tend to push those chores into a small number of software and hardware services. But the people writing those solutions still need the software stack beneath them to provide the ability to handle things as cleanly as possible.
Imagine writing an ACID database if a failed disk operation crashed the entire machine. Yes, you can work around it, but actual QoS would suck at scale compared to being able to isolate the fallout to the particular transaction.
Linux's overcommit absolutely has caused me and others countless hours of grief. Because overcommit blunts, obscures, and redirects memory back pressure, things tend to soft fail or time out in cascades across completely unrelated services, and you really have very little ability to control it in any meaningful manner.
It worked great until the MySQL dataset grew enough. There was a backup job running 7z and pulling the compressed DB to a storage machine. It turns out 7z crashed because of the OOM killer.
The HTTP service itself just kept going, OOM or not, presumably because the backup ran at 3 AM and almost nobody was using it at that time.
No, the unused memory is not yours to manage. Return it to the OS.
For example, you fork() then exec() from a process using 16GB of memory: without overcommit, you briefly need 32GB of memory.
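One common way around the fork()-then-exec() doubling is posix_spawn(), which lets the C library create the child without first committing a full copy of the parent's address space. As far as I know, glibc on Linux implements it with a vfork-style clone, but treat that as an assumption. Below is a minimal sketch. `run_child` is a hypothetical helper name, and the error handling is deliberately crude.

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run the program named by argv[0] (looked up on
 * PATH) and return its exit status, or -1 if it could not be spawned,
 * could not be waited for, or did not exit normally. */
int run_child(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp returns an errno value directly instead of setting errno. */
    int rc = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (rc != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this, a 16GB parent can launch a small helper without the system having to account for a second 16GB, regardless of the overcommit setting. The trade-off is that you lose the ability to run arbitrary code between fork() and exec(), and have to express any setup through the spawn attributes and file actions instead.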