Re: [PATCH] OOM_pardon, a.k.a. don't kill my xlock (2004)

(lwn.net)

91 points · luu · 26d ago · 98 comments

61 comments · 12 top-level

rwmj · 26d ago · 14 in thread

It's 2026 and I still can't configure the OOM killer to kill firefox before anything else.

bellowsgulch · 26d ago

I looked into this, and actually, it seems like maybe you can? https://man7.org/linux/man-pages/man5/proc_pid_oom_score_adj...

So, in actuality, I think your assertion just taught us all something, because despite knowing that the OOM killer and the Magic SysRq key[1] exist, I didn't know you could configure this as an input!

[1]: https://en.wikipedia.org/wiki/Magic_SysRq_key

rwmj · 26d ago

I'm aware of it, but it's awkward to use in practice. You have to track down all the FF processes, each time you run it, and adjust all their scores.
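
The chore described here can be wrapped in a small helper that applies one score to a list of PIDs. This is a minimal sketch for a POSIX shell, not a tested tool: `set_oom_score_adj` and the `PROC_ROOT` override are hypothetical names introduced for illustration (the override exists only so the function can be tried against a scratch directory). Raising the score of your own processes normally works unprivileged; lowering it generally needs extra privileges.

```shell
# Apply one oom_score_adj value to every PID given on the command line.
# PROC_ROOT is a hypothetical override for dry runs; it defaults to /proc.
set_oom_score_adj() (
    score=$1
    shift
    status=0
    for pid in "$@"; do
        file="${PROC_ROOT:-/proc}/$pid/oom_score_adj"
        # A process may exit between lookup and write, so a failure is
        # reported and remembered rather than aborting the whole loop.
        if ! printf '%s\n' "$score" 2>/dev/null > "$file"; then
            echo "could not adjust PID $pid" >&2
            status=1
        fi
    done
    exit "$status"
)

# Example (not run here): make every process named firefox the preferred victim.
# set_oom_score_adj 1000 $(pgrep firefox)
```

It still has to be re-run whenever the browser spawns new processes, which is the awkwardness the comment is pointing at.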

3r7j6qzi9jvnve · 25d ago

If it helps, I run ff in systemd-run with memory limits set -- that's usually enough to avoid the problem in the first place (ff does freeze when loading google spreadsheets or whatever heavy UI, so I also have a script to adjust /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/ff-*.scope/memory.max and memory.high at runtime... I should publish my $bindir someday)

    systemd-run --user --scope --unit=ff-$$.scope \
     -p MemoryMax=4G -p MemoryHigh=3G \
     -p MemorySwapMax=0 \
     firejail firefox "$@"
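
The runtime adjustment script mentioned above is not shown, so the following is only a guess at its shape: a sketch that rewrites `memory.high` and `memory.max` for every matching scope. `set_ff_limits` and the `CGROUP_APP_SLICE` override are hypothetical names; the default path is the one quoted in the comment and assumes UID 1000.

```shell
# Rewrite memory.high and memory.max (in bytes) for every ff-*.scope cgroup.
# CGROUP_APP_SLICE is a hypothetical override for dry runs; the default is
# the path quoted in the comment above.
set_ff_limits() (
    high=$1
    max=$2
    base=${CGROUP_APP_SLICE:-/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice}
    status=1    # stays 1 unless at least one scope was updated
    for scope in "$base"/ff-*.scope; do
        [ -d "$scope" ] || continue    # the glob stays literal when nothing matches
        echo "$high" > "$scope/memory.high" &&
            echo "$max" > "$scope/memory.max" &&
            status=0
    done
    exit "$status"
)

# Example (not run here): raise the limits to 5 GiB / 6 GiB while a heavy page loads.
# set_ff_limits $((5 << 30)) $((6 << 30))
```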

po1nt · 25d ago

It would be nice to have a signal as a warning to a process to reduce its memory footprint or else the OOM killer will kill it.

Joker_vD · 25d ago

You still need some way to make the kernel send those signals to the processes of your choosing. If the kernel decides to send SIGLOWMEM to xlock instead of firefox, xlock will get killed because it really doesn't have any memory it can give up.

masklinn · 25d ago

It’s possible via cgroups, kinda.

cgroups v1 has a pretty nice API but it requires root. V2 does not require root but it’s a lot coarser and not as simple or reliable: https://unix.stackexchange.com/questions/753929/receive-a-me...

NekkoDroid · 25d ago

https://systemd.io/MEMORY_PRESSURE/
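
The kernel interface underneath this is pressure stall information (PSI), exposed system-wide in `/proc/pressure/memory` and per cgroup in `memory.pressure`. As a rough sketch of reading it from a script (`psi_avg10` and its optional file argument are hypothetical names, and this polls rather than waiting for a notification):

```shell
# Print the avg10 figure from a PSI file for one line type ("some" or "full").
# The optional second argument is a hypothetical convenience for pointing at
# another file, such as a cgroup's memory.pressure.
psi_avg10() {
    awk -v kind="$1" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }' \
        "${2:-/proc/pressure/memory}"
}

# Example (not run here): warn when tasks stalled on memory for 10% or more
# of the last 10 seconds.
# [ "$(psi_avg10 some | cut -d. -f1)" -ge 10 ] && echo "memory pressure is high"
```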

IsTom · 26d ago

It's not a panacea, but in my case setting browser.tabs.unloadOnLowMemory in about:config helped a bunch.

jolmg · 25d ago

You can use `earlyoom --prefer firefox --avoid xlock`.

dvh · 26d ago

This. It's always the browser running amok. I configured the win+k shortcut key to: killall -9 chrome

SoftTalker · 26d ago

I always wanted it to target java processes, as they were always the culprit. These days it's python, VSCode, and antigravity.

ars · 25d ago

Install oomd or earlyoom.

https://github.com/facebookincubator/oomd/

https://github.com/rfjakob/earlyoom

yjftsjthsd-h · 25d ago

Maybe not in kernel, but running the earlyoom daemon will let you do exactly that in userspace.

silon42 · 25d ago

Or to kill anything at all in this lifetime.

hyperpape · 26d ago · 11 in thread

I confess, this is very funny and the underlying situation is a bit absurd, but it's unclear what point Brouwer is making by pointing out the absurdity.

There surely is something absurd about having to register specific processes as exempt from the OOM killer. But given that the OOM killer exists, and could kill xlock...how should that be fixed?

kelnos · 25d ago

I think part of it is that the design of screen lockers on X11 is just broken. If the locker crashes (or is killed), then the screen unlocks. Security-wise, it fails open. On Windows and macOS (and Wayland, using the ext-session-lock protocol, coupled with sane compositor policy), that can't happen.

The right way for this to work is for the X server to have an extension that lets a screen locker say "hey, I'm locking the screen now", and the X server should respond to that by pretending that the screen locker client is the only client that exists: no other client gets input or gets to draw. And if the screen locker crashes (or is killed), the X server should just put itself into a permanently-locked state where it will never again send any input to anything, and won't ever draw anything except a blank screen. That's not a desirable situation, of course, but it's better than unlocking the screen.

hyperpape · 25d ago

Admittedly, that's right, and makes sense for that use case. But as others have pointed out, killing the user's web browser while they're using it is equally painful.

ameliaquining · 25d ago

I read him as arguing that overcommit was a mistake. Of course, he doesn't answer any of the obvious follow-up questions, such as, does fork–exec copy all the process's memory and then immediately throw it away, or what. (One could argue that fork–exec was also a mistake, but it long predates Linux, so this doesn't answer the question of how Torvalds should have designed it.)

zinekeller · 25d ago

> does fork–exec copy all the process's memory

NT: Yes? Why not?

(note that this refers to the Windows NT kernel's operation because it had historically a POSIX emulation layer (NT Personalities), not the modern WSL which is just Linux in a Hyper-V)

wahern · 25d ago

> does fork–exec copy all the process's memory and then immediately throw it away, or what

No, you just account for it (commit the charge) in the bookkeeping. If a 1GB process forks, you decrement the amount of free memory by 1GB to ensure other processes don't overcommit such that you won't have 1GB of free memory if and when you actually needed to allocate that memory. If the forked process immediately exits, you just bump the free memory counter back up. This is what Solaris and Windows do.

But precise accounting of memory is difficult if you didn't design for it in the first place. For example, you have to figure in the memory needed for page structures. (Though I think Linux can do that in particular, bugs notwithstanding.) Last time I checked (5+ years ago) Linux was incapable of such precise accounting across the board, so even if you disabled overcommit the kernel could still find itself in an OOM situation when the time comes to allocate memory it already promised or perform an operation it implicitly or explicitly guaranteed it could complete.

The expectation that Linux overcommits meant many Linux kernel developers didn't design subsystems in a way that the kernel as a whole could provide reliable, guaranteed, precise memory accounting. For example, some filesystems rely on being able to use the OOM killer to free up memory needed for an operation that it can't back out of once it starts because it wasn't written in a way that it could either predetermine or bound its memory requirements, or cleanly back out of an operation it started.

To be fair I'm not sure any of the BSDs can do it either, at least when it comes to fork and CoW. IIRC, nor can macOS, though it will dynamically add swap so you won't get an OOM kill until you run out of disk space.
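
The commit-charge bookkeeping described at the top of this comment can be illustrated with a toy model. This is only arithmetic in a shell script, not how any kernel implements it; the 4096 MiB limit and the function names `charge` and `uncharge` are made up for the example.

```shell
# Toy model of strict commit accounting. Sizes are in MiB; all names and
# numbers are illustrative only.
commit_limit=4096   # assumed RAM + swap available for commitments
committed=0         # sum of all promises made so far

# Charge an amount against the limit; fail instead of overcommitting.
charge() {
    if [ $((committed + $1)) -gt "$commit_limit" ]; then
        return 1
    fi
    committed=$((committed + $1))
}

# Return a charge, e.g. when a forked child exits.
uncharge() {
    committed=$((committed - $1))
}

charge 1024 && echo "start: committed=$committed"   # a 1 GiB process starts
charge 1024 && echo "fork:  committed=$committed"   # fork charges another 1 GiB up front
uncharge 1024
echo "exit:  committed=$committed"                  # child exits, charge is returned
charge 3500 || echo "refused: only $((commit_limit - committed)) MiB uncommitted"
```

No memory is copied at any point; the fork only moves a counter, and the last request fails up front instead of being granted and later resolved by killing something.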

silon42 · 25d ago

Fork should be replaced by vfork (or something better) in almost all situations.

dooglius · 25d ago

The point is that the OOM killer shouldn't exist and arguing about how to tweak it is addressing the wrong problem

hackyhacky · 25d ago

I agree that that's the point he's making, but I don't see how that would work practically. His attitude is that malloc(1<<63) should immediately crash the system, every time? How is that better?

hyperpape · 25d ago

But the second clause doesn't follow from the first!

I don't think Linux was plausibly going to remove the OOM killer in 2004 or later. So the right solution for Linux is very much to tweak it to be less painful.

sankhao · 25d ago

I also think the analogy doesn't work. In the plane situation it seems obvious that the luggage should be ejected before passengers, which is what the guy was asking for?

fragmede · 25d ago

The analogy doesn't work because you can't call fork() on the plane and then it duplicates just the seat for the passenger or pilot that did something different. Also, killing them is rather ghastly.

sedatk · 26d ago · 8 in thread

I’d say, let the one who tried to allocate memory crash, and if you’re a critical process like xlock, use statically allocated memory and don’t alloc again.

LoganDark · 26d ago

This is only a viable answer when overcommit is disabled. The problem comes when overcommit is enabled and you find yourself in a position where many programs think they already have memory and yet there is none to give them. If you simply kill the first piece of code that encounters the end of available memory you might take down anything including the kernel itself.

Nothing like statically allocating memory can work when overcommit is enabled because the kernel is free to compress memory, page it out, etc., and then murder you the next time you try to perform any operation that it doesn't have the space for, no matter how safe and static your initialization was.

Note that overcommit is very useful in many cases including the ones where swap saves the stability of the system under conditions that would otherwise completely lock up or panic, so it's also not viable to just prevent it from being used.

SoftTalker · 26d ago

OOM killer always felt like a band-aid on a severed artery to me. I've rarely seen a machine that got into OOM state really recover without a full reboot.

sedatk · 26d ago

I’m not against taking down the kernel if the situation is that catastrophic. Better than killing the lock screen for sure.

Retr0id · 26d ago

Statically allocated memory can still OOM on access, due to overcommit and lazy page table population. What you really want is mlockall(2) (probably with MCL_CURRENT|MCL_ONFAULT followed by madvise with MADV_POPULATE_*)

Retr0id · 25d ago

oops MCL_ONFAULT kinda does the opposite of what I wanted - I think if you omit that you can skip the madvise, and mlockall will populate everything for you.

feelamee · 26d ago

> if you’re a critical process like xlock, use statically allocated memory and don’t alloc again.

This doesn't save you if someone other allocates and OOM killer chooses you as victim

hkolk · 26d ago

What is proposed is to not have an OOM killer with a selection process, meaning that the "someone other allocates" would be the one dying.

amluto · 26d ago

The fact that xlock crashing unlocks an X11 session is, IMO, pathetic.

thomashabets2 · 25d ago · 4 in thread

Hey, that's me! (suggesting an OOM pardon feature)

It's a funny reply. But what was not funny was the OOM killer killing my screen locker.

Joke all you want, but 22 years later I still stand by that I'd rather get a kernel panic than kill the screen lock.

These days you can do oom score adjusting, which is not as strong as a pardon. I may be taking too much credit, and may misremember the timeline, but I feel like someone took my crappy kernel patch, went "fine, I'll do it the right way", and merged that oom score adjusting maybe a year or so later.

Here's an LWN article about it, too: https://lwn.net/Articles/104179/

jkrejcha · 25d ago

> These days you can do oom score adjusting, which is not as strong as a pardon.

Writing -1000 to /proc/<pid>/oom_score_adj will cause the OOM killer not to consider the process at all :)

From the man page proc_pid_oom_score_adj(5)

> The value of oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX). [...]. The lowest possible value, -1000, is equivalent to disabling OOM-killing entirely for that task, since it will always report a badness score of 0.
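
To see the effect of an adjustment, the resulting badness score can be read back from `/proc/<pid>/oom_score`. A minimal sketch that ranks all processes by it (`oom_ranking` and the `PROC_ROOT` override are hypothetical names; the override is only there for dry runs against a scratch directory):

```shell
# List "oom_score pid name" for every process, highest score (likeliest
# victim) first. PROC_ROOT is a hypothetical override; it defaults to /proc.
oom_ranking() {
    for dir in "${PROC_ROOT:-/proc}"/[0-9]*; do
        # Processes can vanish mid-scan, so unreadable entries are skipped.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        echo "$score ${dir##*/} $name"
    done | sort -rn
}

# Example (not run here): show the five likeliest victims.
# oom_ranking | head -n 5
```

A process pinned at -1000 should show up at the bottom of this list with a score of 0, matching the man page excerpt.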

creatonez · 24d ago

The modern desktops seem to have some way to jam themselves if the lock screen fails.

For example, KDE: https://preview.redd.it/plasma-lock-screen-messed-up-v0-zx7h...

GNOME: https://forums.freebsd.org/attachments/index-jpeg.8571/

I think this only works because there is top-down integration between the different parts. The compositor knows when it's supposed to be locked. Whereas the old screen lockers were just very aggressive Xorg apps that suffer from "What if two programs did this?" problems (https://devblogs.microsoft.com/oldnewthing/20110310-00/?p=11...)

Muromec · 25d ago

>Joke all you want, but 22 years later I still stand by that I'd rather get a kernel panic than kill the screen lock.

An argument can be made that the kernel should not cover for architectural missteps of the X server and that the X server should be the one to crash when its security-critical component was killed for whatever reason.

thomashabets2 · 25d ago

Sure. But that's not where we are.

Also there are other safety and security critical reasons why you'd want to exempt some processes.

Arguably (and it definitely has been argued) the real architectural misstep is the Linux kernel overcommitting by default in the first place.

lokar · 25d ago · 4 in thread

I know this is not a popular / mainstream position, but I managed a very large fleet of systems this way:

- no system swap

- enough memory for core system services set aside in a cgroup for them to use

- by default, all prod service binaries load all code pages into ram at start, and lock them in (no paging out code pages at runtime)

- if needed (rare) services can mount some swap in their own cgroup, but very much discouraged

You need to know how much ram you are going to use, and actually stick to that. Very little is wasted in practice, and you don't have to deal with OOMs all the time. Everything is much more predictable.

xyzzy_plugh · 25d ago

I agree with your perspective. I certainly agree that swap can be invaluable at times, but it is generally a mistake for your run-of-the-mill production services.

It's a nice approach particularly because all OOMs become actionable: there's a bug in a service or a limit is wrong or traffic is changing in an unexpected way.

Systems built this way end up being extremely reliable in my experience.

It's an uphill battle both ways though and not everyone is up for that experience.

tosti · 25d ago

Have you disabled swap in the kconfig entirely?

If not, is your vm.swappiness 0? How do you deal with overcommit? Did you replace malloc with a more strict implementation?

lloeki · 25d ago

> How do you deal with overcommit

    echo 2 > /proc/sys/vm/overcommit_memory
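
With mode 2 the kernel refuses allocations that would push `Committed_AS` past `CommitLimit`, and both figures are visible in `/proc/meminfo`. A small sketch for checking how much headroom is left (`commit_headroom` and its optional file argument are hypothetical names, added so it can be pointed at a saved copy of meminfo):

```shell
# Print CommitLimit, Committed_AS and the difference, all in kB.
# The optional argument is a hypothetical convenience; it defaults to /proc/meminfo.
commit_headroom() {
    awk '/^CommitLimit:/  { limit = $2 }
         /^Committed_AS:/ { used = $2 }
         END { print "limit=" limit " kB committed=" used " kB headroom=" (limit - used) " kB" }' \
        "${1:-/proc/meminfo}"
}

# Example (not run here):
# commit_headroom
```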

lokar · 25d ago

No swap device

EdSchouten · 26d ago · 3 in thread

I still remember following Andries’s “Linux kernel hacker’s hut” course he taught at the Eindhoven University of Technology (TU/e) back in 2010. Every week we’d get an assignment where we had to write exploits for commonly occurring security vulnerabilities (e.g., buffer overflows, bad printf format). It was one of the most enjoyable courses I ever followed. Thanks for that, Andries!

blux · 26d ago

Hey fellow TU/e'er :) I followed his course as well, somewhere around 2004/5. Executing man in the middle attacks, writing buffer overflow exploits. Good memories!

AbbeFaria · 25d ago

Is this course still available? What about the course materials? I know it will be dated but if so can someone pls share the links. Tried searching for it on google but couldn’t find it.

EdSchouten · 24d ago

It looks like the code of the course was 2WC16. Unfortunately the course material no longer seems to be available online.

cwillu · 26d ago · 2 in thread

(2004)

jml7c5 · 26d ago

Thanks. I was confused for a bit, given these days you can do

    echo -1000 > "/proc/$pid/oom_score_adj"   # $pid holds the target process ID

to disable OOM killing for a process.

https://github.com/torvalds/linux/blob/master/include/uapi/l...

cwillu · 25d ago

There's also /proc/sys/vm/panic_on_oom and /proc/sys/vm/oom_kill_allocating_task for other behaviours suggested in the comments.
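
A quick way to see how a given machine is configured is to read these files back. A minimal sketch, where `show_oom_policy` and the `SYSCTL_VM` override are hypothetical names (the override only exists for dry runs against a scratch directory):

```shell
# Print the current values of the OOM-related sysctls mentioned in this thread.
# SYSCTL_VM is a hypothetical override; it defaults to /proc/sys/vm.
show_oom_policy() (
    dir=${SYSCTL_VM:-/proc/sys/vm}
    for knob in panic_on_oom oom_kill_allocating_task overcommit_memory; do
        printf '%s=%s\n' "$knob" "$(cat "$dir/$knob")"
    done
)

# Example (needs root, not run here): kill whichever task triggered the OOM
# instead of scanning for a victim.
# echo 1 > /proc/sys/vm/oom_kill_allocating_task
```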

lelandfe · 26d ago · 1 in thread

I never pay for the OOF insurance, it seems like a waste of money and I've never met anyone that's had it happen.

keyle · 26d ago

It can only happen once anyway, and I fly weekly!

nemothekid · 25d ago · 1 in thread

While I have had my time fighting the OOM killer, I believe overcommit would have always won. To torture the metaphor a bit more, airlines have an OOF mechanism - they just eject the overcommitted passengers before the plane takes off.

A passenger buying a ticket is malloc(), but passengers don't always utilize the seat (use the memory). Normally this works out fine, but occasionally, there are too many passengers. Thankfully though instead of executing a couple passengers they give you a voucher.

jkrejcha · 24d ago

I've mentioned this elsewhere in the thread, but I think it's a difference of view on what malloc represents. Operating systems do have "reserve this part of the address space" APIs and these reservations don't get charged against your commit because you're simply reserving the space, not committing to using it, and so the operating system doesn't need to back it with anything.

In this worldview, malloc is like me buying a plane ticket at the counter for a specific flight that's going to leave soon. I'd be really annoyed if I were bumped off a flight I just paid for (and would've rather been told "that flight is full, try again later" (malloc returns NULL)). This is, for example what Windows does. Under memory pressure, it'll say to applications, "hey no I'm not in a giving mood for memory right now" (and will sometimes bump the size of the pagefile if configured to do this, but only up to a point).

The thought behind this is that well... applications have to handle malloc returning NULL anyway. Whether that's calling abort and giving up is one matter, another might be to retry the allocation at a later time (maybe after Windows has bumped the pagefile size), another might be to handle an error using some preallocated buffer or whatever.

bastawhiz · 26d ago · 1 in thread

Especially in an era where RAM is so expensive, the obvious answer is to simply never use memory. If your data can't fit in the plethora of CPU registers at your disposal, your software is probably too complicated. /s

throwaway87543 · 25d ago

I see you are an AMD VCACHE enjoyer.

ptx · 25d ago

FreeBSD has a "protect" command which does something similar to what this asks for – the man page [1] describes it:

"The protect command is used to mark processes as protected. The kernel does not kill protected processes when swap space is exhausted. [...] If you protect a runaway process that allocates all memory the system will deadlock."

[1] https://man.freebsd.org/cgi/man.cgi?query=protect&apropos=0&...

mad_vill · 25d ago

Happy to see this trending, I probably share this in my company's Slack once a month.
