Until this is fixed, I'll just keep running my systems with very small amounts of swap (say, 512MB in a system with 16GB of RAM). I'd rather the OOM killer kick in than have to REISUB or hold down the power button.
Some benchmarks with regards to the performance claims would be nice.
Yeah, this is basically the main drawback of swap. I tried to address this somewhat in the article and the conclusion:
> Swap can make a system slower to OOM kill, since it provides another, slower source of memory to thrash on in out of memory situations – the OOM killer is only used by the kernel as a last resort, after things have already become monumentally screwed. The solutions here depend on your system:
> - You can opportunistically change the system workload depending on cgroup-local or global memory pressure. This prevents getting into these situations in the first place, but solid memory pressure metrics are lacking throughout the history of Unix. Hopefully this should be better soon with the addition of refault detection.
> - You can bias reclaiming (and thus swapping) away from certain processes per-cgroup using memory.low, allowing you to protect critical daemons without disabling swap entirely.
Have a go setting a reasonable memory.low on applications that require low latency/high responsiveness and seeing what the results are -- in this case, that's probably Xorg, your WM, and dbus.
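To make that concrete, here is a sketch of the two ways to set memory.low, assuming a cgroup v2 unified hierarchy and a systemd-managed system (the dbus.service unit and the 256M figure are just examples to adapt):

```shell
# Protect a latency-sensitive service from reclaim: memory below
# memory.low is only reclaimed when nothing else is available.
systemctl set-property --runtime dbus.service MemoryLow=256M

# Or write the cgroup v2 file directly (the path depends on your hierarchy):
echo 256M > /sys/fs/cgroup/system.slice/dbus.service/memory.low
```

The systemd route is usually preferable, since systemd owns the cgroup tree and may overwrite direct writes.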
I think there is something wrong with some of the major distros. I got really fed up with Ubuntu because of random junk running without my approval and eventually migrated to Arch, simply because I have a lot more control over configuration. I don't mean to trash one distro over another, because each one has its strengths and weaknesses, but I've been surprised at how bloated the average Linux install is these days. I'd love it if more attention were paid to it.
With an 8 Gig stick in my NUC, for normal desktop usage it never goes above 3.
And it caused huge problems for me, would run out of swap while having plenty of free memory and then go cripplingly slow.
Banning swap is like making self-storage companies illegal and forcing everyone to hold all their possessions in their homes. Sure, you'd be able to get to grandma's half-broken kitschy dog coaster that you can't bring yourself to throw away, but you'd also find it harder to fit and find your own stuff, the stuff you need all the time.
If you find yourself driving to and from the self storage place every day, you probably need a bigger home. But self storage is plenty useful even if you almost never visit it.
To extend the analogy: what do you do if grandma comes and fills your house with stuff? You need space to work, so you go and drop it off at the self storage place, but what if she just keeps filling your house up?
The OOM killer will do absolutely nothing until both your house and the whole self storage place are totally full. By that point, you've spent a huge amount of time just driving to and from self storage, so you haven't had time to do any actual work; it would probably have been better to tell grandma that you don't want any more stuff once she filled up your house for the first time.
Anyway, I agree with you that this behavior is annoying, but I think it ought to be possible to fix it (e.g., with memory cgroups or something like Android's lmkd) without giving up on the idea of spilling infrequently-accessed private dirty pages to disk.
As for the analogy -- there are metrics you can use today to bat away grandma before she starts hoarding too much. We have metrics for how much grandma is putting in the house (memory.stat), at what rate we kick our own stuff out of the house just to appease grandma, but then we realise we removed stuff we actually need (memory.stat -> workingset_refault), and similar. Using this and Johannes' recent work on memdelay (see https://patchwork.kernel.org/patch/10027103/ for some recent discussion), it's possible to see memory pressure before it actually impacts the system and drives things into swap.
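As a rough illustration of watching those counters (cgroup v2; the user.slice path is just an example, and on newer kernels the refault counter is split into anon/file variants):

```shell
# How much the cgroup holds, and how often it faults back in pages it
# previously evicted -- a rising refault rate means reclaim is throwing
# out pages that are still part of the working set.
grep -E 'workingset_refault|workingset_activate' \
    /sys/fs/cgroup/user.slice/memory.stat
```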
Hmm. I read the article and I think I understood it. However, in my experience, you run out of RAM if and only if your working set is too big. In my experience, all involved find it desirable to reduce the size of the working set as quickly as possible. Your experience seems to differ.
> The point isn't working with data sets larger than RAM. The point is making better use of the RAM you do have by taking pages you'll almost never touch and spilling them to disk so that there's more room in RAM for pages you will touch.
Your reasoning is too sloppy. It supports neither your blanket statements nor your pained analogy.
You appear to presuppose that:
(1) The kernel can predict which pages the user will "almost never touch."
(2) Mispredicting which pages will be "almost never touched" is of relatively low cost.
(3) Swapping pages that the user will "almost never touch" to disk frees up an appreciable amount of RAM.
(4) When pulling those pages back from disk, the work held up is, on average, less important than whatever we got to do with the RAM in the meantime.
I disagree with (1). Like I said elsewhere in the comments on this article, the kernel cannot reliably predict whether a process will "almost never touch" a given page. The kernel does not have sufficiently detailed knowledge of the process's purpose or access patterns.
I also disagree with (2). The consequences of getting these predictions wrong seem to be very bad. When lots of mispredictions happen in a tight cluster, the kernel and all running processes will be stopped when the user forcibly bounces the machine. If you let the OOM killer run instead of swapping, the kernel stays up and only a few running processes die. Having a working set whose size is larger than RAM but smaller than RAM + swap seems to be a recipe for a very long cluster of such mispredictions and a human intervention.
I am curious to hear about workloads where (3) occurs. (Non-latency-sensitive Java code that doesn't churn objects too fast? You've allocated a heap of a certain size, and the half or so that's free doesn't get disturbed too much.)
Regarding (4), even if the kernel could reliably predict cold pages, "page will almost never be touched" isn't necessarily the right criterion for swapping a page to disk. What if reading from the page will be on the critical path for something users do care about, such as logging in and killing a misbehaving process?
> I disagree with (1). [...] The kernel does not have sufficiently detailed knowledge of the process's purpose or access patterns.
You're in for quite a surprise, particularly on desktop. I have a number of processes with some pages swapped out, and I see no impact when interacting with those processes. Firefox, gDesklets, a volume changer, and several instances of rxvt are among them.
> (2) Mispredicting which pages will be "almost never touched" is of relatively low cost.
> I also disagree with (2). The consequences of getting these predictions wrong seem to be very bad.
Only in the case of repeated mispredictions, which only happens if you are genuinely low on RAM and well on your way to invoking the OOM killer anyway. With (1) being quite accurate (mainly because swapping out unused pages is not that aggressive), (2) magically becomes true as well.
With self-storage rising to over $300 per month, it's more cost effective to take the stuff to the dump and buy it again if it is ever needed.
I started to read the article, and then thought, "I know this, who doesn't know this?" and stopped.
"The point is making better use of the RAM you do have by taking pages you'll almost never touch and spilling them to disk so that there's more room in RAM for pages you will touch."
Exactly. Who with any technical experience in this day and age doesn't understand that. Are there really people trying to argue against swap?
You're on a site infamous for the comment "I switch to Node when I want to be close to the metal".
" Under no/low memory contention
[...]
Without swap: We cannot swap out rarely-used anonymous memory, as it’s locked in memory. While this may not immediately present as a problem, on some workloads this may represent a non-trivial drop in performance due to stale, anonymous pages taking space away from more important use."
Now imagine that I have no memory contention. In other words I've got 8 Gigs of memory and I have never run out of memory. The OOM killer has never run. I've never even come close. How exactly is this representing a non-trivial drop in performance?
To be fair, if I put some of my long running processes into swap, I could cache more files, but I really don't see how this represents a statistically significant improvement. I honestly can't think of anything else.
If you sometimes run out of memory (or even get close), then you should have some swap. This seems fairly obvious to me. Relying on the OOM killer to "clean things up" is pretty dubious. But was there ever any serious argument to do this? I've literally never heard of that before.
I'd be very happy to hear something enlightening about this, but I didn't see anything in the article (perhaps I missed it).
Why does that seem obvious to you? With swap, running low on memory is game over. Without swap, the OOM killer runs. You can call the OOM killer dubious, graceless, or any number of other things, but it gets the system responsive again without doing as much damage as the human intervention that's otherwise required.
I understand that, in the land of JIT compilers, garbage collectors, and oversubscribed everything, this is not much of a concern, as those properties have already been traded away.
The swap may be the best case in a bad situation. I would argue along the lines of don't be in a bad situation...
I'm looking at you 8 of 16 GB used on cold boot Mac laptops... Looking at you with indignation and rancor Chrome.
The solution for your rarely-used but response time critical daemon is for it to mlock() its critical data and code pages into memory, which works regardless of whether or not you have swap available. (Or, alternatively, use one of the cgroup controllers that the article alludes to, to give the critical daemon and related processes memory unaffected by memory pressure elsewhere in the system).
Essentially, having no swap is similar to having all anonymous memory mlocked: no major faults can happen except on file-backed (mmapped) pages, which can still be evicted and read back from disk.
If you mean disk caches, when have you seen a multigigabyte executable?
Somehow that doesn't resonate with my experience. I tend to remember the cases where I can't even SSH into the box, because the fork in sshd takes minutes, as does spawning the login shell.
I'd really like some way to have swap, but still loosen the OOM killer on the biggest memory hog when the system slows down to a crawl. I haven't found that magic configuration yet.
As for the problem with SSH and login: You might well find that it is not the fork that is the problem. You might well be surprised at how much chaff is run by a login shell, or even by non-login shells.
A case in point: I recently reduced the load on a server system that involved lots of SCP activity after noticing that, thanks to RedHat bug #810161, every SCP session (even though it was a non-login, non-interactive shell) was first running a program to enumerate the PCI bus, to fix a problem with a Cirrus graphics adapter card that the machine did not have, on a desktop environment that the machine did not have. This was driven by /etc/bashrc sourcing /etc/profile.d/* .
* https://github.com/FedoraKDE/kde-settings/blob/F-26/etc/prof...
As the author notes, much of this has been improved by cgroups, and there have always been big hammers like mlock(), but even with those it can be hard to prevent memory thrashing in extreme cases. I've seen swap disabled completely, as a last resort, by people who understood how it worked.
It's always seemed to me that this was mainly a problem of the kernel configuration being too opaque. Why can't you configure on a system-wide basis that you can use swap e.g. only for anonymous pages and for nothing else?
Similarly it would be nice to have a facility for the OOMkiller to call out to userspace (it would need dedicated reserved memory for this) to ask a configured user program "what should I kill?". You might also want to do that when you have 10G left, not 0 bytes.
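You can at least express a static version of that policy today via oom_score_adj, which ranges from -1000 (never kill) to +1000 (kill first). A sketch, where "memoryhog" is a placeholder for whatever process you'd sacrifice first:

```shell
# Make a known memory hog the OOM killer's first choice...
echo 1000 > /proc/$(pidof -s memoryhog)/oom_score_adj

# ...and make sshd effectively unkillable, so you can still log in.
echo -1000 > /proc/$(pidof -s sshd)/oom_score_adj
```

The "act before 0 bytes are left" idea exists in userspace too: daemons like earlyoom and Facebook's oomd kill based on configurable thresholds before the kernel's last-resort killer ever runs.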
Edit: sorry, not two syscalls; it's an option to the malloc equivalent, VirtualAlloc.
This is for academic use only. I know how much RAM my machine has, and if I oom, it usually isn't because I tried to squeeze in just a tiny bit too much data, but rather because I made some stupid mistake and keep allocating small chunks of memory very rapidly. On a system with even a moderate amount of swap, this makes everything grind to a halt, and it is usually much faster to just reboot the machine and deal with the problems later in the unlikely event that rebooting actually causes problems.
I also think that a convincing case for swap would have to discuss the concepts of latency, interactivity, and (soft) real-time performance, things that largely weren't to the fore in the salad days of the 370 family or the VAX. Virtual memory is the TCP of local storage.
The article actually says, four times over, that it should not be thought of as emergency memory. It's not emergency memory; it's ordinary memory that should see use as part of an everyday memory hierarchy.
And if you are going to question the terminology, the elephant in the room that you have missed is calling paging swapping. (-:
> many people just see it as a kind of “slow extra memory” for use in emergencies
the scare quotes are around 'slow extra memory' not 'emergencies'. Now granted in the last bullet point of the conclusion it affirms that VM is a source of slow memory, but earlier it uses 'memory' where it's referring specifically to RAM, for example
> Without swap: Anonymous pages are locked into memory as they have nowhere to go.
Really, the main reason my original comment was rubbish is that I took the article far too much as a general discussion of swap when, as it said, it's largely about how much swap to enable on a given Linux system running some already-determined software.
Now, I've replaced the SSD and installed a non-Google Linux distro, and would like to limit the amount of swapping Firefox can do.
I had been planning to simply use cgroups' memory features to limit the amount of memory consumed by Firefox processes, but if I understand the article correctly (which I admit I didn't read in full detail), I should also be able to tune swapping to limit the actual amount of swapping that takes place, avoiding a drastic uptick in SSD wear whenever I open too many tabs.
That, and perhaps a Firefox extension that suspends background tabs in memory (which I've used before with a certain amount of effectiveness in the pre-WebExtension days).
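For the cgroup part, one low-effort approach on a cgroup v2 system with systemd is to launch the browser in its own scope (MemoryMax and MemorySwapMax are real systemd resource-control properties; the limits here are just guesses for a browser workload):

```shell
# Cap Firefox at 4G of RAM and allow it at most 512M of swap;
# MemorySwapMax=0 would forbid swapping its pages entirely.
systemd-run --user --scope -p MemoryMax=4G -p MemorySwapMax=512M firefox
```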
ChromeOS uses zram instead of physical swap, which works quite well, even on 2GB models. Zram is available in any Linux distro, being built into the kernel, and is also the default configuration in GalliumOS (Ubuntu+Xfce for Chromebooks, most of which are less broadly compatible than your PEPPY).
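Setting zram up by hand on a stock distro kernel looks roughly like this (the 2G size and lz4 choice are examples; many distros also ship a zram-generator or zram-config package that does the same thing at boot):

```shell
modprobe zram                                # compressed block device driver
echo lz4 > /sys/block/zram0/comp_algorithm   # pick a fast compressor (before sizing)
echo 2G > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 100 /dev/zram0                     # higher priority than any disk swap
```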
about:memory has various options, including a 'minimize memory usage' button and profiling tools.
about:preferences has Privacy & Security > Cached Web Content > Override automatic cache management (select and set at 500MB, 1GB, or whatever works best).
Setting a lower swappiness value (min 0, max 100; the default is 60 IIRC, so above the midpoint) reduces the likelihood of the kernel swapping: the lower the number, the fuller RAM needs to be before the kernel starts swapping. Hence a low number means less swapping, which means less SSD wear.
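Concretely, on most distros (10 is just an example value to tune):

```shell
sysctl vm.swappiness                    # show the current value
sudo sysctl -w vm.swappiness=10         # prefer dropping page cache over swapping
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf  # persist across reboots
```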
Along with a hard limit set by cgroups so that browser tabs start being killed in order to stop the swap from being overwhelmed when memory is being used well beyond its capacity.
(I'd be interested to know if 'swappiness' already effectively implements the following system-wide, but here goes.)
Now that I think of it, it's not so much the quantity of disk storage being consumed by swap as the number of write/delete transactions. One thus wonders if an approach in which the browser favors a small number of tabs to keep in RAM, and dumps the state of the remaining tabs to disk, might be just as effective, but without worrying about the growth of swap. Then, when the user opens a tab that had been saved to disk, if we crudely assume that the memory consumed by open tabs is roughly comparable, we can take care of the whole affair of 'swapping' in a single exchange between memory and disk, dumping the state of an in-memory tab to make way for restoring the saved one.
(Or did I just reinvent what swappiness does already? From your description of swappiness, I'm inclined to guess the answer is no. This approach strikes me more akin to using the swapfile as a filesystem, and keeping just a small number of tabs paged into RAM.)
In the Pentium 1 era EDO RAM maxed out at 256MB/s and hard disk xfer was 10MB/s. Common RAM size was 16MB.
In today's era DDR4 maxes out at 32GB/s and hard disk xfer is 500 MB/s. Common RAM size is 16GB.
RAM xfer rate has grown about 128x. RAM capacity has grown about 1000x. Disk xfer rate has grown 50x.
Swap is no longer a useful tool.
Swap has a lot less purpose in a world without memory leaks and extraneous functions. But in practice it's quite good at getting several gigabytes of unnecessary data out of the way, so ram can be used properly.
Swap, well-used, should only take up a few percent of the drive's bandwidth.
You can't detect a priori whether data is "unused." If you guess wrong a few times in a row, you get the familiar pattern where your Linux box is unresponsive to everything and needs to be bounced.
If you could detect whether data is rarely used, swap still isn't necessary. Applications can mmap() a file and use that region for "rarely used data" if such is known in advance.
Extraneous functions should be backed by the executable in the common case. In the JIT case, they probably won't be JITted anyway.
I still think the OOM killer is less intrusive than swapping to disk. It kills some, but not all, of the processes on the machine. The system pretty reliably comes back to life in less time than it takes a human to diagnose the problem and bounce the system. As a bonus, no human needs to get involved.
When dealing with swap, continuous transfer rate of the hard disk is not the relevant metric; seek time is.
In my experience, when the system starts needing to read a lot of pages back in from swap, it tends to do so in more or less random order. It reads a small amount of data, then it seeks to a new position, then reads another, and so on.
(I also find the argument unconvincing for a separate reason: I think hard drives are somewhat obsolete. Even my home system swaps to flash instead of a hard drive. And flash has a massively better seek time.)
One of those times was on a 512MB RAM VPS, where I needed to compile something - you don't want a 512MB VPS to do a lot of compilation, but in that one instance, I was very glad I could easily just make a swap file and get on with it. The other time was on my laptop with 8 GB of RAM.
Also, even ignoring the times where swap makes possible a task which would otherwise have been impossible, you flat out ignored the content of the article. Did you even read it? If you have a long running task which allocates a lot of memory, then proceeds to only very rarely use that memory (or maybe it only needs that memory when it shuts down, or just forgot to free that memory), swap allows the system to swap out that memory and instead do something useful with it, like caching files or not killing processes. It doesn't matter that the disk is slower than RAM, if the swapped-out memory is rarely or never accessed.
hard disk xfer is 500 MB/s
You need better disks. The one in the laptop I'm typing this on can do 2GB/s, which greatly weakens your comparison. A better statistic to use is how many IOPS the disk can handle.
Yes, hibernation does work well and it requires swap. Personally, I set a swap partition equal to RAM + 512MB on systems that I want to hibernate on.
Linux also supports swap files and this might be handy: https://wiki.debian.org/Hibernation/Hibernate_Without_Swap_P...
Thus I disabled swap and I never had these unresponsive issues. I run with 32GB of ram so generally well behaved applications never run into memory issues.
Some applications that would cause issues were too many VirtualBox instances that use more than the available memory, or a text editor that chokes trying to open a >1GB text file (looking at you, the new JS-based editors).
The difference with swap is that the computer doesn't get unresponsive, it just slows down a bit. And Ram compression still buys some time before OOM killer hits.
fallocate -l 8G /swapfile
chmod 0600 /swapfile
mkswap /swapfile
swapon /swapfile
Add an entry in /etc/fstab & you're done. "This little trick" made all the difference on a compute cluster I managed, where each node contained 96G of RAM. It's much more pleasant to monitor swap usage than the OOM killer and related kernel parameters. Or, worst case, if two tasks that need 8GB each are running on a machine with 8GB of memory, kill one, let the other finish, and restart the first one. Or, less ambitious, freeze one, swap it out to disk, let the other finish, and only then resume the frozen app.
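For reference, the matching /etc/fstab line is the standard one (note also that fallocate-created swap files don't work on every filesystem; on btrfs, for instance, you'd need to create the file with dd instead):

```
/swapfile none swap sw 0 0
```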
Desktop OSs are so primitive at memory management, forcing the complexity onto the user.
Once desktop systems and applications support required APIs to handle saving state before being shut down.
SIGTERM and friends? :-)
If your application is just dropping state on the floor as a result of having an intentionally trappable signal being sent to it or its children, that seems like a bug.
You can do this with eBPF/BCC by using funclatency (https://github.com/iovisor/bcc/blob/master/tools/funclatency...) to trace swap-related kernel calls. It depends on exactly what you want, but take a look at mm/swap.c and you'll probably find a function with the semantics you want.
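For example (requires root and the bcc tools installed; swap_readpage, in mm/page_io.c, is the function that reads a page back in from swap on kernels of this era -- check your kernel source, and note some distros install the tool as funclatency-bpfcc):

```shell
# Print a latency histogram of swap-in reads every 5 seconds
sudo /usr/share/bcc/tools/funclatency -i 5 swap_readpage
```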
But if you DO have swap space, there won't be a performance hit (at least not under Linux) because it will only swap out some rarely used pages and then sit there doing nothing.
So, in the general case, it's better to have it and not need it than need it and not have it.
No, that's the opposite. If you have enough memory for your workload that you won't run out, swapping lets you use more memory for disk cache (instead of keeping unaccessed anonymous pages in real ram).
Unless by "won't run out" you mean "never have to throw away a disk cache page", which seems very unrealistic.
Except that it doesn't happen in practice, on my systems anyway. If you have plenty of memory, you can keep all your programs in it and as much as the system wants to cache and still not run out.
The theory says that swap effectively buys you some memory to spend on more important things (than what the system chooses to page out). So does buying more memory.
> Unless by "won't run out" you mean "never have to throw away a disk cache page", which seems very unrealistic.
I have an instance of top running on my desktops & laptops all the time. I never see cache using up all of the memory.
              total        used        free      shared  buff/cache   available
Mem:           7848        1523        5827          53         497        6033
I fixed this for you. At least there are still some OSes that don't consider this a "solution."
-PCs have a lot of RAM now
-When you allocate that much memory it's usually a bug in your own code like a size_t that overflowed. I never saw programs I would actually want to use try to allocate that much
-When using swap instead of ram, everything becomes so slow that you're screwed anyway. The UI doesn't even respond fast enough to kill whatever tries to use all that memory.
-How common is a situation where you need more memory than your ram size yet less than ram+swap size in a useful way? Usually if something needs a lot, it's really lot (and as mentioned above not desirable)
-Added complexity of making extra partition
-Added complexity if you want to use full disk encryption
-I do the opposite of using disk as ram: I put /tmp in a ramdisk of a few gigs
-Disks are slow and fast SSDs are expensive, so you wouldn't want to sacrifice their space (maybe if this changes some day...)
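The /tmp-in-RAM setup from that last point is a one-liner in /etc/fstab (the 4G size is an example; many distros can also do this via systemd's tmp.mount):

```
tmpfs /tmp tmpfs size=4G,mode=1777 0 0
```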
I imagine this would solve the full disk encryption complexity too.
- MacBook Pros don't have a lot of RAM. Still. (Let's not get into that whole discussion, as valid as it may be; that is far from the biggest issue I have with current MacBooks...)
- MacBook Pros have no user-facing complexity to use swap or to have it encrypted. Heck, they even give you a middle ground by default with compressed RAM.
- I have to run a lot of services to run our development stack. Most of the time I'm only using a few but don't want to have to go manually start and stop services all the time. Also, even within a service typically access to most of the memory isn't required for every request. Swap handles this quite well.
- There are parts of our development stack that, simply put, are extremely bloated in terms of memory use. Four gig webpack process, I'm looking at you. Yup, that is ridiculous and should be fixed and maybe there is a fix out there we haven't figured out ... but I don't care, isn't my problem, and I don't have to fight it because it seldom accesses most of that bloated memory so it lives great in swap.
- I like being able to switch to working on something else without having to shut down and restart all the applications I'm using as required for the new task. Swap is a great fit for that. For example, if I have to switch from developing code to analyzing a large Java heap dump, which requires lots of memory, I don't have to go shut everything down, it just quickly gets paged out to swap and comes back when I need it. I don't care if it takes an extra 30 seconds for switch between these two tasks, it already takes much longer than that to load the heap dump regardless.
I think often people give swap a bad name because they never see what it does well for them unless they go looking for it, they only see it when they are asking their machine to do something where the working set is actually too big for memory and blame swap.
That said, no need for swap if you can live fine without it. I just don't think that is good general advice.
What about applications that have memory-bound performance characteristics? In these cases, saving a bit of memory often directly translates into throughput, which translates into $$$.
This isn't a theoretical, a bunch of services which I've run in the past and currently literally make more money because of swap. By using memory more efficiently and monitoring memory pressure metrics instead of just "freeness" (which is not really measurable anyway), we allow more efficient use of the machine overall.
It's hardly used, around 300MiB at present, probably things like the mail daemon. It's been useful to have a very slow node, which I can SSH into (after 10 minutes) and kill a chosen process, rather than a dead/OOMed node. But I think the difference is marginal, and perhaps 512MiB would have been a more appropriate size for the partition.
(Swappiness is set to 1.)
The author discusses the situation as if the quantity of RAM is fixed and swap can be added (or not). But that isn't the only possibility — you can also add more RAM (it's just expensive). For the same number of GB of RAM+swap vs just RAM, there is no reason to prefer the option with swap.
In the end the core idea is: sometimes you have anonymous memory that is accessed so rarely that you'd rather have an extra disk cache page. If you assume that the kernel is not paging out memory that you actually use when not under pressure, swap doesn't hurt you.
If you don't need it, you don't need it. The other question is: how much swap exactly should I have? And why wouldn't I just add that much RAM instead?
> In the end the core idea is: sometimes you have anonymous memory that is accessed so rarely that you'd rather have an extra disk cache page.
That's the theory. In practice I always have more than enough RAM for all the cached pages the system wants to cache. On my laptop right now (booted today), I have 500MB of cached pages and 5.8 gigabytes of free memory. On my server (booted 499 days ago) I have 700MB of cached pages and 6 gigabytes of free memory.
If I were running out of memory [be it for cache or applications], I'd prefer to have more RAM than add swap. Yes, I keep calling it emergency memory.
> If you assume that the kernel is not paging out memory that you actually use when not under pressure, swap doesn't hurt you.
1) Bad assumption 2) it doesn't help you either, so why bother? Actually I might have a use for that disk space. In that case the swap just hurts.
Or just impossible
Many laptops still have remarkably low maximum-RAM limits. The ones I have here ( Dell SME & corporate types ) are 4GB and 8GB. I live in constant fear of the solid-blue disk-access lamp.
And when I boot-up after a swap-thrash I am scolded for an unclean shutdown :(
One counter example:
If your processes in sum tend to A) access many disk locations, at large total disk space B) hold a lot of underused data in ram
This isn't totally impossible. Maybe an Ethereum node with a script doing a bunch of data reads, running side by side with hundreds of Chrome tabs, few of which are regularly accessed. (Totally hypothetical, of course...)
Swapping some rarely used ram out so the OS can buffer disk into ram seems like a reasonable approach (although maybe even more reasonable is: close some tabs).
Your point still stands that more ram is strictly as good or better performance in this scenario, but you might be able to get an equivalent performance boost much more cheaply with some swap space for the underused ram. Also, upgrading past 32 GB ram starts to veer from expensive to impossible on a laptop.
I wonder whether, instead of using 4GB of RAM, setting up 3GB of RAM + 1GB of swap space on a RAM disk would result in much wiser OOM killer decisions (or better stability).
Showing the superiority of such a configuration would probably convince all the swap sceptics.
They are slower, but to give every branch a fully usable test system is pretty awesome. No reason to pay through the butt for RAM for tier-1 dev environments. You can also have a premium dev environment for the develop branch on a different server.