No, Linux is rubbish. Seriously. FreeBSD does this properly.
Edit: FreeBSD, Windows, OSX, Solaris, AIX, HP-UX(?)...
Pretty simple stuff really.
2. while it is being paged into memory, <DO OTHER STUFF>
3. Now your data is in memory and you can update it. Writes are async by default even on Linux: the write just goes into memory and gets synced out later by the kernel's page sync mechanism, but you can override that by setting the O_SYNC flag.
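To make the O_SYNC point concrete, here is a minimal sketch in Python, assuming a POSIX system (`os.O_SYNC` is not available on Windows). The helper name `write_record` and its `sync` parameter are hypothetical, made up for this illustration. Without the flag, `write()` returns once the data is in the page cache; with it, each `write()` blocks until the data has been handed to the device.

```python
import os
import tempfile


def write_record(path, data, sync=False):
    """Write data to path and return the number of bytes written.

    With sync=False the kernel buffers the write and flushes it later.
    With sync=True the file is opened with O_SYNC, so each write() blocks
    until the data has been written through to the underlying device.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= os.O_SYNC  # POSIX only
    fd = os.open(path, flags, 0o600)
    try:
        view = memoryview(data)
        total = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while total < len(view):
            total += os.write(fd, view[total:])
    finally:
        os.close(fd)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "record.bin")
        write_record(target, b"buffered write")             # async by default
        write_record(target, b"synchronous write", sync=True)
```

The synchronous variant trades latency for durability, which is why it is opt-in.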
That's a reasonably good paper on the trade-offs between event-driven, multi-threaded, and hybrid approaches to file serving.
I don't know that much about nginx in particular, but it seems like they've implemented thread pools for blocking operations: https://www.nginx.com/blog/thread-pools-boost-performance-9x.... "Hard drives are slow (especially the spinning ones), and while the other requests waiting in the queue might not need access to the drive, they are forced to wait anyway." So, if you're blocking reading a file from the hard drive, all the other requests are queued up behind it.
The thread-pool approach noted in the nginx blog sounds pretty much the same as the approach in the linked paper.
nginx does have a good reputation for performance, but I think a lot of that reputation comes from its use as a front-end for web applications rather than from serving lots of hard-to-cache files.
Anyway, both the nginx blog article and the academic paper note that the single-threaded event-driven model has drawbacks around file I/O, and that offloading blocking operations onto a worker pool of threads or processes can help mitigate them.
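As a rough illustration of that idea (not nginx's actual implementation, which is in C), here is a sketch using Python's asyncio: the event loop stays single-threaded, and blocking file reads are handed to a thread pool so other requests are not queued behind a slow disk. The function names `blocking_read`, `serve_file` and `serve_many` are hypothetical.

```python
import asyncio
import tempfile
from concurrent.futures import ThreadPoolExecutor


def blocking_read(path):
    """Ordinary blocking file read; may stall on a slow disk."""
    with open(path, "rb") as f:
        return f.read()


async def serve_file(loop, pool, path):
    """Run the blocking read on a pool thread and await its result.

    While the read is in flight, the event loop is free to run other tasks.
    """
    return await loop.run_in_executor(pool, blocking_read, path)


async def serve_many(paths, workers=4):
    """Serve several files concurrently from one event loop."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return await asyncio.gather(*(serve_file(loop, pool, p) for p in paths))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example payload")
    bodies = asyncio.run(serve_many([tmp.name, tmp.name]))
    print([len(b) for b in bodies])
```

The pool size bounds how many reads can block at once; requests that never touch the disk do not go through the pool at all.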
https://www.nginx.com/blog/thread-pools-boost-performance-9x...
So the single-threaded model works well for Redis, but not for nginx: if one nginx request blocks for about 10s, people can't tolerate that.
1. The Redis project abandoned attempts to have a mixed memory-disk approach, at least for the near future. I want to focus on trying to do at least one thing well and it is already hard ;-) You know, the no-need-to-conquer-the-world approach. Otherwise the project per se is interesting. Redis Labs has a commercial fork that works that way for instance (which I believe was initially based on the Redis "diskstore" branch I was working on in order to replace the former "virtual memory" Redis feature), but not the OSS side. Maybe I'll change my mind in the future but so far I can't see signs of my mindchange ;-)
2. About threads, we are now a bit more threaded: Redis 4.0 is able to perform deletion of keys in background, Redis Modules have explicit support for blocking operations that use threads, and so forth. However my goal in the next 1/2 years is to finally have threading in the I/O, in order to scale syscalls and protocol parsing, but not data access, to multiple threads. So regarding the 2006 programming, things will be the same.
Basically I still believe that to do application-side paging now that disks are also faster (ratio compared to RAM) is an interesting approach. I still think that using the kernel VM to do so is a bad idea in general, but could work for certain apps.
Please elaborate. If disk/block-device performance is improving, wouldn't the VM benefit as well?
Also the last sentence seems to make more sense the other way around: VM in the general case, user-land memory management for "certain apps".
About VM in the general case: yes, if by general case you mean that some random process is running and is out of memory. If we are talking about in-memory systems wanting to off-load data to disk, IMHO the default is that VM does not work well.
Here are the Redis configuration notes on VM from Redis 2.2:
# Virtual Memory allows Redis to work with datasets bigger than the actual
# amount of RAM needed to hold the whole dataset in memory.
# In order to do so very used keys are taken in memory while the other keys
# are swapped into a swap file, similarly to what operating systems do
# with memory pages.
....
# vm-max-memory configures the VM to use at max the specified amount of
# RAM. Everything that does not fit will be swapped on disk if possible, that
# is, if there is still enough contiguous space in the swap file.
...
# The Redis swap file is split into pages. An object can be saved using multiple
# contiguous pages, but pages can't be shared between different objects.
# So if your page is too big, small objects swapped out on disk will waste
# a lot of space. If your page is too small, there is less space in the swap
# file (assuming you configured the same number of total swap file pages).
# If you use a lot of small objects, use a page size of 64 or 32 bytes.
....
# Max number of VM I/O threads running at the same time.
# These threads are used to read/write data from/to swap file, since they
# also encode and decode objects from disk to memory or the reverse, a bigger
# number of threads can help with big objects even if they can't help with
# I/O itself as the physical device may not be able to cope with many
# reads/writes operations at the same time.
# The special value of 0 turns off threaded I/O and enables the blocking
# Virtual Memory implementation.
vm-max-threads 4
Let me back up and try to explain a bit:
While OS kernel developers have put a huge amount of effort into virtual memory management and paging, which was and is a good and necessary thing, the definition of "interactive" and "low latency" has changed. Long ago, half-second latency at a virtual terminal connected to a mainframe with hundreds or thousands of users was fantastic, compared with dropping off your stack of punch-cards and coming back 12 hours later.
For most of the software I use and work on today, I want low sub-second latency. It's often only achievable with reasonably direct control of what is in memory and what is on disk. If I click a menu in a GUI program that I haven't clicked in weeks, I don't want to wait half a second for a few scattered pages to be paged in/out of swap. Same goes for requests to web or API servers - I don't want less-common requests to take a half second longer than the typical 50ms or so. For desktop environments, GUIs, databases, caches, services: no swap.
Certainly, data, multimedia files, dictionaries, etc will need to be read from disk. The processes can arrange for separate threads to do that. We can have responsive progress bars, cancel buttons, priorities, timeouts before hitting an alternative data source - but only if the process itself is in RAM, not in swap.
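A minimal sketch of the "separate thread plus timeout before hitting an alternative data source" pattern, in Python. Everything named here (`fetch_with_fallback`, `slow_disk_read`, the 50ms budget) is hypothetical and only for illustration; the point is that the calling thread stays responsive because it never performs the blocking read itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fetch_with_fallback(primary, fallback, timeout_s, pool):
    """Run primary() on a pool thread; if it exceeds timeout_s, use fallback().

    The caller never blocks longer than timeout_s plus the cost of fallback().
    """
    future = pool.submit(primary)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only prevents a task that has not started yet; a read that
        # is already running keeps going in the background and is ignored.
        future.cancel()
        return fallback()


def slow_disk_read():
    """Stand-in for a read that stalls on a slow or busy disk."""
    time.sleep(0.5)
    return "from disk"


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(fetch_with_fallback(slow_disk_read, lambda: "from cache", 0.05, pool))
```

This only helps if the process's own code and data are resident in RAM; if the thread doing the waiting gets paged out, the timeout itself can stall.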
Now that desktop and server systems measure DRAM in tens of gigabytes, this really should not be hard to achieve!
I've struggled with swap and out-of-memory situations on Linux many times. The Linux kernel never seems to OOM-kill processes fast enough for me. If I have no swap and memory pressure sets in, the kernel struggles to shrink buffers, practically freezing most processes for a few minutes before finally killing the obvious culprit. (I've also tried memory-limiting containers, and they suffer the same problem: they freeze up for a few minutes instead of being killed immediately on OOM.) I used to enable plenty of swap, more than RAM, because that was the common wisdom, but it causes the same problem: when the system comes under memory pressure, everything freezes for a few minutes. It also has the additional problem that, despite setting swappiness to 1 or 0, some strange services/applications will cause the kernel to put some anonymous pages in swap even when there's plenty of free physical memory. I never want that! I need to periodically swapoff and swapon to correct it.
So, at each company I work for, I end up writing a bash script, run by cron each minute, which checks for low system memory, looks among the application services for an obvious culprit, and sends it SIGTERM. In practice, this solves the problem pretty much every time, in the most graceful way. It's extremely rare that a critical system process is the problem or looks like the problem. (Except dockerd a couple times ;)
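The original script is bash and isn't shown here, so the following is only a sketch of the same idea in Python, under stated assumptions: "low memory" means `MemAvailable` in /proc/meminfo is below a threshold, and the "obvious culprit" is the candidate process with the largest resident set size (`VmRSS` in /proc/<pid>/status). The function names, the threshold, and the way candidate PIDs are supplied are all hypothetical. Linux only.

```python
import os
import signal


def mem_available_kib(meminfo_text):
    """Extract MemAvailable (in KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def rss_kib(pid):
    """Resident set size of a process in KiB, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def pick_culprit(rss_by_pid):
    """Assumed heuristic: the candidate using the most resident memory."""
    return max(rss_by_pid, key=rss_by_pid.get) if rss_by_pid else None


def check_once(candidate_pids, threshold_kib):
    """If available memory is below the threshold, SIGTERM the biggest candidate.

    Returns the PID that was signalled, or None if nothing was done.
    """
    with open("/proc/meminfo") as f:
        if mem_available_kib(f.read()) >= threshold_kib:
            return None
    usage = {}
    for pid in candidate_pids:
        try:
            usage[pid] = rss_kib(pid)
        except OSError:
            continue  # process exited between listing and inspection
    victim = pick_culprit(usage)
    if victim is not None:
        os.kill(victim, signal.SIGTERM)  # graceful: lets the service clean up
    return victim
```

Run from cron each minute, `check_once` would be called with the PIDs of the application services only, so critical system processes are never candidates.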
(This is not to bash Linux in particular; Windows and macOS use way more RAM and swap in general. I've heard the BSDs have been good at particular things at particular times, but driver support has always been more of a struggle. Besides the swap / OOM behavior, I'm pretty happy with Linux.)
Letting the OS manage disk and RAM makes perfect sense for bulk data processing - Hadoop, Spark, or other map-reduce or stream-processing where a few seconds' pause here and there is no problem if throughput is maximized. But I personally don't work much on those things - and I'm not a rare case.