What's wrong with 2006 programming? (2010)

(oldblog.antirez.com)

123 points · baotiao · 9y ago · 52 comments

smegel · 9y ago

> OS paging is blocking as hell

No, Linux is rubbish. Seriously. FreeBSD does this properly.

Edit: FreeBSD, Windows, OSX, Solaris, AIX, HP-UX(?)...

doublerebel · 9y ago

The way SmartOS performs under memory pressure is completely different from Linux: the OS is still usable where Linux would be completely frozen. Admittedly I don't know the underlying implementation behind this feature.

trungaczne · 9y ago

Do you have any articles that talk about how FreeBSD does memory management differently?

smegel · 9y ago

https://people.freebsd.org/~jlemon/papers/kqueue.pdf

smarnach · 9y ago

What else can you do other than blocking until the page has been loaded? How would it be possible to resume a single-threaded process while the memory it's trying to access is not available?

smegel · 9y ago

You say "hey process, this data is not available yet, but if you listen on this event port I will send you a notification when it is ready, then you can do whatever you want with it. In the meantime, continue to serve up data that is in memory".

Pretty simple stuff really.

Dylan16807 · 9y ago

Please tell me how it's possible to do something like socket.write(data), when data has been swapped out, without blocking.

smegel · 9y ago

1. make a non-blocking read on the data

2. while it is being paged into memory, <DO OTHER STUFF>

3. now your data is in memory and you can update it (writes are async by default even on Linux, as the write just goes into memory and will get synced out by the kernel's page sync mechanism, but you can override that by setting the O_SYNC flag).

camtarn · 9y ago

Since the article doesn't mention it until quite far in: Redis is apparently single-threaded, which is why the blocking nature of OS page swapping is so disastrous. Presumably for a more traditional server with lots of worker threads this would be less true.

tyingq · 9y ago

It does make the conversation interesting, as the "varnish guy" sort of subtly suggests that single-threaded is subpar. Which seems odd, given that nginx is single-threaded, and in a somewhat similar space to varnish...and seems to enjoy a good reputation for performance.

mdasen · 9y ago

http://www.cs.princeton.edu/~vivek/pubs/pai_flash_99.pdf

That's a reasonably good paper on the trade-offs between event-driven, multi-threaded, and hybrid approaches to file serving.

I don't know that much about nginx in particular, but it seems like they've implemented thread pools for blocking operations: https://www.nginx.com/blog/thread-pools-boost-performance-9x.... "Hard drives are slow (especially the spinning ones), and while the other requests waiting in the queue might not need access to the drive, they are forced to wait anyway." So, if you're blocking reading a file from the hard drive, all the other requests are queued up behind it.

The thread-pool approach noted in the nginx blog sounds pretty much the same as the approach in the linked paper.

nginx does have a good reputation for performance, but I think a lot of that reputation comes as a front-end for web applications rather than serving lots of hard-to-cache files.

Anyway, both the nginx blog article and the academic paper note that single-threaded event-driven designs have drawbacks around file I/O, and that offloading blocking operations to a worker pool of threads or processes can help mitigate them.
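As a rough illustration of that offloading idea (not nginx's or the paper's actual implementation), here is a minimal Python sketch. The names `slow_disk_read`, `fast_request` and `serve` are made up for the example, and `time.sleep` stands in for a slow disk read. It measures how long an in-memory request waits when a blocking read runs on the event loop versus in a thread pool.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_disk_read(delay: float) -> str:
    """Hypothetical blocking file read; time.sleep simulates a slow disk."""
    time.sleep(delay)
    return "file contents"


async def fast_request(started: float) -> float:
    """An in-memory request; returns how long it waited to be served."""
    await asyncio.sleep(0)  # yield to the loop once, like any ready event
    return time.monotonic() - started


async def serve(offload: bool, delay: float = 0.2) -> float:
    """Serve one slow and one fast request; return the fast one's latency."""
    loop = asyncio.get_running_loop()
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=2) as pool:
        if offload:
            # The blocking read runs on a worker thread; the loop stays free.
            slow = loop.run_in_executor(pool, slow_disk_read, delay)
        else:
            async def blocking() -> str:
                # Runs on the event loop thread and stalls everything behind it.
                return slow_disk_read(delay)
            slow = asyncio.ensure_future(blocking())
        fast = asyncio.ensure_future(fast_request(started))
        latency = await fast
        await slow
    return latency


if __name__ == "__main__":
    print("blocking on the loop:", asyncio.run(serve(offload=False)))
    print("offloaded to a pool: ", asyncio.run(serve(offload=True)))
```

In the first case the fast request is queued behind the read for the whole delay; in the second it is served almost immediately, which is the effect the nginx quote describes.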

kelnos · 9y ago

nginx is a forking server, though, so individual workers being blocked wouldn't affect others, and the application as a whole can use all available CPU cores.

smegel · 9y ago

It is subpar on Linux. On an engineered OS like FreeBSD it is amazing. There is a reason Netflix/Cloudflare/more run all their nginx instances on BSD.

https://www.nginx.com/blog/thread-pools-boost-performance-9x...

baotiao (OP) · 9y ago

I think the difference between Redis and a web server like nginx is that all operations in Redis take about the same time, less than 1ms. Requests to nginx, however, fall in a wide range: some need 10ms while others need 10s, since nginx has to do file operations.

So the single-threaded model works well for Redis but not for nginx: if one request in nginx blocks for about 10s, every request behind it waits too, and people can't tolerate that.
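To put numbers on that argument, here is a small Python sketch. It assumes a single-threaded FIFO server where all requests arrive at the same moment; the function name `fifo_latencies` and the request mixes are invented for illustration, using the 1ms, 10ms and 10s figures from the comment.

```python
from itertools import accumulate


def fifo_latencies(service_times_ms):
    """Completion time of each request when one thread serves them in order.

    Assumes every request arrives at t=0, so each request's latency is the
    sum of its own service time and those of all requests ahead of it.
    """
    return list(accumulate(service_times_ms))


# Redis-like workload: every operation takes about 1 ms.
print(fifo_latencies([1, 1, 1, 1]))          # [1, 2, 3, 4]

# nginx-like workload: one 10 s file operation among 10 ms requests.
print(fifo_latencies([10, 10_000, 10, 10]))  # [10, 10010, 10020, 10030]
```

With uniform service times the worst request waits a few milliseconds; with one slow request in the queue, every later request inherits its 10 seconds.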

antirez · 9y ago

Hello, a few things that changed in the last 6-7 years:

1. The Redis project abandoned attempts to have a mixed memory-disk approach, at least for the near future. I want to focus on trying to do at least one thing well and it is already hard ;-) You know, the no-need-to-konquer-the-world approach. Otherwise the project per se is interesting. Redis Labs has a commercial fork that works that way for instance (which I believe was initially based on the Redis "diskstore" branch I was working on in order to replace the former "virtual memory" Redis feature), but not the OSS side. Maybe I'll change my mind in the future but so far I can't see signs of my mindchange ;-)

2. About threads, we are now a bit more threaded: Redis 4.0 is able to perform deletion of keys in the background, Redis Modules have explicit support for blocking operations that use threads, and so forth. However, my goal in the next 1-2 years is to finally have threading in the I/O, in order to scale syscalls and protocol parsing to multiple threads, but not data access. So regarding the 2006 programming, things will be the same.

Basically I still believe that to do application-side paging now that disks are also faster (ratio compared to RAM) is an interesting approach. I still think that using the kernel VM to do so is a bad idea in general, but could work for certain apps.

eternalban · 9y ago

> Basically I still believe that to do application-side paging now that disks are also faster (ratio compared to RAM) is an interesting approach. I still think that using the kernel VM to do so is a bad idea in general, but could work for certain apps.

Please elaborate. If disk/block-device performance is improving, wouldn't the VM benefit as well?

Also the last sentence seems to make more sense the other way around: VM in the general case, user-land memory management for "certain apps".

antirez · 9y ago

The OS VM would benefit; the problem is with using the OS VM in order to implement paging in certain applications like Redis. It does not work well because there is a tension between flexibility of in-memory representation and data locality, and the OS VM needs data locality because it has no info about content and requires logically grouped data to be in nearby pages.

About VM in the general case: yes, if by general case you mean a random process that is running and is out of memory. If we are talking about in-memory systems wanting to off-load data to disk, IMHO the default is that VM does not work well.

dvirsky · 9y ago

It is important to note that in the many years since this post, while redis has remained single-threaded - it also removed the entire concept of VM, and now works only fully in memory.

baotiao (OP) · 9y ago

However, Redis transferred this work to jemalloc. Now jemalloc controls the entire VM.

dvirsky · 9y ago

In the past you could tune redis to hold a dataset larger than the memory you had, and it would swap pages on its own. About a year after this 2010 post, antirez decided to remove this completely (in redis 2.6 or 2.8, I don't remember) and focus entirely on fully in-memory situations. VM in the redis sense used to be redis itself swapping stuff to disk with multiple threads.

Here are the redis configuration notes on VM from redis 2.2:

# Virtual Memory allows Redis to work with datasets bigger than the actual
# amount of RAM needed to hold the whole dataset in memory.
# In order to do so very used keys are taken in memory while the other keys
# are swapped into a swap file, similarly to what operating systems do
# with memory pages.

....

# vm-max-memory configures the VM to use at max the specified amount of
# RAM. Everything that does not fit will be swapped on disk if possible, that
# is, if there is still enough contiguous space in the swap file.

...

# Redis swap file is split into pages. An object can be saved using multiple
# contiguous pages, but pages can't be shared between different objects.
# So if your page is too big, small objects swapped out on disk will waste
# a lot of space. If your page is too small, there is less space in the swap
# file (assuming you configured the same number of total swap file pages).
# If you use a lot of small objects, use a page size of 64 or 32 bytes.

....

# Max number of VM I/O threads running at the same time.
# These threads are used to read/write data from/to swap file, since they
# also encode and decode objects from disk to memory or the reverse, a bigger
# number of threads can help with big objects even if they can't help with
# I/O itself as the physical device may not be able to cope with many
# reads/writes operations at the same time.
# The special value of 0 turns off threaded I/O and enables the blocking
# Virtual Memory implementation.

vm-max-threads 4
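The page-size trade-off in those notes can be worked through with a little arithmetic. The Python sketch below is not Redis code: `swap_usage` and the object sizes are hypothetical. It only applies the two rules stated above, that an object occupies whole contiguous pages and that pages are never shared between objects.

```python
def swap_usage(object_sizes, page_size):
    """Return (pages_used, wasted_bytes) for objects stored in whole pages.

    Each object takes ceil(size / page_size) pages and shares none of them,
    so the unused tail of its last page is wasted. Sizes are assumed to be
    positive serialized sizes in bytes (illustrative values, not measured).
    """
    pages = sum(-(-size // page_size) for size in object_sizes)  # ceiling division
    wasted = pages * page_size - sum(object_sizes)
    return pages, wasted


sizes = [20, 40, 100]          # three small objects, 160 bytes in total

print(swap_usage(sizes, 32))   # (7, 64): 7 pages, 64 bytes wasted
print(swap_usage(sizes, 256))  # (3, 608): 3 pages, 608 bytes wasted
```

Small pages waste little space but consume more of a fixed page count; large pages do the opposite, which is why the notes suggest 32 or 64 bytes for datasets made of many small objects.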

geerlingguy · 9y ago

The comments on this post are enlightening. I use both Varnish and Redis, and the architecture discussion is great!

koverstreet · 9y ago

One thing that would really help is if we had buffered asynchronous IO.

trevyn · 9y ago

(2010)

ploxiln · 9y ago

PHK's post, which inspired this, assumes that the process is swapping. It describes writing a page to disk to free up that page, then reading in the anonymous page of data that needs to be used for the write() system call the process uses to manually cache the data to disk. For the stuff that I use and work on, if the system is swapping anonymous pages, the situation is dire and it's time to kill (processes).

Let me back up and try to explain a bit:

While OS kernel developers have put a huge amount of effort into virtual memory management and paging, which was and is a good and necessary thing, the definition of "interactive" and "low latency" has changed. Long ago, half-second latency at a virtual terminal connected to a mainframe with hundreds or thousands of users was fantastic, compared with dropping off your stack of punch-cards and coming back 12 hours later.

For most of the software I use and work on today, I want low sub-second latency. It's often only achievable with reasonable direct control of what is in memory and what is on disk. If I click a menu in a GUI program that I haven't clicked in weeks, I don't want to wait half a second for a few scattered pages to be paged in/out of swap. Same goes for requests to web or api servers - I don't want less-common requests to take a half second longer than the typical 50ms or so. For desktop environments, GUIs, databases, caches, services: no swap.

Certainly, data, multimedia files, dictionaries, etc will need to be read from disk. The processes can arrange for separate threads to do that. We can have responsive progress bars, cancel buttons, priorities, timeouts before hitting an alternative data source - but only if the process itself is in RAM, not in swap.

Now that desktop and server systems measure DRAM in 10s of gigabytes, this really should not be hard to achieve!

I've struggled with swap and out-of-memory situations on Linux many times. The Linux kernel never seems to OOM-kill processes fast enough for me. If I have no swap, then when memory pressure sets in, the kernel struggles to shrink buffers, practically freezing most processes for a few minutes before finally killing the obvious culprit. (I've also tried memory-limiting containers, and they suffer the same problem: they freeze up for a few minutes instead of being killed immediately on OOM.) I used to enable plenty of swap, more than RAM, because that was the common wisdom, but it causes the same problem: when the system comes under memory pressure, everything freezes for a few minutes. It also has the additional problem that despite setting swappiness to 1 or 0, some strange services/applications will cause the kernel to put some anonymous pages in swap, even when there's plenty of free physical memory. I never want that! I need to periodically swapoff and swapon to correct it.

So, at each company I work for, I end up writing a bash script, run by cron each minute, which checks for low system memory, looks among the application services for an obvious culprit, and sends it SIGTERM. In practice, this solves the problem pretty much every time, in the most graceful way. It's extremely rare that a critical system process is the problem or looks like the problem. (Except dockerd a couple times ;)
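The original script is bash and is not shown, so the following is only a guess at its general shape, written in Python. Everything in it is an assumption: the 5% threshold, the use of the `MemAvailable` field from `/proc/meminfo`, picking the culprit by largest resident set size, and the names `parse_meminfo`, `memory_is_low`, `pick_culprit` and `check_once`. The caller is expected to supply the candidate application processes and their RSS.

```python
import os
import signal

LOW_MEMORY_FRACTION = 0.05  # assumed threshold, not taken from the comment


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def memory_is_low(info, threshold=LOW_MEMORY_FRACTION):
    """True when available memory falls below the given fraction of total."""
    return info["MemAvailable"] / info["MemTotal"] < threshold


def pick_culprit(rss_by_pid):
    """Return the pid with the largest resident set, or None if there is none."""
    if not rss_by_pid:
        return None
    return max(rss_by_pid, key=rss_by_pid.get)


def check_once(meminfo_text, rss_by_pid, kill=os.kill):
    """One watchdog pass; returns the pid that was sent SIGTERM, or None."""
    if not memory_is_low(parse_meminfo(meminfo_text)):
        return None
    pid = pick_culprit(rss_by_pid)
    if pid is not None:
        kill(pid, signal.SIGTERM)  # ask politely so the service can shut down
    return pid
```

A cron job would call `check_once` once a minute with the contents of `/proc/meminfo` and a `{pid: rss_kb}` map restricted to application services, so that critical system processes are never candidates.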

(This is not to bash Linux in particular; Windows and macOS use way more RAM and swap in general. I've heard the BSDs have been good at particular things at particular times, but driver support has always been more of a struggle. Besides the swap / OOM behavior, I'm pretty happy with Linux.)

Letting the OS manage disk and RAM makes perfect sense for bulk data processing - hadoop, spark, or other map-reduce or stream-processing where a few seconds pause here and there is no problem if throughput is maximized. But I personally don't work much on those things - and I'm not a rare case.
