Linux 3.9 introduced a new way of writing socket servers

(freeprogrammersblog.vhex.net)

289 points · aartur · 12y ago · 126 comments

73 comments · 21 top-level

rdtsc12y ago· 11 in thread

> Now the question is why to bother with multiprocess socket servers at all - aren't threads and events better? There's at least one good niche for them - dynamic languages like Python or Ruby, which need multiple OS processes to achieve real concurrency [my emphasis]

That is not true. It is an often repeated misconception. It makes it sound like Python creators were just incompetent and just stuck threads in there even though they are completely useless. In fact Python's threads work well for IO concurrency. I used them and saw great speedup when accepting and handling simultaneous socket connections. Yes you won't get CPU concurrency, but if your server is not CPU bound you might not notice much of a difference.

IO concurrency is real concurrency. In 8 years using Python for fun and professionally I probably wrote more IO concurrent code than CPU concurrent code. Even then for CPU concurrent code I would have had to drop into C using an extension (and there you can release the GIL anyway).

Now, the obvious follow up is that in case of IO concurrency you are often better off using gevent or eventlet. You get lighter weight threads (memory wise) and less chance of synchronization bugs (since greenlet based green threads will switch only on IO concurrency points: socket reads, sleep and explicit waits on green semaphores and locks).
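
A minimal sketch of the IO-concurrency point, assuming `time.sleep` is an acceptable stand-in for a blocking socket read (both release the GIL while waiting). The function names and the 0.2 second delay are made up for illustration; nothing here comes from the original comment.

```python
import threading
import time


def fake_io(delay, results, index):
    # time.sleep releases the GIL, much like a blocking socket read would.
    time.sleep(delay)
    results[index] = delay


def run_concurrently(n_tasks=4, delay=0.2):
    results = [None] * n_tasks
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(n_tasks)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The waits overlap, so elapsed is close to one delay, not n_tasks delays.
    elapsed = time.monotonic() - start
    return elapsed, results


if __name__ == "__main__":
    elapsed, _ = run_concurrently()
    print(f"4 waits of 0.2s finished in {elapsed:.2f}s")
```

Replace the sleep with a CPU-heavy loop and the threads would serialize on the GIL, which is the caveat raised elsewhere in this thread.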

bad_user12y ago

> you won't get CPU concurrency, but if your server is not CPU bound you might not notice much of a difference

That's not true. Most applications, with the possible exceptions of proxies, are also CPU bound.

Take for example a web service that receives JSON documents. The act of parsing JSON documents is CPU bound. The act of creating a response is CPU bound. In between you can also have IO bound operations, like fetching data from a MySQL database or a Memcached instance, however in the process of creating the final response you also need to transform the data received and that's also CPU bound.

As a real world example, I worked on a web-service written in Scala and running on the JVM. Initially it was running on only 8 Heroku dynos and these instances were receiving over 30,000 requests per second of real traffic. These Heroku instances are of course under-powered, because on my modest laptop the same web server is able to handle more than 10,000 requests per second.

And yes, asynchronous I/O lets you easily have 100,000 connections per server. But if you need throughput, then the CPU starts being a bottleneck.

Of course, my problem with Python and why I migrated away from it is that in truth Python sucks for asynchronous I/O too. But that's another story.

illumen12y ago

Wrong. Those examples are IO bound - memory. You'll find the CPU waiting on memory in these cases.

JSON is parsed in C with CPython, or in assembly with PyPy.

As a real world example, I've done realtime image processing of a gigabyte per second worth of data on a single machine with asynchronous python. It was IO bound, we had more CPU to spare. Hell, we even had some GPUs sitting there not doing anything because they weren't needed.

If you're doing real performance computing, then taking advantage of GPUs/DSP or other hardware is where it is at anyway. Python is quite good as a glue language for interfacing to these things.

3 more replies

sluukkonen12y ago

What he means is true parallelism (although alternate ruby and python implementations have it).

rdtsc12y ago

Downloading 2 web pages at the same time without one blocking another from completing is true parallelism. The request is sent for one, while it is in progress (maybe server is slow), another one can go out and come back with data. This can happen for hundreds or thousands of them. These are executed in parallel. So we got concurrent units of work executed at the same time, I fail to see how that is not parallel.

Now this is IO concurrency but it is real concurrency. Adding CPU concurrency would be very nice. It might speed things up a bit, or it might not. It really depends.

As an example consider haproxy. The little proxy that could. It handles large numbers of concurrent connections in parallel and it is single threaded in its default configuration. I've heard of 100k connections. It deals with IO concurrency. Chances are, making it multi-threaded might not dramatically improve its performance (it might even slow it down).

2 more replies

RivieraKid12y ago

By real concurrency he clearly meant CPU concurrency.

rdtsc12y ago

I got that, but pretended to not understand in order to illustrate a point.

That is a common misconception. And it seems to me nowadays most concurrency people deal with (at least in the server and web back-end world) is more heavily IO bound. Yet everyone automatically defaults to their CS 102 algorithms class, where they think about solving graph problems in parallel or multiplying matrices. So concurrency is automatically implied to be CPU concurrency.

nshepperd12y ago

I came in here expecting to see a pedantic debate about the use of the word "concurrency", and I was not disappointed. Though of course you have a point.

1 more reply

anaphor12y ago

Thanks, people need to learn the difference between concurrency and parallelism ffs, I'm tired of explaining the difference all the time.

RivieraKid12y ago

It doesn't really matter what you call it. There are more important things people need to learn about.

1 more reply

smegel12y ago

> In fact Python's threads work well for IO concurrency.

So long as you don't have any CPU bound threads competing for the GIL ;)

rbanffy12y ago

That's why profiling your code is so important.

If you have a CPU bound thread, it may be worth paying the performance penalty of separating some of the program flow into different processes.

joosters12y ago· 6 in thread

You never needed to prefork. One process can open a listening socket and share it with an unrelated process via file-descriptor passing.

rdtsc12y ago

That is a cool trick! Does that work via pipes and maybe also unix sockets? I suspect latency might still be slightly better with a pool of pre-forked processes/threads.

bdarnell12y ago

It uses unix sockets. Latency is the same as the standard pre-forking model; the only difference is that file-descriptor passing lets you manage the worker processes independently instead of requiring them to have a common parent process (this is important when rolling out new code to a service with a lot of active connections, since it's disruptive to restart all the workers at once).

Here's a demo in Python: https://gist.github.com/bdarnell/1073945

1 more reply

joosters12y ago

Yes, you can do it with unix sockets. Not sure about the latency, but you can pass the listen sockets in advance, rather than having one process accept()ing incoming connections and then passing those to other processes to handle. So unless you're binding to new ports all the time, it's all just a little extra startup work and won't impact the performance of the server.

bratsche12y ago

Yes, you can do it with unix sockets.

wmf12y ago

That was covered in the LWN article: "The first of the traditional approaches is to have a single listener thread that accepts all incoming connections and then passes these off to other threads for processing. The problem with this approach is that the listening thread can become a bottleneck in extreme cases." But since most people are dealing with non-extreme cases, they should be aware of fd passing.

joosters12y ago

That sounds like the 'standard' model of several threads calling accept(), after all if you are multi-threaded you don't need to pass a file descriptor around, each thread will already have it.

With FD passing, you can have multiple processes, related or unrelated, pulling incoming connections from the same socket. You use the FD passing to share the listening socket.
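
A rough sketch of the descriptor passing discussed in this subthread, using the `SCM_RIGHTS` recipe from the Python `socket` documentation. To keep it self-contained, a `socketpair()` inside one process stands in for the AF_UNIX connection between two unrelated processes; the helper names and `pass_listener_demo` are illustrative only, and this is Unix-only.

```python
import array
import socket


def send_fd(channel, fd):
    # SCM_RIGHTS ancillary data asks the kernel to duplicate fd for the receiver.
    channel.sendmsg(
        [b"F"],
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))],
    )


def recv_fd(channel):
    fds = array.array("i")
    _, ancdata, _, _ = channel.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return fds[0]


def pass_listener_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)

    # Stand-in for a unix socket connecting two independent processes.
    owner_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    send_fd(owner_end, listener.fileno())
    shared = socket.socket(fileno=recv_fd(worker_end))

    client = socket.create_connection(listener.getsockname())
    conn, _ = shared.accept()  # accepted through the passed descriptor
    client.sendall(b"ping")
    data = conn.recv(4)

    same_addr = shared.getsockname() == listener.getsockname()
    distinct_fd = shared.fileno() != listener.fileno()
    for s in (client, conn, shared, listener, owner_end, worker_end):
        s.close()
    return data, same_addr, distinct_fd
```

The listening socket is passed once, up front, so the steady-state accept path is the same as in a prefork server.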

haberman12y ago· 6 in thread

Interesting. It seems like one potential hazard is that bonafide port conflicts are not detected. If SO_REUSEPORT is preferred for performance reasons, and most/all servers are using it, then starting up a server that uses the same port as an existing service becomes a silent error.

It could even work as expected for a while (since the kernel gets to arbitrarily decide which listener to deliver incoming requests to) only to intermittently fail later.
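
To make the hazard concrete, here is a small sketch assuming Linux 3.9+ and a Python build that exposes `socket.SO_REUSEPORT`. The helpers `listen_on` and `second_bind_result` are invented for this illustration. Without the option, a second bind to a port that is already in use fails loudly; when both sockets set it, the second bind simply succeeds.

```python
import errno
import socket


def listen_on(port, reuseport):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuseport:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(8)
    except OSError:
        sock.close()
        raise
    return sock


def second_bind_result(reuseport):
    # Port 0 lets the kernel pick a free port for the first listener.
    first = listen_on(0, reuseport)
    port = first.getsockname()[1]
    try:
        second = listen_on(port, reuseport)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    else:
        second.close()
        return "ok"
    finally:
        first.close()
```

Both listeners here belong to the same user, which is the case the replies below discuss.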

geocar12y ago

I can't imagine people will start using SO_REUSEPORT by default, since the "performance reasons" are a happy accident of having a hint (that the process wants wakeups distributed across all CPUs). I'd rather get that hint in another way- perhaps by sharing an epollfd with multiple processes.

I would however like SO_REUSEPORT to run experiments: Right now we use iptables/tc to direct some traffic at "new versions" of some of our systems so we can run tests with live data, but connection tracking for localhost is lame. I'd much rather use SO_REUSEPORT.

ikeepforgetting12y ago

Listening to the same port requires processes with the same uid.

rbanffy12y ago

> then starting up a server that uses the same port as an existing service becomes a silent error.

Only if it has the same uid as the other one. It'd also be trivial to check whether the other processes listening to your port are "friendly" (as in "you don't want both Apache and Nginx listening on port 80").

cbr12y ago

Having Apache and Nginx both listening on port 80 might be an interesting way to A/B test them.

MertsA12y ago

The second process would have to be run under the same user for that to happen though so a real production system would probably never be impacted by this but what would be nice is a flag to restrict the port to children of a particular PID or just lock it to one particular PPID.

wicknicks12y ago

I imagine production servers would run monitoring processes to remove such "listening bugs".

gargoiler0012y ago· 6 in thread

why would anyone still be using threads or processes these days? :/ hardly scalable or efficient.

jerf12y ago

You know, this [1] really ought to permanently put away the idea I think you're trying to reference, which is that only "event based" systems can be performant. There's plenty of "thread" or "process" based approaches that do quite well, including I believe the uppermost tier of every benchmark on that site. The idea that threads or processes are intrinsically slow was sheer unmitigated propaganda, and probably not only failed to contain a grain of truth, but are actively false. (Some thread implementations were slower than others, but that turns out to have been the implementations rather than the idea.) Event based systems inevitably have a lot of function calls in them, and that will probably in the end be slower than properly done threads or continuation-based approaches, always, because of that overhead.

[1]: http://www.techempower.com/benchmarks/

rdtsc12y ago

People measure different things different ways and then draw conclusion (or tweak measurement parameters until it supports their already pre-conceived belief).

Event based systems can be more performant in some cases and slower in others. If there is not much opportunity for the CPU to do any work, then an event based system will often outperform threads. One example is proxies. I already gave haproxy as an example, so I'll repeat it here as well. It is single threaded and event based by default. It is certainly performant. Why? Because in a simplified model it just shuffles data from one socket to another. Pretty straightforward. Introducing multiple threads and context switches might just thrash caches around and actually make it worse (I have seen that happen).

Now add some CPU work in there. Say make each connection compute something, serialize some JSON. Like in those benchmarks, they use a DB driver get a row, serialize it and return. Ok there is some work. Now it is more likely that multi-threaded will help. But again one can surely tweak CPU affinities, thread pool sizes, hyper-threading BIOS settings, db driver types to really change things up. Threads take up memory. Not an insignificant amount. Now I like green threads, Erlang's processes, Go's goroutines because they are lightweight. (At least Erlang's processes map N:M to CPUs for parallel execution on the host machine).

So I guess my point is you are right that event based are not always and strictly more performant. But I also think in certain cases it can beat multi-threaded code (thread memory size, context switches, cache thrashing). That benchmark there, I wouldn't take it too seriously just like I wouldn't take Language Shootout too seriously.

2 more replies

gargoiler0012y ago

> The idea that threads or processes are intrinsically slow was sheer unmitigated propaganda, and probably not only failed to contain a grain of truth, but are actively false.

Threads / processes:

  * Run some code from A
  * Save state, context switch
  * Run some code from B
  * Save state, context switch

  * Deal with locking, synchronisation, etc

  * Run some code.

There are absolutely no instances where [num threads] > [num cores] is as efficient as not using more threads than cores.

1 more reply

lttlrck12y ago

To take advantage of multiple cores?

gargoiler0012y ago

Yeah because I have as many cores as I have concurrent HTTP requests, and obviously it's CPU bound...

1 more reply

colanderman12y ago

To hide I/O latency. You cannot do this effectively without threads without implementing your own scheduler, unless your I/O delays are constant and known a priori.

cperciva12y ago· 3 in thread

For what it's worth, BSD has had SO_REUSEPORT since BSD 4.4-Lite.

nemetroid12y ago

For anyone curious: released in 1994.

sounds12y ago

More information from 2010 about the way to do that in Linux: http://stackoverflow.com/questions/3261965/so-reuseport-on-l...

This is an example of a major downfall with free software: a developer decides he needs a feature so he implements it without taking any effort to see what has been done before – and more importantly, why.

It leads to the project sprouting thousands of new features while nothing achieves the polish and completeness of the original idea because the developer moved on to something newer and shinier.

I can't find the original blog post where I read the idea, but I did find one on Coding Horror: http://www.codinghorror.com/blog/2008/01/the-magpie-develope...

The Linux kernel solves this by having Linus, who has the long term perspective and the commitment to keep the project moving forward. I'm not claiming he's perfect, just that having him is the correct solution to the problem. Obviously here is someone who thinks the 3.9 kernel has a new feature he needs all the while ignoring past socket work.

1 more reply

masklinn12y ago

And OSX has inherited it from BSD.

jkn12y ago· 3 in thread

Am I right that this makes it trivial to deploy a new version of my server with zero downtime? I can just start the new server to handle new connections and tell the old one to stop accepting connections and quit when existing requests are completed, no need for another layer routing?

DonPellegrino12y ago

That's exactly how I do it for my Node programs. Any service that I want to have 100% is using the cluster module, resulting in multiple processes listening to the same port. When I want to update, I replace the files and kill the processes one by one.

caf12y ago

You could already do this by having a way for the new version to connect an AF_UNIX socket to the old version and request that the listening file descriptor be passed from old to new.

pfraze12y ago

That seems correct to me.

audidude12y ago· 3 in thread

This could be useful for periodic tracing/profiling as well. Simply have a second instance with all debugging symbols and tracing enabled, but only accept() a client every X seconds.

hosay12312y ago

Sadly it doesn't work like that.. if 2 processes have the same port number bound, then approximately 50% of clients will hash onto the second receive queue. If the debug process only accepts a few connections every so often, then nearly 50% of traffic will essentially be dropped on the floor

It's also not possible to occasionally listen and unlisten.. that causes the hash modulus to change, sending traffic to the wrong sockets and (most likely) resetting all existing connections
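
A toy model of the modulus problem described above. It assumes, purely for illustration, that a listener is chosen as `hash(flow) % N`; `zlib.crc32` stands in for whatever hash the kernel really uses, and the flow tuples are invented.

```python
import zlib


def pick_listener(flow, n_listeners):
    # Toy stand-in for the kernel's flow hash: hash the client tuple, modulo N.
    return zlib.crc32(repr(flow).encode()) % n_listeners


def remapped_fraction(flows, before, after):
    # Fraction of flows that land on a different listener index after the change.
    moved = sum(pick_listener(f, before) != pick_listener(f, after) for f in flows)
    return moved / len(flows)


flows = [("10.0.0.%d" % (i % 250), 40000 + i) for i in range(1000)]
print(remapped_fraction(flows, 2, 3))
```

In this model, adding or removing one listener changes the modulus and moves a large share of flows to a different index, which is why a listener that comes and goes is disruptive.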

pritambaral12y ago

The hash modulus reset issue is being worked on. Source: the original lwn posting.

lttlrck12y ago

That's a really neat idea. Thanks, it could be useful.

Refefer12y ago· 3 in thread

I'm a bit more worried about the security aspect of it.

Let's say that we are running a server on a port which uses this option to allow multiple processes to bind to it. What's to prevent a rogue process, perhaps with malicious intent, from starting up and siphoning off requests willy nilly? Sounds like a great way to implement a hard to detect MITM attack.

What would be nicer, I think, is if socket reusing was bound not only to the same uid but also to the process listening to it.

takeda6412y ago

As I understand you need to have the same EUID to be able to bind to the same port.

subim12y ago

That's right. This article doesn't mention it, but the LWN article it cited (https://lwn.net/Articles/542629/) does.

pfraze12y ago

You can mitigate that risk by using one of the first 1024 ports, since they require root access.

bborud12y ago· 3 in thread

Why does the blog posting only mention fork and prefork as options? A very common way to design servers is to do multiplexing IO. The one-connection-per-thread/process isn't the only way.

That being said, this option can simplify things -- removing the necessity of having some moving part to distribute connections across completely independent processes.

jerf12y ago

"Why does the blog posting only mention fork and prefork as options?"

Because this is a Linux kernel feature involving sharing a socket amongst multiple OS processes, and is therefore only interesting to talk about if you are using multiple OS processes. It's not a generalized primer on all techniques of handling IO.

joosters12y ago

They aren't mutually exclusive. You can have multiple processes performing non-blocking I/O, as a way of scaling over several cores without multithreading.

bborud12y ago

Exactly.

zzzcpan12y ago· 3 in thread

Meh. SO_REUSEPORT doesn't change the way socket servers are written. I was expecting something, like syscall batching for sockets, but not this.

rdtsc12y ago

The way I understood it, it does change because it simplifies the server. There is no need for one top level listening process/thread. Each separate process/thread can open the listening socket independently.

(Btw, there is another interesting forking-for-client-connection pattern in Erlang. Instead of forking off and handling the client connection in a separate process, instead handle the client connection in the accepting process but fork-off another process to continue accepting. In general, just a process pool, that should be easier to set up with this new feature).
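
A sketch of "each worker opens its own listening socket", assuming Linux 3.9+ with `SO_REUSEPORT`. For brevity both listeners live in one process and are drained with `select`; in a real deployment each would sit in its own worker process. `reuseport_listener` and `reuseport_demo` are hypothetical names.

```python
import select
import socket


def reuseport_listener(port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(64)
    return sock


def reuseport_demo(n_clients=20):
    first = reuseport_listener(0)  # kernel picks a free port
    port = first.getsockname()[1]
    second = reuseport_listener(port)  # independent socket, same port
    listeners = [first, second]

    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(n_clients)]

    counts = {first.fileno(): 0, second.fileno(): 0}
    accepted = []
    while len(accepted) < n_clients:
        ready, _, _ = select.select(listeners, [], [], 5.0)
        if not ready:
            break  # give up rather than hang
        for sock in ready:
            conn, _ = sock.accept()
            counts[sock.fileno()] += 1
            accepted.append(conn)

    for s in clients + accepted + listeners:
        s.close()
    return port, list(counts.values())
```

How the connections split between the two listeners is up to the kernel, so the per-listener counts vary from run to run; only the total is predictable.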

pfraze12y ago

> There is no need for one top level listening process/thread

And as a result, the user can configure the prefetch pool.

1 more reply

RivieraKid12y ago

Pretty much, it's just some small unimportant technical detail.

MalcolmEvershed12y ago· 2 in thread

It seems like this could help solve the thundering herd problem [0][1][2], no?

[0] http://en.wikipedia.org/wiki/Thundering_herd_problem [1] http://stackoverflow.com/questions/15636319/why-is-accept-mu... [2] http://uwsgi-docs.readthedocs.org/en/latest/articles/Seriali...

wmf12y ago

Problem was already solved: "In modern times, the vast majority of UNIX systems have evolved, and now the kernel ensures (more or less) only one process/thread is woken up on a connection event."

MalcolmEvershed12y ago

I believe that quote from [2] is referring to simply calling accept(), but modern socket servers use epoll() (or similar) before accept() which I think still has the problem (because I've run strace on nginx and uwsgi and I'm pretty sure I saw all processes wake-up from epoll()). So I'm thinking that with SO_REUSEPORT, each server process would have a different socket to epoll() on, and the kernel would only wake-up one process on a new connection, thus, solving the thundering herd problem for modern servers.

fooyc12y ago· 2 in thread

This is likely to consume more memory, because of copy on write pages (or lack thereof).

Implementing the prefork model by spawning unrelated processes (as opposed to forking from a common parent process) is likely to consume more memory: each process is unrelated, and does not share copy on write memory pages with other processes.

nullc12y ago

Shared library code already gets shared, so this may not be as bad as you think.

justincormack12y ago

You could use SO_REUSEPORT with threads too. Linux threads are processes after all, with tweaked clone() options.

IgorPartola12y ago· 1 in thread

So this is interesting, except in the real world your parent process does more than the article implies. The big thing it is in charge of (and the thing that I have seen many of them get wrong) is (a) keeping the child processes running/restarting them when they fail and (b) performing graceful config or code reload. The OS has no business doing the latter and would have a very hard time doing the former.

In fact I have seen issues where gunicorn failed miserably simply because it did not handle a bad import in a child process. Tornado as of the latest version I had used (2.0 I think) did not have any ability to check for dead child processes. I am sure there are more examples of this done wrong than right.

This is an interesting option for several use cases but you still need a parent process to monitor things. Perhaps at some point upstart or systemd will get good enough to monitor multiple processes per daemon in real time. Until then, meh.

Edit: actually, one cool thing you can do with this is code reloading. You simply have your parent process start more workers that attach to the same socket, then kill the old ones. That way the idea of code or config reloading doesn't need to be baked into every part of the worker.

sorbits12y ago

> in the real world your parent process does more than the article implies […] keeping the child processes running/restarting […] performing graceful config or code reload […]

The article suggests you let http://supervisord.org/ (or similar) take care of these things.

rgarcia12y ago

This seems very relevant to people using Node, considering it has basically standardized around the "pre-fork" [0] model as a way to use more than one core. It'll be interesting to see where this goes.

[0] http://nodejs.org/api/cluster.html

robbles12y ago

One detail that doesn't seem to be mentioned here or in the linked article is how the multiplexing of sockets is actually handled at the kernel level.

Does the kernel use some sort of round-robin approach to assigning client sockets to processes waiting on accept()? This is one area where I'd imagine a dedicated master process would be beneficial, as it could implement "smarter" load balancing based on the health and response times of its child processes.

gwu7812y ago

This is the -T option in W.R. Stevens' sock utility.

See Appendix C to his December 15, 1993 book on TCP/IP.

1993.

Amadou12y ago

Is SO_REUSEPORT really all that much better than a server process that hands off incoming connections to other independent processes via an AF_UNIX socket with sendmsg/recvmsg?

If I understand SO_REUSEPORT right you let the kernel decide everything - access control, receiving process, timing, etc in exchange for not having your own process doing the same thing. Since that simplistic approach is the kind of thing that can be implemented in about 100 lines of user-space code doing file-descriptor sharing with sendmsg/recvmsg via AF_UNIX sockets, I don't see the benefit of pushing that complexity into the kernel. Especially since if you want to exercise any greater level of control you'll just have to roll your own AF_UNIX based code anyway.

fexl12y ago

"in the fork model a number of processes can grow uncontrollably."

You can use setrlimit to prevent that. Plus, your application is likely to have direct control over forking anyway.
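
For reference, a hedged sketch of the `setrlimit` idea using Python's `resource` module. `RLIMIT_NPROC` caps the number of processes for the real user ID, so it is a coarse tool; the helper name `cap_process_count` is invented here.

```python
import resource


def cap_process_count(max_procs):
    # Only the soft limit is changed; the hard limit is left as it was.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if hard != resource.RLIM_INFINITY:
        max_procs = min(max_procs, hard)
    resource.setrlimit(resource.RLIMIT_NPROC, (max_procs, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)
```

Once the cap is reached, `fork()` fails with `EAGAIN`, so a forking server still has to handle that error instead of assuming the fork succeeds.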

buster12y ago

Cool, i didn't know that would be possible with sockets, sounds like a nice option. Although i wonder how efficient it is, but it may be worthwhile to spawn [number of cores] x [node.js | python | ruby] servers which themselves only run asynchronous functions, greenlets, etc. in a single thread..

mrottenkolber12y ago

The same model I use in my soon to be released web server. :) Have a thread pool compete for an accept-lock. Performance isn't that bad actually. About the same as thttpd.

halayli12y ago

nginx already scales by spawning multiple processes. The worker processes share the listening file descriptors from the parent master process which allows the workers to accept connections on the listening fds.

j / k navigate · click thread line to collapse

126 comments

73 comments · 21 top-level

rdtsc12y ago· 11 in thread

bad_user12y ago

> you won't get CPU concurrency, but if your server is not CPU bound you might not notice much of a difference

That's not true. Most applications, with the possible exceptions of proxies, are also CPU bound.

And yes, asynchronous I/O lets you easily have 100,000 connections per server. But if you need throughput, then the CPU starts being a bottleneck.

Of course, my problem with Python and why I migrated away from it is that in truth Python sucks for asynchronous I/O too. But that's another story.

illumen12y ago

Wrong. Those examples are IO bound - memory. You'll find the CPU waiting on memory in these cases.

JSON is parsed in C with CPython, or in assembly with PyPy.

If you're doing real performance computing, then taking advantage of GPUs/DSP or other hardware is where it is at anyway. Python is quite good at a glue language for interfacing to these things.

3 more replies

sluukkonen12y ago

What he means is true parallelism (although alternate ruby and python implementations have it).

rdtsc12y ago

Now this is IO concurrency but it is real concurrency. Adding CPU concurrency would be very nice. It might speed things up a bit, or it might not. It really depends.

2 more replies

RivieraKid12y ago

By real concurrency he clearly meant CPU concurrency.

rdtsc12y ago

I got that, but pretended to not understand in order to illustrate a point.

nshepperd12y ago

I came in here expecting to see a pedantic debate about the use of the word "concurrency", and I was not disappointed. Though of course you have a point.

1 more reply

anaphor12y ago

Thanks, people need to learn the difference between concurrency and parallelism ffs, I'm tired of explaining the difference all the time.

RivieraKid12y ago

It doesn't really matter how you call it. There are more important things people need learn about.

1 more reply

smegel12y ago

> In fact Python's threads work well for IO concurrency.

So long as you don't have any CPU bound threads competing for the GIL ;)

rbanffy12y ago

That's why profiling your code is so important.

If you have a CPU bound thread, it may be worth to pay the performance penalty of separating some of the program flow in different processes.

joosters12y ago· 6 in thread

You never needed to prefork. One process can open a listening socket and share it with an unrelated process via file-descriptor passing.

rdtsc12y ago

That is a cool trick! Does that work via pipes and maybe also unix sockets? I suspect latency might still be slightly better with a pool of pre-forked processes/threads.

bdarnell12y ago

Here's a demo in Python: https://gist.github.com/bdarnell/1073945

1 more reply

joosters12y ago

bratsche12y ago

Yes, you can do it with unix sockets.

wmf12y ago

joosters12y ago

That sounds like the 'standard' model of several threads calling accept(), after all if you are multi-threaded you don't need to pass a file descriptor around, each thread will already have it.

With FD passing, you can have multiple processes, related or unrelated, pulling incoming connections from the same socket. You use the FD passing to share the listening socket.

haberman12y ago· 6 in thread

It could even work as expected for a while (since the kernel gets to arbitrarily decide what port to deliver incoming requests to) only to intermittently fail later.

geocar12y ago

ikeepforgetting12y ago

Listening to the same port requires processes with the same uid.

rbanffy12y ago

> then starting up a server that uses the same port as an existing service becomes a silent error.

cbr12y ago

Having Apache and Nginx both listening on port 80 might be an interesting way to A/B test them.

MertsA12y ago

wicknicks12y ago

I imagine production servers would run monitoring processes to remove such "listening bugs".

gargoiler0012y ago· 6 in thread

why would anyone still be using threads or processes these days? :/ hardly scalable or efficient.

jerf12y ago

[1]: http://www.techempower.com/benchmarks/

rdtsc12y ago

People measure different things different ways and then draw conclusion (or tweak measurement parameters until it supports their already pre-conceived belief).

2 more replies

gargoiler0012y ago

> The idea that threads or processes are intrinsically slow was sheer unmitigated propaganda, and probably not only failed to contain a grain of truth, but are actively false.

Threads / processes:

  * Run some code from A
  * Save state, context switch
  * Run some code from B
  * Save state, context switch

  * Deal with locking, synchronisation, etc

  * Run some code.

There is absolutely no instances where [num threads] > [num cores] is as efficient as not using more threads than cores.

1 more reply

lttlrck12y ago

To take advantage of multiple cores?

gargoiler0012y ago

Yeah because I have as many cores as I have concurrent HTTP requests, and obviously it's CPU bound...

1 more reply

colanderman12y ago

To hide I/O latency. You cannot do this effectively without threads without implementing your own scheduler, unless your I/O delays are constant and known a priori.

cperciva12y ago· 3 in thread

For what it's worth, BSD has had SO_REUSEPORT since BSD 4.4-Lite.

nemetroid12y ago

For anyone curious: released in 1994.

sounds12y ago

More information from 2010 about the way to do that in Linux: http://stackoverflow.com/questions/3261965/so-reuseport-on-l...

It leads to the project sprouting thousands of new features while nothing achieves the polish and completeness of the original idea because the developer moved on to something newer and shinier.

I can't find the original blog post where I read the idea, but I did find one on Coding Horror: http://www.codinghorror.com/blog/2008/01/the-magpie-develope...

1 more reply

masklinn12y ago

And OSX has inherited it from BSD.

jkn12y ago· 3 in thread

DonPellegrino12y ago

caf12y ago

You could already do this by having a way for the new version to connect an AF_UNIX socket to the old version and request that the listening file descriptor be passed from old to new.

pfraze12y ago

That seems correct to me.

audidude12y ago· 3 in thread

This could be useful for periodic tracing/profiling as well. Simply have a second instance with all debugging symbols and tracing enabled, but only accept() a client every X seconds.

hosay12312y ago

It's also not possible to occasionally listen and unlisten.. that causes the hash modulus to change, sending traffic to the wrong sockets and (most likely) resetting all existing connections

pritambaral12y ago

The hash modulus reset issue is being worked on. Source: the original LWN posting.

lttlrck12y ago

That's a really neat idea. Thanks, it could be useful.

Refefer12y ago· 3 in thread

I'm a bit more worried about the security aspect of it.

What would be nicer, I think, is if socket reusing was bound not only to the same uid but also to the process listening to it.

takeda6412y ago

As I understand it, you need to have the same EUID to be able to bind to the same port.

subim12y ago

That's right. This article doesn't mention it, but the LWN article it cited (https://lwn.net/Articles/542629/) does.

pfraze12y ago

You can mitigate that risk by using one of the first 1024 ports, since they require root access.

bborud12y ago· 3 in thread

Why does the blog posting only mention fork and prefork as options? A very common way to design servers is to use multiplexed I/O. One connection per thread or process isn't the only way.

That being said, this option can simplify things -- removing the necessity of having some moving part to distribute connections across completely independent processes.

jerf12y ago

"Why does the blog posting only mention fork and prefork as options?"

joosters12y ago

They aren't mutually exclusive. You can have multiple processes performing non-blocking I/O, as a way of scaling over several cores without multithreading.

bborud12y ago

Exactly.

zzzcpan12y ago· 3 in thread

Meh. SO_REUSEPORT doesn't change the way socket servers are written. I was expecting something like syscall batching for sockets, not this.

rdtsc12y ago

pfraze12y ago

> There is no need for one top level listening process/thread

And as a result, the user can configure the prefetch pool.


RivieraKid12y ago

Pretty much, it's just some small unimportant technical detail.

MalcolmEvershed12y ago· 2 in thread

It seems like this could help solve the thundering herd problem [0][1][2], no?

[0] http://en.wikipedia.org/wiki/Thundering_herd_problem [1] http://stackoverflow.com/questions/15636319/why-is-accept-mu... [2] http://uwsgi-docs.readthedocs.org/en/latest/articles/Seriali...

wmf12y ago

Problem was already solved: "In modern times, the vast majority of UNIX systems have evolved, and now the kernel ensures (more or less) only one process/thread is woken up on a connection event."

MalcolmEvershed12y ago

fooyc12y ago· 2 in thread

This is likely to consume more memory, because of copy-on-write pages (or lack thereof).

nullc12y ago

Shared library code already gets shared, so this may not be as bad as you think.

justincormack12y ago

You could use SO_REUSEPORT with threads too. Linux threads are processes after all, with tweaked clone() options.
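A minimal sketch of that in Python, assuming Linux 3.9+ where `socket.SO_REUSEPORT` is available. Each thread gets its own listening socket bound to the same port; the worker names, counts, and the 0.2 s timeout are arbitrary choices for the demo.

```python
import socket
import threading

def reuseport_listener(port=0):
    """Create a listening socket with SO_REUSEPORT set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(16)
    s.settimeout(0.2)   # lets the worker notice the stop flag
    return s

def serve(listener, name, stop):
    """Accept on this thread's private listener and reply with the worker name."""
    while not stop.is_set():
        try:
            conn, _ = listener.accept()
        except socket.timeout:
            continue
        with conn:
            conn.sendall(name)
    listener.close()

def read_all(sock):
    """Read until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(64)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def demo(n_threads=3, n_clients=12):
    first = reuseport_listener()              # kernel picks a free port
    port = first.getsockname()[1]
    listeners = [first] + [reuseport_listener(port) for _ in range(n_threads - 1)]
    stop = threading.Event()
    threads = [threading.Thread(target=serve, args=(l, b"worker-%d" % i, stop))
               for i, l in enumerate(listeners)]
    for t in threads:
        t.start()
    replies = []
    for _ in range(n_clients):
        with socket.create_connection(("127.0.0.1", port)) as c:
            replies.append(read_all(c))
    stop.set()
    for t in threads:
        t.join()
    return replies

if __name__ == "__main__":
    print(demo())
```

Which worker answers a given connection is up to the kernel, so the mix of names in the output will vary from run to run.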

IgorPartola12y ago· 1 in thread

sorbits12y ago

> in the real world your parent process does more than the article implies […] keeping the child processes running/restarting […] performing graceful config or code reload […]

The article suggests you let http://supervisord.org/ (or similar) take care of these things.

rgarcia12y ago

[0] http://nodejs.org/api/cluster.html

robbles12y ago

One detail that doesn't seem to be mentioned here or in the linked article is how the multiplexing of sockets is actually handled at the kernel level.

gwu7812y ago

This is the -T option in W.R. Stevens' sock utility.

See Appendix C to his December 15, 1993 book on TCP/IP.

1993.

Amadou12y ago

Is SO_REUSEPORT really all that much better than a server process that hands off incoming connections to other independent processes via an AF_UNIX socket with sendmsg/recvmsg?

fexl12y ago

"in the fork model a number of processes can grow uncontrollably."

You can use setrlimit to prevent that. Plus, your application is likely to have direct control over forking anyway.
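For reference, a small sketch of the setrlimit approach using Python's `resource` module. The function name is made up. Note that on Linux `RLIMIT_NPROC` counts all processes (strictly, threads) of the real user ID, not just this server's children, and it is not enforced for processes with `CAP_SYS_ADMIN` or `CAP_SYS_RESOURCE`.

```python
import resource

def set_soft_process_limit(max_procs):
    """Set the soft RLIMIT_NPROC to max_procs, clamped to the hard limit.

    Once the limit is reached, fork() fails with EAGAIN instead of
    letting the number of processes grow without bound.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if hard != resource.RLIM_INFINITY:
        max_procs = min(max_procs, hard)
    # Leave the hard limit untouched; an unprivileged process cannot raise it back.
    resource.setrlimit(resource.RLIMIT_NPROC, (max_procs, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)

if __name__ == "__main__":
    # 512 is an arbitrary example value.
    print(set_soft_process_limit(512))
```

A forking server would call this once at startup, before it begins accepting connections, and then treat a failed fork as a signal to shed load.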

buster12y ago

mrottenkolber12y ago

That's the same model I use in my soon-to-be-released web server. :) A thread pool competes for an accept lock. Performance isn't that bad, actually: about the same as thttpd.
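For readers who haven't seen the accept-lock pattern, a minimal Python sketch follows. It is not the poster's server: the worker names, pool size, and timeout are invented for the demo. All threads share one listening socket, and only the thread holding the lock sits in accept().

```python
import socket
import threading

def worker(listener, accept_lock, stop, name):
    """Compete for the lock, accept one connection, then serve it unlocked."""
    while True:
        with accept_lock:
            if stop.is_set():
                break
            try:
                conn, _ = listener.accept()
            except socket.timeout:
                continue            # lock is released, then we compete again
        with conn:                  # other threads can accept while we reply
            conn.sendall(name)

def read_all(sock):
    """Read until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(64)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def demo(n_threads=4, n_clients=8):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # kernel picks a free port
    listener.listen(16)
    listener.settimeout(0.2)          # lets workers notice the stop flag
    port = listener.getsockname()[1]

    accept_lock = threading.Lock()
    stop = threading.Event()
    threads = [threading.Thread(target=worker,
                                args=(listener, accept_lock, stop, b"thread-%d" % i))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    replies = []
    for _ in range(n_clients):
        with socket.create_connection(("127.0.0.1", port)) as c:
            replies.append(read_all(c))
    stop.set()
    for t in threads:
        t.join()
    listener.close()
    return replies

if __name__ == "__main__":
    print(demo())
```

The lock is held only around accept(), so request handling still runs in parallel across the pool.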

halayli12y ago
