That's funny terminology, isn't it? Killing a process usually means sending it a signal, typically TERM or KILL, that causes it to exit. A zombie process, though, is one that has already exited but hasn't yet been waited for by its parent, where the parent is either the process that spawned it or, if that process has died, the process with PID 1. Collecting it is usually referred to as reaping the zombie process, not killing it. AFAIK, a signal sent to a zombie process is simply ignored.
Or do the quotes around zombie imply a different meaning, such as "zombie-like"?
The use of quotes is probably an acknowledgement that the term "zombie" is not universal. For example, Linux uses "defunct" instead.
Basically, zombie processes happen when a child process exits but the parent process--the one that spawned it--doesn't reap it. [1]
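To make the point concrete, here is a minimal Python sketch, assuming a Linux system with /proc mounted. The helper names (`proc_state`, `make_zombie`, `reap`) are made up for illustration. It forks a child that exits immediately, shows that the child sits in state `Z` until the parent waits on it, and shows that sending it a signal in the meantime changes nothing.

```python
import os
import signal
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field comes right after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie():
    """Fork a child that exits at once; return its PID without reaping it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Parent: poll until the kernel reports the child as a zombie.
    for _ in range(200):
        if proc_state(pid) == "Z":
            break
        time.sleep(0.01)
    return pid


def reap(pid):
    """Collect the exit status, which frees the process-table entry."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    pid = make_zombie()
    print("state before signal:", proc_state(pid))
    os.kill(pid, signal.SIGKILL)  # nothing left to kill: it already exited
    print("state after signal: ", proc_state(pid))
    print("exit status:        ", reap(pid))
    print("entry still exists: ", os.path.exists(f"/proc/{pid}"))
```

The only thing that makes the entry go away is the `waitpid` call in the parent, which is why "reaping" is the more accurate word.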
Things would probably be clearer if the quotes were around "killing" rather than "zombie"; mayhaps the interviewer/writer was unfamiliar with the terminology.
Why does Poettering keep claiming this when he's the one who submitted the patch that adds the PR_SET_CHILD_SUBREAPER prctl(2) [0] functionality?
PR_SET_CHILD_SUBREAPER moves the ownership of an orphaned process to whichever process was selected rather than the default PID1, and that only works for descendants of the subreaper.
The problem pointed out by the quote is that normal software doesn't go around checking if it has zombie children and waiting on them, so in a container with random software S set as PID1 and creating subprocesses, zombies may accumulate until resources are exhausted[0].
PR_SET_CHILD_SUBREAPER is a way to cause that problem on a system with a proper init (or to test that your init works properly without needing to boot into it)
It's not a new observation: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb...
Previous HN discussion: https://news.ycombinator.com/item?id=8916785
[0] By default the limit is 32k processes, after which the kernel will simply refuse to create new ones.
If you do use PR_SET_CHILD_SUBREAPER, then you need to reap whatever gets reparented to you; if you don't do this then the process table will eventually fill up with zombies. He is correct that few programs do that, but there's nothing that requires that to be done by pid1 if all the processes within the container are spawned by something that provides that functionality and uses PR_SET_CHILD_SUBREAPER.
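A rough Python sketch of that arrangement, assuming Linux 3.4 or later and glibc reachable through ctypes. The function names are hypothetical; the constant 36 is the value of PR_SET_CHILD_SUBREAPER in <linux/prctl.h>. The process marks itself as a subreaper, orphans a grandchild on purpose, and then reaps whatever gets reparented to it.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def become_subreaper():
    """Ask the kernel to reparent orphaned descendants to this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    rc = libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1), zero, zero, zero)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def spawn_orphan():
    """Fork a child that forks a grandchild and exits, orphaning the grandchild."""
    r, w = os.pipe()
    child = os.fork()
    if child == 0:
        os.close(r)
        grandchild = os.fork()
        if grandchild == 0:
            os.close(w)
            time.sleep(0.2)  # outlive the intermediate parent
            os._exit(7)
        os.write(w, str(grandchild).encode())  # report the grandchild's PID
        os._exit(0)
    os.close(w)
    grandchild = int(os.read(r, 32))
    os.close(r)
    return child, grandchild


def reap_until(wanted):
    """Wait on any child (pid -1) until every PID in `wanted` has been reaped."""
    reaped = {}
    while not set(wanted) <= set(reaped):
        try:
            pid, status = os.waitpid(-1, 0)
        except ChildProcessError:
            break  # no children left at all
        reaped[pid] = os.WEXITSTATUS(status)
    return reaped


if __name__ == "__main__":
    become_subreaper()
    child, grandchild = spawn_orphan()
    print(reap_until({child, grandchild}))
```

Without the `become_subreaper()` call the grandchild would be handed to PID 1 (or some other subreaper further up), and the `waitpid(-1, ...)` loop here would never see it.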
Docker could run a minimal pid1 in each container to address this. Though if this had been a big issue, I guess it would already have been fixed.
Naturally, a proof of concept of the problem would be great. (Let's say a Dockerfile.)
It's usually fairly simple to fix. For Consul above, for example, I raised it with the Consul guys and they said they'd look at adding waiting on children to it as a precaution (it's just a couple of lines); people building containers could also introduce a minimal init, or you can write your health checks to guard against it. But it happens all over the place, and people are often unaware of it and so not on the lookout for it, and it may not be immediately obvious.
The reason I raised it as an issue for Consul, for example, even though it wasn't really their fault, but an issue with the containers, is that people need to be aware of the problem when packaging the containers, need to be aware that a given application may spawn children, and that it may not wait for them. Even a lot of people aware of the zombie issue end up packaging software that they didn't realise was spawning child processes that could end up as zombies (in this case, it took running it in a container without a proper pid 1, using health checks which not everyone will do, and writing the health checks in a particular way in order to notice the effects).
Thankfully there are a number of tiny little inits, since what little you actually need to do is trivial. E.g. there's suckless sinit [1], Tini [2], and here's a tiny little proof of concept Go init [3] I wrote (though frankly, suckless or Tini compiled with musl will give you a much smaller binary).
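As one possible shape for a health check that guards against this, here is a small Python sketch, assuming Linux with /proc mounted. The function names and the threshold of 10 are invented for illustration and are not taken from Consul or any other tool. It lists processes in state `Z` and reads the PID limit from /proc/sys/kernel/pid_max.

```python
import os


def read_pid_max():
    """Return the kernel's PID limit (Linux only)."""
    with open("/proc/sys/kernel/pid_max") as f:
        return int(f.read())


def list_zombies():
    """Return the PIDs of all processes currently in state Z."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process vanished while we were scanning
        # The state field comes right after the parenthesised command name.
        if data.rsplit(")", 1)[1].split()[0] == "Z":
            zombies.append(int(entry))
    return zombies


def healthy(max_zombies=10):
    """Hypothetical check: fail once more than `max_zombies` zombies pile up."""
    return len(list_zombies()) <= max_zombies


if __name__ == "__main__":
    print("pid_max:", read_pid_max())
    print("zombies:", list_zombies())
    print("healthy:", healthy())
```

A check like this only detects the pile-up; something still has to wait on the children to clear it.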
[1] http://git.suckless.org/sinit
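To show how little such an init has to do, here is a sketch in Python rather than C or Go. It is not the code of sinit, Tini, or the Go init mentioned above, and `run_init` is a made-up name. It starts one main child, forwards SIGTERM and SIGINT to it, reaps every child that exits (including orphans it adopts when running as PID 1), and returns the main child's exit code.

```python
import os
import signal


def run_init(argv):
    """Run argv as the main child, reap all children, return the main exit code."""
    main = os.fork()
    if main == 0:
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # command not found or not executable

    def forward(signum, _frame):
        # Pass termination requests on to the main child.
        try:
            os.kill(main, signum)
        except ProcessLookupError:
            pass

    previous = {
        sig: signal.signal(sig, forward)
        for sig in (signal.SIGTERM, signal.SIGINT)
    }
    try:
        while True:
            pid, status = os.waitpid(-1, 0)  # reaps any child, adopted or not
            if pid != main:
                continue  # an adopted orphan; keep waiting
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return 128 + os.WTERMSIG(status)
    finally:
        # Restore the handlers that were installed before we started.
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)
```

Used as a container entrypoint, this would be called as `sys.exit(run_init(sys.argv[1:]))`. Returning as soon as the main child exits is a design choice in this sketch: when PID 1 of a PID namespace exits, the kernel kills whatever is left in that namespace.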
Seems like the way Fedora is packaging systemd for 24 is going to move systemd-nspawn to a level of maturity that will likely surpass some of the clunky issues folks have with running docker.
That alone disqualifies it for me right there.
Inside of rkt there is an internal logical separation between the tool that sets up the container filesystems and the one that executes them. We call those things stages[1].
Now inside of rkt we have a few different "stage1" options today:
- systemd: this means that your container has a real init system
- clear containers: execute the container inside of a virtual machine with lkvm.[2]
- direct execution w/ fly: no init system is involved for special privileged containers.[3]
If someone wanted to contribute a stage1 that used a different init system that would be great. But, today systemd works fine and is generally an implementation detail. We also get some bonuses by using systemd on systemd systems like machinectl integration, and journald integration.
Also, I should note that rkt should work on non-systemd systems as well. Again, that is because systemd is an internal detail.
[1] https://coreos.com/rkt/docs/latest/devel/architecture.html#s...
[2] https://coreos.com/blog/rkt-0.8-with-new-vm-support/
[3] https://coreos.com/blog/rkt-0.15.0-introduces-rkt-fly.html
Someone has the right to say why something is "disqualified" for them, even if it is devoid of context. What is awesome here is that the leading expert for this topic is replying directly to the negative (empty) opinion and actually presents a (rich) alternate opinion.
How does you asking unanswerable questions contribute to resolving the conversation to something we can all learn from?
For philosophical reasons? Can people just not accept that systemd is the main solution that the community has accepted and move along?
There is no reason each subset, or even each individual, shouldn't have their own opinion and base their actions upon it.
Sometimes I wonder if systemd is actually a part of big plan of moving everyone to microservices and containers and maybe even unikernels — anything, just anything without this abomination.
Like yours: "I wonder if systemd is actually a part of big plan of moving everyone to microservices and containers and maybe even unikernels" works even better if you replace systemd with docker.
Systemd spits on isolation, it embraces integration of everything. Supervision, logging, communication, IO, configuration, state management — everything goes through systemd. Everything is binary and opaque. Docker is transparent.