>[...] the init process must also wait for child processes to terminate,
>before terminating itself.
>If the init process terminates prematurely then all children are terminated uncleanly by the kernel.
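The reaping behavior is easy to demonstrate: a child that exits before its parent has called waitpid() lingers as a zombie ("Z" state in /proc). A minimal Python sketch of this (Linux-only, since it reads /proc; not from the article, just an illustration):

```python
import os
import time

# Fork a child that exits immediately. Until the parent reaps it with
# waitpid(), the kernel keeps it around as a zombie process.
pid = os.fork()
if pid == 0:
    os._exit(0)          # child exits right away

time.sleep(0.2)          # give the child time to terminate

# The third field of /proc/<pid>/stat is the process state; "Z" means zombie.
with open(f"/proc/{pid}/stat") as f:
    state_before = f.read().rsplit(")", 1)[1].split()[0]

os.waitpid(pid, 0)       # reap the child; the zombie entry disappears

print(state_before)
```

If PID 1 never does this reaping (because it's an application that was never written to be an init), zombies accumulate — which is exactly the gap dumb-init and similar tools fill.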
[0]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb...

In general Docker is half-trying to be the init system, but most people using it are putting a whole child OS with its own init system in their container. I think the approach rkt takes, using systemd to run the process, is safer. Now if people would just start using lightweight containers...
[0]: https://github.com/phusion/baseimage-docker/blob/master/imag...
I should be able to take Yelp's dumb-init and add it easily to any linux container I want -- including things such as Alpine[1]
[0] https://github.com/phusion/baseimage-docker/blob/master/imag... [1] https://github.com/gliderlabs/docker-alpine
I use it in my Alpine containers.
"The motivation: modeling Docker containers as regular processes
[...] we want processes to behave just as if they weren’t running inside a container. That means handling user input, responding the same way to signals, and dying when we expect them to. In particular, when we signal the docker run command, we want that same signal to be received by the process inside."
and that seems to me to be the core reason why they can't just use a simple init system (like runit, I suppose?)
Noob question. Why is it impossible? You have the PID, no?
If your container has a process tree like
PID 1: /bin/sh
+--- PID 2: <your Python server>
then if you use `docker signal` from the host, it will only send a signal to PID 1, which is the shell. However the shell won't forward it on to your Python server, so nothing happens (in most cases).

dumb-init basically replaces the shell in that diagram, but forwards signals when it receives them. So when you use `docker signal`, the Python process receives the signal.
Alternatively, just eliminating the shell (so your Python app is PID 1) works for some cases, but you get special kernel behavior applied to PID 1 which you usually don't want. This is the main purpose of dumb-init.
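The signal-forwarding part can be sketched in a few lines of Python (a hypothetical, much-simplified stand-in for dumb-init, which is written in C and also handles process groups, TTY handoff, and zombie reaping):

```python
import signal
import subprocess
import sys

def run_with_forwarding(argv):
    """Run argv as a child process and forward termination signals to it.
    A simplified sketch of what dumb-init does when it runs as PID 1."""
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        # Pass the signal straight through to the child instead of
        # letting it stop at the wrapper process.
        child.send_signal(signum)

    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP):
        signal.signal(sig, forward)

    return child.wait()

if __name__ == "__main__":
    sys.exit(run_with_forwarding(sys.argv[1:]))
```

With a wrapper like this as the container's entrypoint, the SIGTERM that `docker stop` sends actually reaches the application instead of dying at a shell.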
There are some minor differences (dumb-init looks like it's probably a bit better for interactive commands since it e.g. handles SIGTSTP). You can also get process group behavior at run-time with dumb-init rather than compile time, and it's on by default unlike tini (as far as I can tell from a brief reading). But for most cases it won't make a difference.
Note that for interactive usage, Tini actually hands over the tty (if there is one) to the child, so in that case signals that come "from the TTY" (though in a Docker environment this is an over-simplification) actually bypass Tini and are sent to the child directly. This should include SIGTSTP, though I'm not sure I tested this specifically.
That being said, both tools are probably indeed very similar — after all there is little flexibility in that kind of tool! Process group behavior is probably indeed where they differ the most. : )
I've created an Ubuntu PPA packaging of it (https://launchpad.net/~danieldent/+archive/ubuntu/pidunu) and you can see an example of it in use at: https://github.com/DanielDent/docker-powerdns
For situations involving multiple processes, there's also https://github.com/just-containers/s6-overlay
Example use: https://github.com/DanielDent/docker-nginx-ssl-proxy (automated Let's Encrypt SSL front-end)
From my own experience with Docker in production, I've yet to see any of the described scenarios crop up. Has anyone else, or is this solving an extreme edge case?
The biggest issue we see at Yelp is leaking containers in test (e.g. Jenkins aborting a job but leaving the containers it spawned still running).
Depending on how you orchestrate containers, you might not encounter the issue in prod. If you're using something like Kubernetes or Marathon or PaaSTA, they're probably going to do the "right thing" and ensure the containers are actually stopped.
We also use containers a lot in development. For example, we might put a single tool into a container, and then when developers call that tool, they're actually spawning a container without realizing it. For this use case, it's really important that signals are handled properly so that Ctrl-C (and similar) continues working.
Also, you guys should comment on https://github.com/docker/docker/pull/5773, which is work on unprivileged systemd in Docker. I think you could influence that discussion with your experience.
For instance, some programs watch the Docker event stream and can reload, say, the HAProxy configuration to automatically load balance any new containers which come up. In my experience, running such a program in a container, where it frequently reloads the HAProxy process, tends to create a huge variety of zombie processes - and once they're present, zombies are difficult to eliminate without a reboot.
trap 'kill $(jobs -p)' EXIT

See https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb..., section "A simple init system".
CMD ["/sbin/init", "2"]
and start your app using init.d scripts or supervisord as usual?

I feel like logrotate, cron, etc., are worth having inside the container, no?
Single process containers generally don't need all the baggage of a full init system or other dependencies - hence this project.
It depends. If you are treating the container as a mini virtual system, then sure.
There is no need for logrotate if you are writing logs to stdout and letting the container system handle shipping them somewhere.
There's no need for cron in the container if you have a separate system for running scheduled tasks - something like Chronos.
I'd rather have a few tens of simple, single-process containers logging to a shared collector (or to their stdout, which is then collected) than deal with managing logrotate for them all and solve processing all those files for every host somehow.