> non-essential containers which exit don't get restarted or replaced
That's why you have one or two essential watchdog containers which relaunch the workers. Do you keep a large number of workers in an "idle, but hot" state to allow for bursts?
I'm a little confused by this approach; are there any non-essential containers in your suggested architecture? This sounds like the watchdog container is just a parent process that launches a bunch of subprocesses, which is definitely a workable solution, although not the one we decided to use. If there are primitives for an essential container to inspect container state and relaunch other containers in the task, that'd be great to know about.
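To make sure we're talking about the same thing, here is a minimal sketch of how I read the "watchdog as parent process" variant. It only models plain subprocesses inside a single container; it does not use any container-orchestration API, and `supervise`, `max_restarts`, and `poll_interval` are names I made up for illustration.

```python
import subprocess
import sys
import time


def supervise(commands, max_restarts=3, poll_interval=0.05):
    """Run each command as a child process and relaunch any child that exits.

    Hypothetical helper: each worker gets a budget of `max_restarts` relaunches.
    Returns the number of restarts performed per worker once every worker has
    exited and has no restart budget left.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    restarts = [0] * len(commands)
    while True:
        any_active = False
        for i, proc in enumerate(procs):
            if proc.poll() is None:
                # Worker is still running; nothing to do.
                any_active = True
            elif restarts[i] < max_restarts:
                # Worker exited and still has budget: relaunch it.
                procs[i] = subprocess.Popen(commands[i])
                restarts[i] += 1
                any_active = True
        if not any_active:
            return restarts
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Two short-lived workers that exit immediately, each relaunched once.
    worker = [sys.executable, "-c", "pass"]
    print(supervise([worker, worker], max_restarts=1))
```

A real watchdog would presumably restart workers indefinitely and handle signals; the restart budget here just keeps the sketch terminating.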