These slides correspond to a workshop I conducted a week ago demonstrating the internals of Docker and how Docker containers can be run without using any of the Docker tools or runtime.
Docker is a great tool, and I'm glad it's gained so much traction. But containerization is still new to many people, and even then there's still a lot of confusion about the difference between Docker and containerization in general. The goal of this presentation isn't to discourage anyone from using Docker, but to outline the lay of the land for people interested in using containers.
Personally, I run containers both with systemd and with Docker. The good news is that it's really easy to switch from one to the other, so there's very little cost to trying it out both ways.
I'm sorry if I jumped the gun before the blogpost was ready. :) Hopefully, the discussion will help even more!
Thanks! I really appreciate that.
> I'm sorry if I jumped the gun before the blogpost was ready. :)
Nah, it's cool - if anything it creates even more social pressure not to procrastinate writing it!
Edit: arrow keys are your friends.
Always nice to cut through the hype and see how things really work. I knew Docker used namespacing for some quasi-virtualization, but was wary of using it because I haven't had time to dissect it on my own. You're a good writer, able to get a lot of information across in an engaging way.
See: https://groups.google.com/d/msg/golang-nuts/fzr3pebUBBM/Ehat...
If anyone's interested in hosting and/or filming an updated version of this talk, drop me a line! My email is in my profile.
Since the slides also mention that you can use Docker images with a systemd-nspawn/machinectl setup it would be great if they soon supported the v2 Docker Registry and image format which actually does use content-addressable hashes for images.
Thanks for catching that - I've updated it.
> Since the slides also mention that you can use Docker images with a systemd-nspawn/machinectl setup it would be great if they soon supported the v2 Docker Registry and image format which actually does use content-addressable hashes for images.
I haven't used the v2 registry, so I don't know if systemd (machinectl) supports this yet, but I imagine they will soon if they don't already.
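For anyone wondering what "content-addressable" means in practice, here's a minimal Python sketch: the identifier of a blob is derived from a hash of its bytes, so the same content always gets the same name and any change produces a different one. The `sha256:` prefix mirrors how such digests are commonly written; the function names are made up for illustration, and this is not the registry's actual code.

```python
import hashlib


def content_address(blob: bytes) -> str:
    """Return a digest-style identifier derived only from the blob's bytes."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def verify_blob(blob: bytes, expected: str) -> bool:
    """Check that received bytes match the identifier they were requested by."""
    return content_address(blob) == expected


layer = b"example layer contents"   # stand-in for an image layer
digest = content_address(layer)
print(digest)
print(verify_blob(layer, digest))          # same bytes, same address
print(verify_blob(layer + b"!", digest))   # any change gives a different address
```

The nice property is that a client can check what it downloaded without trusting the server's naming.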
Is there an ecosystem around LXC that provides things like Flynn?
I can do it all myself, but I can't do it, my development job, and be home for dinner at night. Like most tools today, the value is in the ecosystem, not the tool itself.
Users can then decide how they want to deploy. Docker takes that base container and adds layers of aufs, constrains the container OS template to a single app by modifying the container OS's init, gives you the Dockerfile, and focuses on deploy-centric functionality with immutability, idempotency, etc., and this makes it much more complex to use than LXC. It's a use case built on Linux containers, not containers itself.
LXC is not 'low level kernel capabilities' [1], as Docker misleadingly refers to it on its website. This has resulted in a lot of confusion about LXC in the Docker ecosystem, with folks thinking it's 'difficult to use' or 'just low level stuff'. A tad unfair to LXC, given Docker was based on it until 0.9 and knew exactly what it was; it's about as accurate as referring to Docker or nspawn as low level capabilities.
The actual low-level capabilities would be the kernel namespaces and cgroups that LXC uses to give end users containers, just as Docker (post 0.9, directly via libcontainer) and systemd-nspawn use them for their containers.
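To make "kernel namespaces" a bit more concrete: on Linux, a process's namespace memberships show up as symlinks under `/proc/<pid>/ns`. Here's a rough Python sketch that just reads them. The function name and its `proc_root` parameter are my own invention for illustration, and it returns an empty dict on systems without that directory.

```python
import os


def list_namespaces(pid="self", proc_root="/proc"):
    """Map namespace name (e.g. 'net', 'pid') to its symlink target."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    namespaces = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        # Not Linux, no such process, or no permission to look.
        return namespaces
    for name in entries:
        try:
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # skip entries we are not allowed to read
    return namespaces


for name, target in list_namespaces().items():
    print(name, "->", target)
```

Two processes that print the same target for, say, `net` share that namespace; a containerized process will typically show different targets than the host's init.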
Docker builds on containers to deliver additional functionality. That comes at an additional cost in complexity: if that is your use case the trade-off may be worth it, but for other use cases the complexity may be overkill.
You can simply make a VM image with LXC installed and you have boot2lxc. The vast ecosystem of orchestration technology that works in VMs and systems also works in LXC, so you don't need tools designed specifically for LXC. It's not opinionated or exclusive like the tools built around the Docker ecosystem, which are finely focused on a specific use case and typically support Docker only.
It uses the docker equivalent of "net=host" (which provides better performance at the cost of isolation), and the disk is pointing at a shared "changeroot" on disk, instead of at a layered FS.
Both of these can be better isolated with natted interfaces and a `btrfs` (which has its own reliability issues) layered image, but they are not what you expect by default.
"But, you NEED to run the installer on that (bare metal) server!" Nope, I can just boot from knoppix (remember?) and mount the disk and run debootstrap on it.
Every so often I run into programmers and sysadmins that believe these things are a kind of magic. They're not. They're just files on a disk.
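The "just files on a disk" point fits in a few lines of Python: bootstrap a Debian tree into a directory, then use that directory as a container root. `debootstrap SUITE TARGET [MIRROR]` and `systemd-nspawn -D DIR` are the real command forms; the suite, the path, and the helper names are placeholder assumptions, and the commands are only printed rather than run (both need root).

```python
import shlex


def debootstrap_argv(suite, target, mirror=None):
    """Command that unpacks a minimal Debian system into `target`."""
    argv = ["debootstrap", suite, target]
    if mirror:
        argv.append(mirror)
    return argv


def nspawn_argv(root_dir):
    """Command that starts a shell in a container rooted at `root_dir`."""
    return ["systemd-nspawn", "-D", root_dir]


target = "/var/lib/machines/demo"  # hypothetical location; any directory or mounted disk works
for argv in (debootstrap_argv("stable", target), nspawn_argv(target)):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The same target directory could just as well be a mounted disk of a bare metal server, which is the whole point.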
Love this presentation. Thank you!
edit: s/mount knoppix/boot from knoppix/
http://www.freedesktop.org/software/systemd/man/machinectl.h...
With proper handling of access (allowing unprivileged users to start containers) along with --bind for the home directory, this could be a viable alternative to Debian's schroot [s].
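A rough sketch of the `--bind` idea in Python, in case it helps. `-D` and `--bind=` are documented systemd-nspawn options; the helper name, user name and paths are hypothetical. It only assembles and prints the command, since actually starting the container needs root.

```python
import shlex


def nspawn_with_home(root_dir, home_dir, command=()):
    """Build a systemd-nspawn command that bind-mounts a home directory into the container."""
    argv = ["systemd-nspawn", "-D", root_dir, "--bind=" + home_dir]
    argv.extend(command)  # optional command to run instead of a login shell
    return argv


argv = nspawn_with_home("/var/lib/machines/build", "/home/alice", ["/bin/bash"])
print(" ".join(shlex.quote(arg) for arg in argv))
```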
There's also a complementary LWN article from 2013 that's worth reading:
https://lwn.net/Articles/572957/
That also contains a quote that explains a bit about systemd (if read maliciously): "As part of the development of systemd, the team looked at various kernel features to see if they were relevant to the project."
At least with this (containers w/log handling etc) we get something for our complexity. Still, having had two separate machines fail to boot/even come up with a text console with some sensible errors - I'm far from sold on the idea that I want all these features in PID 1.
[1] changed user "foo" to "root" to be a little more clear. Maybe "user1" would work as well - but systemd (unlike lxc etc) requires root?
[s] https://wiki.debian.org/Schroot
Reminds me that I should probably make a write-up of how I set up schroot to allow "source"-access for root, and automagic sessions for a standard user backed by lvm -- the documentation is a bit dense.
Lennart Poettering has spoken about containers, btrfs subvolumes and easy snapshots. This could be the direction systemd goes in the future for managing the OS, with apps in btrfs subvolume containers, rollback, management, etc., so this seems like it may mature fairly fast. The exception is unprivileged container support, which Lennart does not seem to like.[1]
[1] https://plus.google.com/+LennartPoetteringTheOneAndOnly/post...
There are a few things it doesn't do (nor do Docker or LXC, for that matter), at least not yet, such as mounting filesystems before container start or managing upgrades.
I suspect it could be hacked somehow with service files, too.
This is particularly significant if you use something like btrfs snapshots with a base mount and child mounts, or overlayfs, and invalidate inodes during upgrades, instead of a dumb-ish "yum upgrade/apt-get upgrade/etc".
The main difference is that in this case the update is at the mount / container level when propagated from the base image.
Some (most) others also do that with image versions and a full image swap.
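A rough sketch of the snapshot-per-container idea, in Python: each container root is a writable btrfs snapshot of a base subvolume, so an "upgrade" amounts to taking fresh snapshots of the updated base. `btrfs subvolume snapshot [-r] <source> <dest>` is the real command form; the paths, container names and helper name are hypothetical, and nothing is executed here.

```python
import os
import shlex


def snapshot_argv(base, dest, readonly=False):
    """Command that creates `dest` as a snapshot of the subvolume `base`."""
    argv = ["btrfs", "subvolume", "snapshot"]
    if readonly:
        argv.append("-r")  # read-only snapshot, e.g. for keeping a rollback point
    argv += [base, dest]
    return argv


base = "/var/lib/machines/base"  # hypothetical base image subvolume
for name in ("web", "db"):       # hypothetical container names
    dest = os.path.join("/var/lib/machines", name)
    print(" ".join(shlex.quote(arg) for arg in snapshot_argv(base, dest)))
```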
I've seen them used in good and bad ways, but mostly the former. It's good to see something that actually solves real problems coming back into use again. Added bonus, it's an extremely mature (from a tech perspective) way of doing things. Chroot jails have been around for decades.
Edit: Apparently, if you don't touch-move but only tap, you can keep it from getting out of sync.
I had the same question as OP and didn't find anything that works as he described it when I looked 6+ months ago.
One major difference between systemd-nspawn and LXC is how simple and reliable systemd-nspawn is. In particular, the guest OS/image has to run systemd as well, which is what provides the integration.
Eventually, if you wanted, you could mix and match the tools just fine.
First, thanks, this was really interesting.
Sorry, I must not be that bright, I cannot guess your email from your username :-( (and I don't have a Tumblr login, and you haven't enabled DMs from strangers on Twitter), but I wanted to point out that you have a typo on slide 9, s/journactl/journalctl/.
I should stress that last point - I dislike certain aspects of docker, but I love their system for building images. I want to somehow adapt it for building bare metal images.
I am unconvinced downloading system images from the Internet is a great thing overall.
http://user-mode-linux.sourceforge.net/ https://en.wikipedia.org/wiki/User-mode_Linux