If you had `unattended-upgrades` running and had the "automatic reboot" option enabled, then all your Ubuntu 20.04 servers running Docker would reboot themselves and not come back up.
First, the bug was in a security branch. Second, it wasn't just the containers that crashed. If you started containers at boot via Docker, then the host OS kernel-panicked and crashed at boot, since containers share the kernel with the host.
At that point, you can't SSH in and have to follow the procedure for restoring from backup or mounting the root volume on an alternate host to revert the kernel version being run.
And then of course if you revert the kernel upgrade, you were once again vulnerable to whatever problem the security update was fixing...
Not to mention we’ve also had our fair share of production triple faults from bugs in the Intel firmware patches for Spectre, which took weeks to investigate & fix between ourselves struggling to keep our exchange up & running, Intel, and AWS.
And that is why there's value in the CoreOS/Container Linux-like solutions we designed & implemented nearly a decade ago now. Being able to promptly roll back any kernel/system/package upgrades at once, either manually or automatically after a few panics are detected in quick succession, is actually quite awesome. Not to mention the slow update rollout strategy baked into the Omaha controller.
But the reality is that the what-ifs are always the hardest to market, nearly always after-thoughts and with fast-spiking/fast-decaying traction after major events.
Build your images in a CI job and make your deploy version the pair (code version, image version), so patching runs through all the same tests your code does and you have a trivial roll-forward to undo any mess you find yourself in.
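The idea can be sketched roughly like this (the variable names and tag scheme here are made up for illustration; a real CI job would derive them from git and the build pipeline):

```shell
# Hypothetical CI step: the deploy version is the pair (code version, image version).
# A periodic image rebuild bumps IMAGE_VERSION to pick up security patches, and the
# result flows through exactly the same test pipeline as any code change would.
CODE_VERSION="abc1234"        # in a real job: git rev-parse --short HEAD
IMAGE_VERSION="2022-07-01"    # in a real job: the date/counter of the image rebuild
DEPLOY_TAG="${CODE_VERSION}-${IMAGE_VERSION}"
echo "deploying ${DEPLOY_TAG}"
# Rolling forward after a bad patch is just deploying a tag with the previous
# IMAGE_VERSION and the same CODE_VERSION.
```

The point is that the image version moves independently of the code version, so a broken security patch can be undone without touching the application code.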
> Build your images in a CI job
I know container images should generally be immutable, but I would expect unattended upgrades to be mostly used on the host, not in a container, in which case that management model doesn't really work (unless you're running VMs and can deploy immutable root images to them as well, or some fun bare metal + PXE combination).
My experience has been that by the time I notice some serious vulnerability is in the news, my servers have already patched themselves. I have never "hated life" or had a "hard to find and undo bug" due to automatic security patching. I pretty quickly found what caused this and had a clear path to resolution.
This is the first security update that caused a boot failure in about a decade. It was bad, but it didn't change my mind about unattended-upgrades. My takeaway is that maybe I should have upgraded my 20.04 servers to 22.04 sooner.
Some years ago everyone said the same about Windows servers ;)
Or add `systemd.mask=docker.service` to your boot parameters to prevent Docker from starting.
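For a one-off boot you can add the parameter at the GRUB menu: press `e` on the boot entry and append it to the end of the `linux` line (the kernel path and root device below are illustrative):

```
linux /boot/vmlinuz-5.13.0-1028-generic root=/dev/sda1 ro systemd.mask=docker.service
```

This masks the unit for that boot only, so Docker stays down long enough to fix the kernel, and the next normal boot is unaffected.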
It's pretty standard for all distros to have that choice.
Isn't the common wisdom that you should have them enabled, but staggered across hours/days?
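One way to stagger unattended upgrades across a fleet is to widen the randomized delay on the apt timer with a systemd drop-in; a sketch (the 12h value is arbitrary — Ubuntu's default is much shorter):

```
# /etc/systemd/system/apt-daily-upgrade.timer.d/override.conf
[Timer]
RandomizedDelaySec=12h
```

With a wide enough window, a bad update hits only a slice of the fleet before you notice and can pause the rest.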
This is mostly an in-place upgrade issue?
Security patches matter, but I'm no one important, so I should be fine to wait a week or month...
Anyone else who is important though... servers for example...
It's nice to see LWN on HN ... but please remember: it is only LWN subscribers that make this kind of writing possible. If you are enjoying it, please consider becoming a subscriber yourself — or, even better, getting your employer to subscribe.
https://news.ycombinator.com/item?id=31852477
If you're interested in detailed commentary on and investigations of the FOSS space, I can't recommend a subscription to LWN enough!
It is possible to ssh in for about 2 seconds before the kernel panic so I solved it by doing this:
while true; do ssh <servername> sudo mv /usr/bin/containerd /usr/bin/containerd.backup ; sleep 1; done
On the next reboot I was able to ssh in and change to the (then just released within the past hour) kernel that doesn't have this stupid bug. After another reboot you can move containerd back and it should be working again.
Affected: linux-image-5.13.0-1028. Not affected: linux-image-5.13.0-1029 and later.
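A quick way to check which build a host is running, sketched here with an illustrative version string (real `uname -r` output includes a flavor suffix like `-generic` or `-aws`):

```shell
# Check whether the running kernel is the affected 5.13.0-1028 build.
kernel="5.13.0-1028-generic"    # stand-in for "$(uname -r)"
case "$kernel" in
  5.13.0-1028-*) echo "affected" ;;
  *)             echo "not affected" ;;
esac
```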
`sleep 0.1` works on most systems. (And on BusyBox, you should have usleep.)

For example, looking at the package for postgresql-14, an update still hasn't been released for the unscheduled mid-June release, version 14.4, which fixed possible index corruption.
http://changelogs.ubuntu.com/changelogs/pool/main/p/postgres...
I would have thought this would have been packaged earlier, as I would expect Ubuntu + PostgreSQL to be a common combination.
It makes me wonder exactly how much of a resource is behind creating Ubuntu distributions.
I don’t really have any sources to back this up, but my impression is that Canonical is kinda trying to punch above their weight.
A sysadmin friend of mine is totally against docker and his reason is that he wants as little complexity as is needed on his systems. Complexity, he says, leads to emergent behavior.
- If you throw the box out, you know you did no harm to other boxes.
- If you change your floor, you know you didn't wipe out something useful.
- Aaand you can `git switch` to a well-known state
Of course it's not 100% like that; in reality you still have to have some kind of consistency on where you put your docker-compose file, the Dockerfiles for all the boxes, where you mount your volumes (in some folder, or scattered all over the system), maybe dealing with the host firewall, not committing secrets into git, etc.
But overall, it's very positive - docker-compose is (almost) one-stop file you need to see all your references to volumes, Dockerfile, network configurations, environment files with secrets.
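A minimal sketch of such a compose file (service name, paths, and port are all made up for illustration), showing how the volumes, build context, secrets file, and ports are all visible in one place:

```yaml
version: "3.9"
services:
  app:
    build: ./app              # Dockerfile lives next to the compose file
    env_file: .env            # secrets kept in one file, out of git
    volumes:
      - ./data:/var/lib/app   # every host mount declared here, not scattered
    ports:
      - "8080:8080"
```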
It seems less complex to manage a bunch of systemd services than one pile of systemd services that are managed and logged one way plus a bunch of Docker services that are managed and logged another way.
Even container security, and the compliance around it, is a measurable cost on its own, one that is trivially solved if you have bare EC2 instances and a patch cycle.
The emergent behavior of containerization has had an overall positive effect, even if it has annoying costs.
It could be triggered by other complex applications that use kernel container features.
This is when I introduce them to something called VirtualBox, and then their eyes go bright with wonder at how simply it works.
Well, if I had a workhorse with loads of RAM... I'd still choose Docker, because of how FAST it starts/restarts. And because it is easy to recreate everything with Docker; a VM may get messy when installing stuff for app #1, #2, #x, "works on my machine!", etc.
Not to mention, if you want to "natively" pack something for Windows and macOS, containers won't even solve that problem, as they only run on Linux. Only reason you can use Docker on macOS is because of virtualization.
----
[1] I have a couple of bits running via LXC but otherwise use VMs to split services out
[2] One large VM running many containers[3], or sometimes a couple of VMs, perhaps separating them performance-wise across drives or with CPU core affinity where that was/seemed easier, or just so in case of disaster they could concentrate on getting the higher priority VM+containers restored and back up first.
[3] Obviously one VM per container would defeat the container benefits, though I've seen this done where docker was the only officially supported install option and they wanted to run a service in a VM.
Either way, I don't think they understand what Docker is, what it's for, and why it makes things less complicated.
I've recently migrated to Ubuntu 22.04 and got this: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971505 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1970453
on HP ProLiant servers.
I actually rolled out Ubuntu 22.04 to a few servers a few weeks ago. Pretty uneventful update; all my Ansible scripts for 20.04 worked without modification against the new servers. So, I guess I dodged this bug for now. One reason I've always preferred Ubuntu over Red Hat for servers is that with Red Hat/CentOS essentially everything I care about is perpetually and hopelessly out of date and obsolete. It just creates a lot of hassle to work around that and get reasonably current versions of the things I actually need my servers to run. With Ubuntu that was always a lot more straightforward.
I currently write this on a laptop with Manjaro and Linux 5.18. I'm glad I don't have to deal with about a year of long-fixed issues with hardware, Bluetooth, GPUs, performance, etc. IMHO there's very little value in sticking with older kernels on desktop machines, especially when that involves a convoluted process of back-porting and integrating lots of complicated patches. I recently put Ubuntu on an old iMac (secure boot prevents booting Manjaro) and promptly ran into hardware issues that I recall having with Manjaro a few months ago, issues that were fixed by simply upgrading the kernel. Bluetooth especially seems way more flaky, and that's not exactly flawless on 5.18 either.

I get the "if it ain't broke, don't fix it" thing; my point is that with modern desktop Linux, things being broken is a constant. The least broken version of Linux is usually the kernel that was just released, since it has all the cumulative fixes for the issues addressed in previous kernel releases. Opting out of a few years of those fixes seems misguided.
Even on servers, I suspect simply updating the kernel more regularly would not be the end of the world for most users, with an incubation period to catch bugs/blocking issues, of course; the more people use a kernel version, the more stable it gets. I doubt many users would experience any regressions, and it's a lot cheaper to support. Given the option, I don't think I would opt to run 2-3 year old kernels on any of my servers. I don't see the value in opting out of 2-3 years' worth of known & fixed stability, performance, and other issues.
This is exactly why you choose it. Lesser chance of insanity.
My 3 year old server is running fine. What am I missing out on exactly? My 6 year old router is also running perfectly. Don't fix what isn't broken. Updates often break things without providing me any value.
I'm running a 5 year old Android. Upgrading to a newer version will slug my phone. I don't need a newer android (yet). My phone works perfectly for me.
Now, if you are going to tell me my security is at risk. Please be specific and provide an example :)
number of tickets in launchpad such as https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1970957
It makes me question the value of my org looking into an Ubuntu Advantage subscription. When there are tickets with lots of "me too" replies that result in unusable laptops, one should at least triage them, consolidate them into a single ticket, and then be able to mark when they're fixed.
After all, they've reinvented everything from the DE to the init system at least once in the past.
(They also have their own containers, LXD. I actually really like that one; please keep working on that, Canonical.)
But I assume that as Ubuntu follows an April release schedule, it doesn't always line up with an appropriate LTS kernel.
That being said, I rate Canonical's practices as rather poor.