Seriously, VMs are hardly as secure as many people want to believe unless you're using enclaves, and even those have vulnerabilities. I think a better approach is seccomp and whatever other filtering makes sense.
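For anyone who hasn't looked at what seccomp filtering involves in practice, here is a minimal sketch that generates a deny-by-default allowlist profile in the JSON layout Docker accepts. The syscall list and the function name are illustrative assumptions, not a vetted policy; a real workload would need a list derived from tracing what it actually calls.

```python
import json

# Illustrative allowlist only (an assumption for this sketch, not a vetted policy).
ALLOWED_SYSCALLS = [
    "read", "write", "close", "exit", "exit_group",
    "futex", "brk", "mmap", "munmap", "rt_sigreturn",
]


def build_seccomp_profile(allowed):
    """Return a deny-by-default profile in Docker's seccomp JSON layout."""
    return {
        # Any syscall not listed below fails with an error instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            # De-duplicate and sort so the generated file is stable.
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_seccomp_profile(ALLOWED_SYSCALLS), indent=2))
```

Saved to a file, the output can be passed to a container with `docker run --security-opt seccomp=profile.json ...`. Note that a filter like this only narrows which kernel entry points are reachable; it does nothing about bugs behind the syscalls that stay allowed.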
I came away baffled that they weren't more widely promoted, compared with Docker and friends. After thinking about it for a while, all I can figure is they're so straightforward to use and well-documented that there's no room to make one's name, or to make a buck, re-packaging them or wrapping them in complex tools, so there's little money or glory (= personal marketing via open-source project leadership/contributions) in promoting them.
[EDIT] that is: what would be a blog post in LXC/Docker land... doesn't exist, because it's covered perfectly well in the docs. What would be a simple open-source tool... becomes a blog post, because it's short, simple, and clear enough not to merit special software, but just a quick guide to existing tools. What would be a business, becomes a simple open-source tool without enough of a difficulty/convenience "moat" to support a business.
Jails seem to be treated like OpenVZ containers in the Linux world: a lighter alternative to virtual machines, not a way to build and distribute applications like Docker.
This is just my take after playing with jails for a few hours; I would happily be proven wrong.
But the attack surface of a Linux kernel is very large, is pretty unpredictable, and can't be coherently masked out with rules (my favorite example is Jann Horn's VM reference count bug, which was a simple concurrency flaw in the core virtual memory system). By comparison, a Linux KVM hypervisor is not just a subset of the kernel by definition, but also a much smaller codebase, a tiny fraction of the whole kernel.
Replacing shared-kernel isolation like seccomp-filtered containers with VMs is, architecturally, simply the replacement of a large trusted computing base with a smaller one. If the overhead is acceptable, it's hard to argue with from a security perspective.
Security and performance aren't the only driving forces; there are a lot of technical and operational benefits to the abstraction and standard interfaces that you get when running stacks that might otherwise look like someone took an Xzibit meme too far.
Also remember that on a modern system, there are often at least 2 additional layers already at work abstracting interfaces to the "bare metal" OS.
gVisor is a very cool codebase. As an illustration of the approach: it includes its own TCP/IP stack; we use it in our command-line dev tool to allow people to SSH to their VMs over WireGuard without having to install WireGuard or obtain privileges to manage WireGuard.
The project later merged with Intel Clear Containers to become what's now called Kata Containers (https://katacontainers.io/) and is now widely used by several Internet giants like Alibaba and Baidu.
The startup was acquired by Ant Financial a couple of years ago.
(I recorded a podcast with one of hyper.sh's engineers, if you understand Mandarin: https://pan.icu/25)
It actually worked great, and I've struggled to get quite as flexible a CI system at other jobs since then (the big advantage was that it looked like Docker, so with compose you could either spin up a metal-like nested VM or just pull some DB containers into your build instance).
Shameless plug: this is exactly our goal with https://kwarantine.xyz. We are creating a new hypervisor (from scratch) that can run strongly isolated Docker/LXC containers.
[1] https://cappsule.github.io/ [2] https://en.wikipedia.org/wiki/Bromium#/media/File:Bromium-en...
Still, neat to have the walkthrough here in this post.
https://www.linux-kvm.org/images/d/d2/03x05B-Chao_Peng-Light...
The big win was slashing away the BIOS stuff.
We use AWS's Firecracker to turn our customers' Docker containers into Firecracker microVMs (Firecracker is Amazon's Rust VMM, the engine for Fargate and Lambda). Anecdotally: in my dev environment, the difference between Firecracker boot times and native Docker container startup is imperceptible; the logging we do swamps the VM boot stuff. It's very fast.
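To give a sense of how little there is to a Firecracker VM definition, here is a rough sketch that builds the JSON document Firecracker can read via its `--config-file` option. This is not the pipeline described above: the function name, file paths, and sizing defaults are hypothetical, and it assumes you already have an uncompressed kernel image and a root filesystem image (for example, one exported from a container).

```python
import json


def firecracker_config(kernel_path, rootfs_path, vcpus=1, mem_mib=128):
    """Build a minimal microVM definition: one kernel, one root drive."""
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            # Serial console and no PCI probing; there is no BIOS stage to configure.
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": rootfs_path,
                "is_root_device": True,
                "is_read_only": False,
            }
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cfg = firecracker_config("vmlinux.bin", "rootfs.ext4", vcpus=2, mem_mib=256)
    print(json.dumps(cfg, indent=2))
```

With the output written to `vm.json`, something like `firecracker --api-sock /tmp/fc.sock --config-file vm.json` should boot it. The same settings can also be sent piecemeal over the API socket instead of a config file.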
It's powered by https://github.com/containers/libkrun.
1: https://thekev.in/blog/2019-08-05-dockerfile-bootable-vm/ind...