Modern "containers" were invented to make things more reproducible (check) and to simplify dev and deployments (NOT check).
Personally, FreeBSD Jails / Solaris Zones are the thing I like to dream is pretty much as secure as a VM and a perfect fit for a sane dev and ops workflow. I haven't dug too deep into this in practice; maybe I'm afraid to learn the contrary, but I hope not.
Either way Docker is "fine" but WAY overused and overrated IMO.
I then went on to build a system with Kubernetes that enabled one to run "kubernetes pods" in independent VMs - https://github.com/apporbit/infranetes (as well as to create hybrid "legacy" VM / "modern" container deployments, all managed via Kubernetes.)
- As a total aside (while I toot my own horn on the topic of papers I wrote or contributed to), note that a reviewer of this paper, which originally used the term Pod for a running container - https://www.usenix.org/legacy/events/osdi02/tech/full_papers... - explains where Kubernetes got the term from.
I'd argue that FreeBSD Jails / Solaris Zones (Solaris Zones/ZFS inspired my original work) really aren't any more secure than containers on Linux, as they all suffer from the same fundamental problem of the entire kernel being part of one's "tcb", so any security advantage they have is simply due to a lack of bugs, not a better design.
I picked the name and wrote the first prototype (Python 2) of Docker in 2012. I had not read your document (dated 2010). I didn't really read English that well at the time, so I probably wouldn't have been able to understand it anyways.
https://en.wikipedia.org/wiki/Multiple_discovery
More details for the curious: I wrote the design doc and implemented the prototype. But not in a vacuum. It was a lot of work with Andrea, Jérôme and Gabriel. Ultimately, we all liked the name Docker. The prototype already had the notion of layers, lifetime management of containers and other fundamentals. It exposed an API (over TCP with zerorpc). We were working on container orchestration, and we needed a daemon to manage the life cycle of containers on every machine.
feels like maybe there is some correlation
I believe Google embarked on this path with Crostini for ChromiumOS [0], but now it seems like they're going to scale down their ambitions in favour of Android [1]. Crostini may not survive, but it looks like the underlying VMM (crosvm) might live on [2].
> I'd argue that FreeBSD Jails / Solaris Zones (Solaris Zones/ZFS inspired my original work) really aren't any more secure than containers on Linux, as they all suffer from the same fundamental problem of the entire kernel being part of one's "tcb", so any security advantage they have is simply due to a lack of bugs, not a better design.
Jails (or an equivalent concept/implementation) come in handy where the Kernel/OS may want to sandbox higher privilege services (like with minijail in ChromiumOS [3]).
[0] https://www.youtube.com/watch?v=WwrXqDERFm8&t=300 / summary: https://g.co/gemini/share/41a794b8e6ae (mirror: https://archive.is/5njY1)
[1] https://news.ycombinator.com/item?id=40661703
[2] https://source.android.com/docs/core/virtualization/virtuali...
[3] https://www.chromium.org/chromium-os/developer-library/guide...
And also CPU branch prediction state, RAM chips, etc. The side-channels are legion.
Container orchestration is where I see the great mistake in all of this. I consider everything running in a k8s cluster to be one "blast domain." Containers can be escaped. Faulty containers impact everyone relying on a cluster. Container orchestration is the thing I believe is "overused." It was designed to solve "hyper" scale problems, and it's being misused in far more modest use cases where VMs should prevail. I believe the existence of container orchestration and its misapplication has retarded the development of good VM tools: I dream of tools that create, deploy and manage entire VMs with the same ease as Docker, and that these tools have not matured and gained popularity because container orchestration is so easily misapplied.
Strongly disagree about containers and dev/deployment ("NOT check"). I can no longer imagine development without containers: it would be intolerable. Container repos are a godsend for deployment.
As a relatively early corporate adopter of k8s, this is absolutely correct. There are problems where k8s is actually easier than building the equivalent capability elsewhere, but a lot of uses it's put to seem to be driven more by a desire to have kubernetes on one's resume.
Even then, IMO, it makes too little sense. It would be a bit useful if unused containers wasted a lot of resources, or if you could get an unlimited number of them from somewhere.
But no: creating all the containers you can and leaving them there wastes almost nothing, they are limited by the hardware you have or rent, and the things clouds can rent are either full VMs or specialized single-application sandboxes.
AFAIK, containers solve the "how do I run both this PHP 7 and this PHP 8 application on my web server?" problem, and not much more.
I do strongly believe deployments of containers are easier. If you want something that parallels a raw VM, you can "docker run" the image. Things like k8s can definitely be complicated, but the parallel there is more like running a whole ESXi cluster. Having done both, there's really only a marginal difference in complexity between k8s and an ESXi cluster supporting a similar feature set.
The dev simplification is supposed to be "stop dealing with tickets from people with weird environments", though it admittedly often doesn't apply to internal applications where devs have some control over the environment.
> Personally FreeBSD Jails / Solaris Zones are the thing I like to dream are pretty much as secure as a VM and a perfect fit for a sane dev and ops workflow
I would be interested to hear how you use them. From my perspective, raw jails/zones are missing features and implementing those features on top of them ends up basically back at Docker (probably minus the virtual networking). E.g. jails need some way to get new copies of the code that runs in them, so you can either use Docker or write some custom Ansible/Chef/etc that does basically the same thing.
Maybe I'm wrong, and there is some zen to be found in raw-er tools.
Having run both at scale, I can confirm and assure you they are not as secure as VMs and did not produce sane devops workflows. Not that Docker is much better, but it is better from the devops workflow perspective, and IMHO that's why Docker "won" and took over the industry.
VMs are useful for those who live on the shoulders of someone else (i.e. *aaS), which is anything but secure.
You will encounter rough edges with any technology if you use it long enough. Container technologies require learning new skills, and this is where I personally see people often get frustrated. There is also the shift-left mentality of container environments, where you are expected to be responsible for your environment, which is difficult for some; i.e. users become responsible for more than in a traditional virtualized environment.

People didn't stop using VMs, they just started using containers as well. What you should use is dependent on the workload. When you have to manage more than a single VM, and work on a larger team, the value of containers becomes more apparent. Not to mention the need to rapidly patch and update in today's environment. Often VMs don't get patched because applications aren't architected in a way that allows for updates without downtime, although it is possible. There is a mentality of 'if it's not broke, don't fix it'.

There is some truth that virtualized hardware can provide bounds of separation as well, but other things like SELinux also enforce these boundaries. Not to mention containers are often running inside VMs as well.
Using ephemeral VMs is not a new concept. The idea of 'cattle vs pets', and the cloud itself, was built on KVM (OpenStack/AWS).
I would love to have a "docker-like thing" (with ROAC) that used VMs, not containers (or some other isolation tech that works). But AFAIK that thing does not yet exist. Yes, there are several "container tool, but we made it use VMs" projects (Firecracker and downline), but they all need weird special setup, and won't run on my laptop or a generic DigitalOcean VM.
Docker is "Runs on any Linux, mostly, if you have a new enough kernel", meaning it packages a big VM anyway for Windows and macOS.
VMs are "Runs on anything! ... Sorta, mostly, if you have VM acceleration" meaning you have to pick a VM software and hope the VM doesn't crash for no reason. (I have real bad luck with UTM and VirtualBox on my Macbook host for some reason.)
All I want is everything - An APE-like program that runs on any OS, maybe has shims for slightly-old kernels, doesn't need a big installation step, and runs any useful guest OS. (i.e. Linux)
All the other stuff about it is way less important to me than that part.
Docker's not a package manager. It doesn't know what packages are, which is part of why the chunks that make up Docker containers (image layers) are so coarse. This is also part of why many Docker images are so huge: you don't know exactly the packages you need, strictly speaking, so you start from a whole OS. This is also why your Dockerfiles all invoke real package managers— Docker can't know how to install packages if it doesn't know what they are!
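To make that concrete, here's a hypothetical Dockerfile (base image and packages are illustrative): each instruction becomes one coarse layer, and the actual dependency resolution is delegated to apt, because Docker itself has no notion of a package:

```dockerfile
# Docker only sees opaque layers; apt does the real package work.
FROM debian:bookworm-slim

# One coarse layer containing everything apt decided to pull in.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Another layer: the application binary itself (hypothetical path).
COPY app /usr/local/bin/app
CMD ["/usr/local/bin/app"]
```

Change one package in that RUN line and the entire layer is rebuilt and reshipped; a real package manager would diff at package granularity.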
It's also not cross-platform, or at least 99.999% of images you might care about aren't— they're Linux-only.
It's also not a service manager, unless you mean docker-compose (which is not as good as systemd or any number of other process supervisors) or Docker Swarm (which has lost out to Kubernetes). (I'm not sure what you even mean by 'init system for containers' since most containers don't include an init system.)
There actually are cross-platform package managers out there, too. Nix, Pkgsrc, Homebrew, etc. All of those I mentioned and more have rolling release repositories as well. ('Rolling release' is not a feature of package managers; there is no such thing as a 'rolling release package manager'.)
As I’ve said many times, putting a container on a serverless on a Xen hypervisor so you can virtualize while you virtualize? I get why The Cloud wants this, but I haven’t the foggiest idea why people sit still for it.
As a public service announcement? If you’re paying three levels of markup to have three levels of virtual machine?
You’ve been had.
The article instead reads to me as an argument for isolating customers to their own customer-specific systems so there is no web server daemon, database server, file system path or other shared system used by multiple customers.
As an aside to the article, two virtual machines, each with their own kernel, are generally forced to communicate with each other in more complex ways, through network protocols, which adds complexity and increases the risk of implementation flaws and vulnerabilities. Two processes in different cgroups with a common kernel have simpler communication options available, such as reading the same file directly, UNIX domain sockets, named pipes, etc.
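For illustration, here's a minimal sketch of that simpler shared-kernel path: two processes talking over an AF_UNIX socket pair, with no IP stack or network protocol anywhere in the picture:

```python
import os
import socket

# Two processes on the same kernel can share a socket pair directly.
parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

pid = os.fork()
if pid == 0:
    # Child process: echo back whatever the parent sends, uppercased.
    parent_sock.close()
    data = child_sock.recv(1024)
    child_sock.sendall(data.upper())
    child_sock.close()
    os._exit(0)

# Parent process: send a message and read the reply.
child_sock.close()
parent_sock.sendall(b"hello from the same kernel")
reply = parent_sock.recv(1024)
os.waitpid(pid, 0)
print(reply.decode())  # HELLO FROM THE SAME KERNEL
```

Between two VMs, the equivalent exchange would involve virtual NICs, an IP stack on each side, and probably TLS, each a place for bugs to hide.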
I honestly can’t imagine running all the services we have without containers. It would be wildly less efficient and harder to develop on.
VMs are wonderful when you need the security
If you don’t, then it becomes much harder to answer the question of what exactly is deployed on a given server and what it takes to bring it up again if it goes down hard. If you put everything in Dockerfiles, then the answer is whatever is set in the latest docker-compose file.
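Something like this (service and image names made up) is the whole record of what a box runs; bringing it back is one `docker compose up -d` away:

```yaml
# docker-compose.yml -- doubles as the answer to "what runs on this server?"
services:
  web:
    image: registry.example.com/myapp:1.4.2   # hypothetical pinned image
    ports:
      - "443:8443"
    depends_on:
      - db
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```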
* Docker isn't good at packaging. When people talk about packaging, they usually understand it to include dependency management. For Docker to be good at packaging it should be able to create dependency graphs and allow users to re-create those graphs on their systems. Docker has no way of doing anything close to that. Aside from that, Docker suffers from the lack of reproducible builds, lack of upgrade protocols... It's not good at packaging... maybe it's better than something else, but there's a lot of room for improvement.
* Kubernetes doesn't provide a single API to do all the infra stuff. In fact, it provides so little, it's a mystery why anyone would think that. All that stuff like "storage", "scheduling", "networking" that you mentioned comes as add-ons (eg. CSI, CNI) which aren't developed by Kubernetes, aren't following any particular rules, and have their own interfaces... Not only that, Kubernetes' integration with CSI / CNI is very lacking. For example, there's no protocol for upgrading these add-ons when upgrading Kubernetes. There's no generic interface that these add-ons have to expose to the user in order to implement common things. It's really anarchy what's going on there...
There are lots of existing VM management solutions, eg. OpenStack, vSphere -- you don't need to imagine them, they exist. They differ from Kubernetes in many ways. Very superficially, yet importantly, they don't have an easy way to be automated. For very simple tasks Kubernetes offers a very simple solution for automation, i.e. write a short YAML file. Automating eg. ESX comes down to using a library like govmomi (or something that wraps it, like Terraform). But, in the mentioned case, Terraform only manages deployment, and doesn't take care of post-deployment maintenance... and so on.
However, the more you deal with the infra, the more you realize that the initial effort is an insignificant fraction of the overall complexity of the task you need to deal with. And that's where the management advantages of Kubernetes start to seem less appealing. I.e. you realize that you will have to write code to manage your solution, and there will be a lot of it... and a bunch of YAML files won't cut it.
There are so many use cases that get shoved into the latest, shiniest box just because it’s new and shiny.
A colleague of mine once suggested running a CMS we manage for customers on a serverless stack because “it would be so cheap”. When you face unexpected traffic bursts or a DDoS, it becomes very expensive, very fast. Customers don’t really want to be billed per execution during a traffic storm.
It would also have been far outside the normal environment that CMS expects, and wouldn’t have been supported by any of our commercial, vendored dependencies.
Our stack is so much less complicated without running everything in Docker, and perhaps ironically, about half of our stack runs in Kubernetes. The other half is “just software on VMs” we manage through typical tools like SSH and Ansible.
When home enthusiasts build multi container stacks for their project website, it gets a bit much.
I don't know - docker has been a godsend for running my own stuff. I can get a docker-compose file working on my laptop, then run it on my VPS with a pretty high certainty that it will work. Updating has also (to date) been incredibly smooth.
I will say that Docker images get one HUGE use case at our company - CUDA images with consistent environments. CUDA/pytorch/tensorflow hell is something I couldn't imagine dealing with when I was in college studying CS a few decades ago.
Docker actively prevents you from having a private repo. They don't want you to point away from their cloud.
Red Hat understood this, and Podman allows you to have private Docker infrastructure, disconnected from Docker Hub.
For my personal stuff, I would like to use "FROM scratch" and build my personal containers in my own ecosystem.
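For anyone curious what that looks like, here's a sketch (Go chosen arbitrarily; any statically linked binary works): a multi-stage build where the final image is nothing but the binary, so there's no base image pulled from anyone's hub:

```dockerfile
# Stage 1: build a fully static binary so the final image needs no distro.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: "FROM scratch" -- no shell, no libc, no package manager, just the app.
FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```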
In what ways? I use private repos daily with no issues.
From my perspective, it's the complete opposite: Docker is a workaround for problems created decades ago (e.g. dynamic linking), that could have been solved in a better manner, but were not.
Why?
I have my RPi4 and absolutely love docker(-compose) - deploying stuff/services on it is a breeze compared to the previous clusterf*k of relying on the system repository for apps (or fixing things when something doesn't work)... With docker-compose I have nicely separated services, each with a dedicated database at the required version (yes, I ran into an issue where one service required a newer and another an older version of the database, meh).
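That version-clash situation looks roughly like this in compose form (app images are made up for illustration): each service gets its database at exactly the version it wants, instead of fighting over one system-wide install:

```yaml
services:
  legacy-app:
    image: example/legacy-app:latest   # hypothetical app pinned to old Postgres
    depends_on: [old-db]
  old-db:
    image: postgres:12
  shiny-app:
    image: example/shiny-app:latest    # hypothetical app needing new Postgres
    depends_on: [new-db]
  new-db:
    image: postgres:16
```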
As for development - I do development natively but again - docker makes it easier to test various scenarios...
I always used them as process isolation & dependency bundling.
If you are using VMs, I think NixOS/Guix is a good choice. Reproducible builds, immutable OS, immutable binaries, and dead-easy rollback.
It still looks somewhat futuristic. Hopefully gets traction.
HTTPS is not allowed (locked down for security!), so communication is smuggled over DNS? uhh ... I suspect that a lot of what the customer "security" departments do, doesn't really make sense ...
Networks have these capabilities, inherently they're part of the specs. But only malware seems to realise that and use it. We love reusing offensive techniques for defence (see our Canarytokens stuff), and DNS comms fits that perfectly. Our customers get an actual 2-minute install, not a 2-minute-and-then-wait-a-week-for-the-firewall-rules install.
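The transport trick is mostly an encoding problem: arbitrary bytes have to become DNS-safe labels (alphanumeric, max 63 chars per label) that ride out as ordinary lookups against a domain you control. A toy sketch of just the encoding side, with a made-up domain:

```python
import base64
import textwrap

def to_dns_name(payload: bytes, domain: str = "c2.example.com") -> str:
    """Pack bytes into DNS labels. Base32 is case-insensitive and
    alphanumeric, so it survives every resolver on the path."""
    encoded = base64.b32encode(payload).decode().rstrip("=").lower()
    labels = textwrap.wrap(encoded, 63)  # DNS labels max out at 63 chars
    return ".".join(labels + [domain])

def from_dns_name(name: str, domain: str = "c2.example.com") -> bytes:
    """Reverse of to_dns_name: strip the domain, re-pad, decode."""
    encoded = name[: -(len(domain) + 1)].replace(".", "").upper()
    encoded += "=" * (-len(encoded) % 8)  # restore base32 padding
    return base64.b32decode(encoded)

name = to_dns_name(b"agent-7:heartbeat")
print(name)
assert from_dns_name(name) == b"agent-7:heartbeat"
```

The actual lookup then goes through whatever recursive resolver the network already trusts, which is exactly why it sails through "locked down" firewalls.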
A short survey on this stuff:
Of course, once this is fixed and you start using read-only containers, one wonders why “container” exists as a persistent, named concept.
A Bromium demo circa 2014 was a web browser where every tab was an isolated VM, and every HTTP request was an isolated VM. Hundreds of VMs could be launched in a couple of hundred milliseconds. Firecracker has some overlap.
> Lastly, this approach is almost certainly more expensive. Our instances sit idle for the most part and we pay EC2 a pretty penny for the privilege.
With many near-idle server VMs running identical code for each customer, there may be an opportunity to use copy-on-memory-write VMs with fast restore of unique memory state, using the techniques employed in live migration.
Xen/uXen/AX: https://www.platformsecuritysummit.com/2018/speaker/pratt/
As more people wake up to the realization that we shouldn't trust code, I expect that the number of civilization wide outages will decrease.
Working in the cloud, they're not going to be able to use my other favorite security tool, the data diode. Which can positively guarantee ingress of control, while still allowing egress of reporting data.
Qubes OS has been relying on it for many years. My daily driver, can't recommend it enough.
i'm not anti-VM, they're great technology, i just don't think it should be the only way to get protection. VMs are incredibly inefficient... what's that you say, they're not? ok, then why aren't they integrated into protected mode OSes so that they will actually be protected?
For example, in AWS or GCP, you can isolate stuff for different environments or teams with security groups and IAM policies. You can separate them with separate VPCs that can't talk to each other. In GCP you can separate them with "projects". But soon that's not enough, companies want separate AWS accounts for separate teams or environments, and they need to be grouped under a parent org account, and you can have policies that grant ability to assume roles cross-account ... then you need separate associated groups of AWS accounts for separate divisions!
It really never ends, companies will always want to take whatever nested mess they have, and instead of cleaning it up, just nest it one level further. That's why we'll be running wasm in separate processes in separate containers in separate VMs on many-core servers (probably managed with another level of virtualization, but who can tell).
[1] https://learn.microsoft.com/en-us/windows-hardware/design/de...
An OS provides a huge amount of functionality and offers access to vast amounts of complex shared resources. Anywhere in that there can be holes.
A VM is conceptually simpler. We don't have to prove there's no way to get to a root exploit from a myriad services running as root but available to a normal application. We're concerned about things like that a VM won't access a disk belonging to another. Which is a far simpler problem.
The author did acknowledge it’s a trade off, but the economics of this trade off may or may not make sense depending on how much you need to charge your customers to remain competitive with competing offerings.
- From a Docker/Moby Maintainer
Each customer gets their own namespace and a namespace is locked down in terms of networking and I deploy Postgres in each namespace using the Postgres operator.
I've built an operator for my app, so deploying the app into a namespace is as simple as deploying the manifest.
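The per-customer unit can be sketched as a Namespace plus a default-deny NetworkPolicy (names are illustrative; the real setup surely has more to it):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: customer-acme            # one namespace per customer
---
# Lock the namespace down: pods may only talk within their own namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only
  namespace: customer-acme
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - from:
        - podSelector: {}        # peers in this namespace only
  egress:
    - to:
        - podSelector: {}
```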
This point is made in the context of VM bits, but that switching cost could (in theory, haven't done it myself) be mitigated using, e.g. Terraform.
The brace-for-shock barrier at the enterprise level is going to be exfiltrating all of that valuable data. Bezos is running a Hotel California for that data: "You can checkout any time you like, but you can never leave" (easily).
It took us 2-3 days of hustling to get the stuff running and production ready and providing the right answers. This is the "Terraform and Ansible-Stuff" stage of a real failover. In a full infrastructure failover, I'd expect it to take us 1-2 very long days to get 80% running and then up to a week to be fully back on track and another week of shaking out strange issues. And then a week or two of low-availability from the ops-team.
However, for 3 large customers using that product, cybersecurity and compliance said no. They said no about 5-6 weeks ago and project to have an answer somewhere within the next 1-2 months. Until then, the amount of workarounds and frustration growing around it is rather scary. I hope I can contain it to some places in which there is no permanent damage for the infrastructure.
Tech isn't necessarily the hardest thing in some spaces.
What I would like to see is more app virtualization software, which isolates the app from the underlying OS enough to provide a safe enough cage for the app.
I know there are some commercial offerings out there (and a free one), but maybe someone who has some opinions about them, or knows some additional ones, can chime in?
I know that's what said tools are offering, but installing (and running) Docker on Windows feels like loading up a whole other OS inside the OS, so that even VM software looks lean compared to that!
But I admit, that I have no real experience with docker and the like.
Last time I looked into this for on-prem the solutions seemed very enterprise, pay the big bux, focused. Not a lot in the OSS space. What do people use for on-prem VM orchestration that is OSS?
Would give you very nearly as good isolation for much lower cost.
We deal with banks in DayJob - they have separate VMs/containers for their own UAT & training environments, and when the same bank works in multiple regulatory jurisdictions, they usually have the systems servicing those separated too, as if they were completely separate entities (only bringing aggregate data back together for higher-up reporting purposes).
maybe someday that market will boom a bit more, so we can run hypervisors with VMs in there that host single-application kind of things. like a BSD kernel that runs postgres as its init process or something. (i know that's oversimplified, probably :P)
there's a lot of room in the VM space for improvement, but pretty much all of it is impossible if you need to load an entire multi-purpose, multi-user OS into the VM.....
Until then, the debate between VMs and containerisation will continue.
I'm not sure why the author doesn't understand that he could have his cake and eat it too.
There has got to be a better middle ground. Like multi-tenant, but with strong splits (each customer on their own db, etc.)
With virtualization the attack surface is narrowed to pretty much just the virtualization interface.
The problem with current virtualization (or more specifically, the VMMs) is that it can be cumbersome; for example, memory management is a serious annoyance. The kernel is built to hog memory for cache etc., but you don't want the guest to be doing that - since you want to overcommit memory, as guests will rarely use 100% of what is given to them (especially when the guest is just a jailed singular application), workarounds such as free page reporting and drop_caches hacks exist.
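Concretely, those two workarounds look something like this (a sketch; the free-page-reporting property is from QEMU's virtio-balloon device in recent QEMU versions, so treat the exact spelling as an assumption):

```sh
# Host side: give the guest a balloon device with free page reporting, so
# pages the guest frees can be reclaimed by the host instead of staying
# committed to the VM:
qemu-system-x86_64 -m 4G \
  -device virtio-balloon,free-page-reporting=on \
  ...

# Guest side, the blunt hack: force the kernel to drop page cache and
# reclaimable slab (needs root, purely a band-aid):
echo 3 > /proc/sys/vm/drop_caches
```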
I would expect eventually to see high-performance custom kernels for application jails - for example, gVisor[1] acts as a syscall interceptor (and can use KVM too!) with a custom kernel. Or a modified Linux kernel with patched pain points for the guest.
In effect what virtualization achieves is the ability to rollback much of the advantage of having an operating system in the first place in exchange for securely isolating the workload. But because the workload expects an underlying operating system to serve it, one has to be provided to it. So now you have a host operating system and a guest operating system and some narrow interface between the two to not be a complete clown show. As you grow the interface to properly slave the guest to the host to reduce resource consumption and gain more control you will eventually end up reimagining the operating system perhaps? Or come full circle to the BSD jail idea - imagine the host kernel having hooks into every guest kernel syscall, is this not a BSD jail with extra steps?
[1] <https://gvisor.dev/>
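For what it's worth, gVisor already slots into Docker as an OCI runtime today; assuming runsc is installed at /usr/local/bin/runsc, registering it is a couple of lines of daemon config:

```json
{
  "runtimes": {
    "runsc": { "path": "/usr/local/bin/runsc" }
  }
}
```

After restarting the daemon, `docker run --runtime=runsc <image>` runs the container against gVisor's user-space kernel instead of raw host syscalls.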
This can be boiled down to "we use AWS' built-in security, not our own". Using EC2 instances is then nothing but a choice. You could do the exact same thing with containers (with Fargate, perhaps?): one container per tenant, no relations between containers => same thing (but cheaper).
This made me laugh for some reason
I had my popcorn ready? What is the complaint here?
If the network comes down, stores will have no choice but to hand out the food for free.
I am currently not troubleshooting my solutions. I am troubleshooting the VM.