It’s a bit more than running a program under seccomp strict mode, but conceptually similar, so running anything too complicated likely won't work. You certainly won’t be able to sandbox Chromium for taking website snapshots, for example.
> The TinyKVM guest has a tiny kernel which cannot be modified.
I think migrating cooperating processes would be fairly simple, and the big challenge is rather to decide on the right set of tradeoffs.
With a read-only operating system that is identical across machines (e.g. NixOS or Silverblue), you would only have to send the dirty pages, too!
What I like most is the ability to instantly reset the state of the VM to a known predefined state. It's like restarting the VM without any actual restart. It looks like an ideal course of action for network-facing services that are constantly under attack: even if an attack succeeds, the result is erased on the next request.
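To make the "result is erased on the next request" property concrete, here is a toy model in Python. It is not TinyKVM's API: `PRISTINE` and `serve` are made-up names, and a deep copy stands in for the VM reset purely to show the isolation behaviour, not how the reset is done or what it costs.

```python
import copy

# Hypothetical "known good" state, captured once after initialization.
PRISTINE = {"sessions": {}, "config": {"debug": False}}


def serve(payload: str) -> dict:
    """Handle one request against a private copy of the pristine state."""
    state = copy.deepcopy(PRISTINE)  # stands in for resetting the VM
    if payload == "evil":
        # A "successful attack": the request tampers with the state it sees.
        state["config"]["debug"] = True
    state["sessions"][payload] = "seen"
    return state


if __name__ == "__main__":
    tampered = serve("evil")
    clean = serve("hello")
    # The tampering is visible only inside the request that caused it.
    print(tampered["config"]["debug"], clean["config"]["debug"])
```

The request after the "attack" starts from the same pristine state, so nothing the attacker changed carries over.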
Easy COW page sharing for programs that are not written with that in mind, like ML model runners, is also pretty nice.
> TinyKVM can fork itself into copies that use copy-on-write to allow for huge workloads like LLMs to share most memory. As an example, 6GB weights required only 260MB working memory per instance, making it highly scalable.
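A back-of-the-envelope check of those numbers, assuming the 6GB of weights is shared read-only and the 260MB is private to each fork. The additive model and the function names are my assumptions, and it ignores page tables and host-side overhead:

```python
# Rough memory estimate for copy-on-write forks.
# The 6 GB / 260 MB figures are the ones quoted from the post; the simple
# additive model and all names below are assumptions for illustration.

SHARED_WEIGHTS_MB = 6 * 1024   # read-only pages shared by all forks
PRIVATE_PER_FORK_MB = 260      # pages each fork dirties for itself


def cow_total_mb(instances: int) -> int:
    """Total memory when forks share the weights via copy-on-write."""
    return SHARED_WEIGHTS_MB + instances * PRIVATE_PER_FORK_MB


def naive_total_mb(instances: int) -> int:
    """Total memory when every instance loads its own copy of the weights."""
    return instances * (SHARED_WEIGHTS_MB + PRIVATE_PER_FORK_MB)


if __name__ == "__main__":
    for n in (1, 8, 32):
        print(f"{n:>2} instances: COW ~{cow_total_mb(n)} MB, "
              f"separate copies ~{naive_total_mb(n)} MB")
```

Under these assumptions, 8 instances need roughly 8GB with sharing versus roughly 50GB with separate copies.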
I see there is a QuickJS demo in the tinykvm_examples repo already, but it'd be great to see if it's possible to get a JIT-capable JavaScript runtime working, as that will be an order of magnitude faster. From my experiments with server rendering a React app, native QuickJS was about 12-20ms while v8 was 2-4ms after JIT warmup.
I need to study this some more, but I'd love to get to the point where there was a single Deno-like executable that ran inside the sandbox and made all HTTP requests through Varnish itself. A snapshot would be taken after importing the specified JS URL and then each request would run in an isolated snapshot.
Probably needs a mechanism to reset the random seed per request.
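To illustrate why: if the PRNG state is part of the snapshot, every request restored from it draws the same "random" values unless fresh entropy is injected afterwards. A minimal sketch in Python, where the snapshot is modelled as a saved `random.Random` state (nothing here is TinyKVM's or Deno's API, and `handle_request` is a hypothetical name):

```python
import os
import random

# Hypothetical stand-in for guest state captured in a snapshot: just a PRNG.
snapshot_rng = random.Random(1234)
SNAPSHOT_STATE = snapshot_rng.getstate()


def handle_request(reseed: bool) -> int:
    """Restore the snapshot state, optionally reseed, then draw a 'token'."""
    rng = random.Random()
    rng.setstate(SNAPSHOT_STATE)  # every request starts from the same state
    if reseed:
        # Fresh entropy supplied from outside the snapshot (here: the OS).
        rng.seed(int.from_bytes(os.urandom(16), "big"))
    return rng.getrandbits(64)


if __name__ == "__main__":
    # Without reseeding, two requests produce the same token.
    print(handle_request(reseed=False) == handle_request(reseed=False))
    # With reseeding, a collision is astronomically unlikely.
    print(handle_request(reseed=True) == handle_request(reseed=True))
```

In a real runtime the same concern applies to anything seeded at startup, such as `Math.random` or hash seeds, so the reseed would have to happen right after each restore.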
I have experience with this from libriscv, where I also embed JIT run-times like v8 and LuaJIT already.
Note that I mistranscribed the numbers above: QuickJS was 18-24ms while v8 without warmup was 12-20ms (which I think is similar to jitless perf) and warmed jit was 2-4ms when I benchmarked a couple of years back. https://news.ycombinator.com/item?id=33793181
Thanks for the complexity warning. Sounds like I need to wait for an embedded JIT example using a fixed executable range before I start playing around. But it would be fun to try and make Deno run inside it somehow, perhaps building on deno_runtime and hooking the HTTP client user agent to make requests through Varnish. Deno's permission system should allow cleanly disabling unavailable functionality like access to the file system.
I see some examples that seem to use glibc but I was under the impression only musl binaries can be truly static? Can binaries built against glibc be used with TinyKVM?
Some notes from the post
> I found that TinyKVM ran at 99.7% native speed
> As long as they are static and don’t need file or network access, they might just run out-of-the box.
> The TinyKVM guest has a tiny kernel which cannot be modified
Could you specify this a bit? @codethief
The way it's phrased makes it sound like you want to stuff TinyKVM into a container, but I suspect what you are actually asking for is to implement an OCI runtime with TinyKVM: https://github.com/opencontainers/runtime-spec/blob/main/spe...
Does that make more sense?
> I suspect what you are actually asking for is to implement an OCI runtime with TinyKVM
Yes, that's what I meant! :) Apologies for the confusion!
On a related note, since you know the author ;), what capabilities[0] do I need to run TinyKVM?
The reason I'm asking is that I'm interested in nesting containers. E.g., I have a CI pipeline whose jobs run in containers and these jobs are in turn supposed to build container images. Today, this is very difficult to do securely (i.e. using rootless containers and no privileges, possibly with AppArmor & seccomp enabled) because the average OCI runtime requires capabilities that the parent OCI runtime doesn't grant by default (or that AppArmor disables by default).
Now, I only know very little about virtualization, but I have been curious whether a virtualization-based sandbox might provide a way out here, since IIUC the capabilities of the guest process running inside the sandbox/VM get emulated to some degree and don't necessarily need to be backed by capabilities available to the VM process on the host.
[0]: https://www.man7.org/linux/man-pages/man7/capabilities.7.htm...
edit: Of course you’re the top contributor to IncludeOS. That was the first project I thought of while reading this blog post. I’ve been obsessed with the idea of Network Function Virtualization for a long time. It’s the most natural boundary for separating units of work in a distributed system and produces such clean abstractions and efficient scaling mechanisms.
(I’m also a very happy user of Varnish in production btw. It’s by far the most reliable part of the stack, even more than nginx. Usually I forget it’s even there. It’s never been the cause of a bug, once I got it configured properly.)
You can find a bunch of posts related to this topic there as well.
Would I use this to run distributed infrastructure on a server, a bit like docker-compose? Or is it not related?
I’m exploring micro-VMs for my self-hosted PaaS, https://lunni.dev/ – and something with such little overhead seems like a really interesting option!
I'm doing some dev (on Mac) against an RDP server and occasionally have other needs like that for a client. Currently I use UTM (a nice QEMU frontend for Mac) along with a DietPi (super stripped-down Debian) VM for these sorts of things.
I'm pretty familiar with Docker, but have a good idea of what sorts of hoop-jumping might be needed to get a graphics server to run there. Wondering if there's a simpler path.