TinyKVM: Fast sandbox that runs on top of Varnish

(info.varnish-software.com)

349 points · perbu · 1y ago · 68 comments

wmf · 1y ago · 5 in thread

Fascinating but I'm having trouble understanding the big picture. This runs a user process in a VM with no kernel? Does every system call become a VM exit and get proxied to the host? Or are there no system calls?

cryptonector · 1y ago

IIUC there's no need for system calls because there's no I/O. There's just program arguments and shared memory.

dividuum · 1y ago

You need a few syscalls: to grow your heap size (brk) or to exit your program (exit). I took a quick look at their code and here are the syscalls and arguments implemented: https://github.com/varnish/tinykvm/blob/master/src/functions...

It’s a bit more than running a program under seccomp strict mode, but conceptually similar, so running anything too complicated likely won't work. You certainly won’t be able to sandbox chromium for taking website snapshots for example.
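To make the "only a few syscalls" idea concrete, here is a toy sketch of an allowlist dispatcher. It is not TinyKVM's code: `Guest`, `dispatch`, and the handler names are hypothetical, and the only real details are the x86-64 Linux syscall numbers for `brk` (12) and `exit` (60). Anything not on the list is answered with `-ENOSYS`.

```python
import errno

# x86-64 Linux syscall numbers for the two calls mentioned above.
SYS_BRK, SYS_EXIT = 12, 60


class Guest:
    """Hypothetical stand-in for a sandboxed program's state."""

    def __init__(self, heap_start, heap_limit):
        self.heap_start = heap_start
        self.heap_limit = heap_limit
        self.brk = heap_start
        self.exit_code = None


def sys_brk(guest, new_brk):
    # Move the program break only if the request stays inside the allowed heap.
    if guest.heap_start <= new_brk <= guest.heap_limit:
        guest.brk = new_brk
    # Like the raw Linux syscall, return the (possibly unchanged) break.
    return guest.brk


def sys_exit(guest, code):
    guest.exit_code = code
    return 0


# The allowlist: everything else is refused.
HANDLERS = {SYS_BRK: sys_brk, SYS_EXIT: sys_exit}


def dispatch(guest, number, arg):
    handler = HANDLERS.get(number)
    if handler is None:
        return -errno.ENOSYS
    return handler(guest, arg)


guest = Guest(heap_start=0x1000, heap_limit=0x9000)
grown = dispatch(guest, SYS_BRK, 0x2000)      # allowed: heap grows
refused = dispatch(guest, SYS_BRK, 0x10000)   # beyond the limit: break unchanged
unknown = dispatch(guest, 2, 0)               # syscall 2 (open) is not on the list
dispatch(guest, SYS_EXIT, 3)
```

A program that needs anything beyond the table simply fails, which is why something as demanding as a browser would not fit this model.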

winternewt · 1y ago

What do you mean by I/O exactly? Because to me handling HTTP requests definitely requires I/O, no matter how you technically implement it. Does the program start anew with new arguments for each HTTP request, and if so how is that an improvement over I/O syscalls?

jkrshnmenon · 1y ago

I believe it ships with its own kernel

> The TinyKVM guest has a tiny kernel which cannot be modified.

klooney · 1y ago

There is a non-configurable kernel in there

conradev · 1y ago · 5 in thread

Could this be used to migrate execution of a single program between two different machines?

fwsgonzo · 1y ago

Yep. I could imagine a deterministic method of just sending the executable + changed pages. Then load the program in the same way on the other machine, and then apply the changed pages. It would be a minimal transfer. Thread state can also be migrated, but Linux-kernel stuff like FDs cannot, or at least that's not my area of expertise!
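The "executable + changed pages" idea can be sketched in a few lines. This is only an illustration under stated assumptions: memory is modelled as a flat byte string, the 4 KiB page size is assumed, and `dirty_pages` / `apply_pages` are hypothetical helper names, not anything from TinyKVM.

```python
PAGE_SIZE = 4096  # assumed page size


def dirty_pages(base: bytes, current: bytes) -> dict[int, bytes]:
    """Return {page_index: page_bytes} for every page that differs from the base image."""
    assert len(base) == len(current) and len(base) % PAGE_SIZE == 0
    changed = {}
    for off in range(0, len(base), PAGE_SIZE):
        page = current[off:off + PAGE_SIZE]
        if page != base[off:off + PAGE_SIZE]:
            changed[off // PAGE_SIZE] = page
    return changed


def apply_pages(base: bytes, changed: dict[int, bytes]) -> bytes:
    """Rebuild the migrated memory image on the receiving machine."""
    image = bytearray(base)
    for index, page in changed.items():
        image[index * PAGE_SIZE:(index + 1) * PAGE_SIZE] = page
    return bytes(image)


# Both machines load the same executable, so both start from the same base image.
base = bytes(8 * PAGE_SIZE)
running = bytearray(base)
running[5 * PAGE_SIZE + 10] = 0x2A        # the program wrote to page 5

delta = dirty_pages(base, bytes(running))  # only this needs to cross the network
restored = apply_pages(base, delta)
```

Here one page out of eight is transferred. Register and thread state would have to travel alongside the pages, and kernel-side state such as file descriptors is not covered at all.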

vidarh · 1y ago

There was Condor for this[1], a couple of decades ago. Condor would checkpoint the process and restart it on another machine entirely at user level (but requiring processes to link to their library) by continuing to forward system calls. It of course had plenty of limitations, and some of their decisions would be considered serious security risks now (e.g. they intercept open() and record the name, and assume that it's safe to reopen a file by the same name after migration), but it was an interesting system.

I think migrating cooperating processes would be fairly simple, and the big challenge is rather to decide on the right set of tradeoffs.

[1] https://chtc.cs.wisc.edu/doc/ckpt97.pdf

conradev · 1y ago

Yeah, that would be very cool!

With a read-only operating system that is identical across machines (e.g. NixOS or Silverblue), you would only have to send the dirty pages, too!

shrubble · 1y ago

I don’t see why not. Over ten years ago the OpenVZ VM code had a way to rsync a container across the network: first syncing everything, then only the pages that had changed since the start of the sync, then the final pages that had changed in the last few seconds. There was a tiny delay to pause the container on the old host and start it on the new one, but I am sure that this could be reduced further.
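The sync-everything, then sync-what-changed, then pause-and-finish sequence is usually called pre-copy migration. Below is a toy simulation of that loop, not OpenVZ's implementation: memory is a dict of page number to contents, and `precopy_migrate` and `workload_rounds` are hypothetical names used only for this sketch.

```python
def precopy_migrate(source, workload_rounds):
    """Copy `source` to a new host while the workload keeps dirtying pages.

    source: dict of page number -> contents on the old host.
    workload_rounds: one dict of writes per copy round, i.e. the pages the
    still-running container modifies while that round is in flight.
    """
    target = {}
    sent_per_round = []
    to_send = set(source)                 # round 0: sync everything
    for writes in workload_rounds:
        for page in to_send:
            target[page] = source[page]
        sent_per_round.append(len(to_send))
        source.update(writes)             # container kept running meanwhile
        to_send = set(writes)             # next round: only what changed
    # Final step: pause the container, copy the last dirty pages, resume on the target.
    for page in to_send:
        target[page] = source[page]
    sent_per_round.append(len(to_send))
    return target, sent_per_round


source = {page: 0 for page in range(100)}
rounds = [
    {page: 1 for page in range(10)},      # dirtied during the full copy
    {page: 2 for page in range(3)},       # dirtied during the shorter second pass
]
target, sent = precopy_migrate(source, rounds)
```

In this run the rounds transfer 100, 10, and 3 pages, so the container only has to be paused for the final three. The pause shrinks as long as each round is shorter than the one before it.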

pabs3 · 1y ago

I think CRIU can do that:

https://criu.org/

nine_k · 1y ago · 4 in thread

Oh. It's like Firecracker, only much faster 8-)

What I like most is the ability to instantly reset the state of the VM to a known predefined state. It's like restarting the VM without any actual restart. It looks like an ideal course of action for network-facing services that are constantly under attack: even if an attack succeeds, the result is erased on the next request.
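As a rough illustration of why per-request reset helps, here is a sketch in which whatever a request does to the sandbox state is discarded before the next one. It assumes nothing about TinyKVM's API: `ResettableSandbox` is a hypothetical class, and `copy.deepcopy` stands in for what would really be a page-level restore.

```python
import copy


class ResettableSandbox:
    """Hypothetical sandbox that returns to a known state after every request."""

    def __init__(self, initial_state: dict):
        self._snapshot = copy.deepcopy(initial_state)   # the known-good state
        self.state = copy.deepcopy(initial_state)

    def handle(self, request: str) -> int:
        # The request may mutate state freely, including maliciously.
        self.state["seen"].append(request)
        result = len(self.state["seen"])
        self.reset()
        return result

    def reset(self) -> None:
        # Stand-in for the VM reset: throw away everything the request changed.
        self.state = copy.deepcopy(self._snapshot)


sandbox = ResettableSandbox({"seen": []})
first = sandbox.handle("GET /a")
second = sandbox.handle("GET /b")   # sees no trace of the first request
```

Both calls return 1, because the second request starts from the same snapshot as the first.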

Easy COW page sharing for programs that are not written with that in mind, like ML model runners, is also pretty nice.

chatmasta · 1y ago

It also sounds ideal for resuming memory-intensive per-user programs, like LLMs with a large context window. You can basically have an executable (and its memory) attached to a user session, but only pay the cost for it while the user session has an open request.

dividuum · 1y ago

Yes

> TinyKVM can fork itself into copies that use copy-on-write to allow for huge workloads like LLMs to share most memory. As an example, 6GB weights required only 260MB working memory per instance, making it highly scalable.
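A back-of-the-envelope calculation shows what those figures imply. It rests on one reading of the quote: that the 6 GB of weights are shared once through copy-on-write and each instance adds about 260 MB of private memory. The helper name and the 1 GB = 1024 MB convention are choices made for this sketch.

```python
MB_PER_GB = 1024  # assumption: binary units


def total_memory_mb(instances, shared_gb=6, private_mb=260, cow=True):
    """Total memory for N instances, with or without copy-on-write sharing."""
    shared_mb = shared_gb * MB_PER_GB
    if cow:
        # Weights are mapped once; each fork only pays for the pages it writes.
        return shared_mb + instances * private_mb
    # Without sharing, every instance carries its own copy of the weights.
    return instances * (shared_mb + private_mb)


with_cow = total_memory_mb(10)
without_cow = total_memory_mb(10, cow=False)
```

For ten instances that comes to 8744 MB with sharing versus 64040 MB without, roughly a sevenfold difference under these assumptions.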

daralthus · 1y ago

Yes, that's the Durable Objects ~ durable agents model that Cloudflare is building.

bonzini · 1y ago

It's more like gVisor (or QEMU's user-mode emulation, though that does not support KVM, only dynamic code translation).

laurencerowe · 1y ago · 2 in thread

This is really exciting. The 2.5us snapshot restore performance is on a par with Wasmtime but with the huge advantage of being able to run native code, albeit with the disadvantage of much slower but still microsecond interop.

I see there is a QuickJS demo in the tinykvm_examples repo already, but it'd be great to see if it's possible to get a JIT-capable JavaScript runtime working, as that will be an order of magnitude faster. From my experiments with server-rendering a React app, native QuickJS was about 12-20ms while v8 was 2-4ms after JIT warmup.

I need to study this some more, but I'd love to get to the point where there was a single Deno-like executable that ran inside the sandbox and made all HTTP requests through Varnish itself. A snapshot would be taken after importing the specified JS URL, and then each request would run in an isolated snapshot.

Probably needs a mechanism to reset the random seed per request.
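The concern is easy to reproduce in miniature: if the PRNG state is captured in the snapshot, every restored request draws the same "random" values. In this sketch Python's `random.Random` state stands in for guest memory and `secrets` stands in for entropy handed in by the host. `handle_request` is a hypothetical name, and none of this is TinyKVM's mechanism.

```python
import random
import secrets

rng = random.Random(1234)
snapshot = rng.getstate()          # stand-in for the VM snapshot taken after startup


def handle_request(reseed: bool) -> int:
    rng.setstate(snapshot)         # restore: the PRNG is back to its snapshotted state
    if reseed:
        # Fresh entropy supplied from outside the snapshot, once per request.
        rng.seed(secrets.randbits(128))
    return rng.getrandbits(64)


same = {handle_request(reseed=False) for _ in range(5)}
fresh = {handle_request(reseed=True) for _ in range(5)}
```

Without reseeding, all five requests produce the identical value. With reseeding, each request gets its own stream. Anything security-sensitive, such as session tokens or nonces, would need the same treatment.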

fwsgonzo · 1y ago

You can run v8 jitless, if you want. It's going to be much faster than QuickJS. Adding JIT support means adding a fixed executable range, which you also can do already, but you can't run it in the dumb CLI example. JITs love to be W+X. So, not sure if it's an afternoon amount of work yet, due to security implications.

I have experience with this from libriscv, where I also embed JIT run-times like v8 and LuaJIT already.

laurencerowe · 1y ago

From my tests v8 jitless was about 50% faster than QuickJS but still almost an order of magnitude slower than with JIT.

Note that I mistranscribed the numbers above: QuickJS was 18-24ms while v8 without warmup was 12-20ms (which I think is similar to jitless perf) and warmed jit was 2-4ms when I benchmarked a couple of years back. https://news.ycombinator.com/item?id=33793181

Thanks for the complexity warning. Sounds like I need to wait for an embedded JIT example using fixed executable range before I start playing around. But it would be fun to try and make Deno run inside it somehow, perhaps building on deno_runtime and hooking the http client user agent to make requests through Varnish. Deno's permission system should allow cleanly disabling unavailable functionality like access to the file system.

I see some examples that seem to use glibc but I was under the impression only musl binaries can be truly static? Can binaries built against glibc be used with TinyKVM?

tuananh · 1y ago · 2 in thread

this is really cool if it works for your use cases.

Some notes from the post

> I found that TinyKVM ran at 99.7% native speed

> As long as they are static and don’t need file or network access, they might just run out-of-the box.

> The TinyKVM guest has a tiny kernel which cannot be modified

jedisct1 · 1y ago

And unlike WebAssembly, it can leverage specialized CPU instructions. This is huge for cryptographic implementations, video codecs, LLMs, etc.

yencabulator · 1y ago

Cranelift, V8, et al can use e.g. SIMD operations just fine.

otterley · 1y ago · 2 in thread

There's nothing in the article that suggests that it runs on top of Varnish; in fact, the author even says it's not intended to run Varnish in it.

ruben_varnish · 1y ago

There's a contradiction in the text, I'll give you that, but at the end he clearly links both:
* a Varnish Module using this: <https://github.com/varnish/libvmod-tinykvm>
* a set of examples in multiple languages: <https://github.com/varnish/tinykvm_examples>

otterley · 1y ago

I still believe the nexus needs to be described more clearly and more strongly in the story in order to support the title here on HN that it runs on top of Varnish. Even the blog title itself does not make such a claim.

dangoodmanUT · 1y ago · 2 in thread

quick someone make rust bindings

ruben_varnish · 1y ago

No need to wait, you can start playing right away: https://github.com/varnish-rs/varnish-rs

gquintard · 1y ago

did someone call?

codethief · 1y ago · 2 in thread

In case the author is around: Are there any plans to wrap this in an OCI-compliant runtime?

ruben_varnish · 1y ago

(not the author, but a friend of a friend ;)

Could you specify this a bit? @codethief

The way it's phrased makes it sound like you want to stuff TinyKVM into a container, but I suspect what you are actually asking to implement an OCI runtime with TinyKVM https://github.com/opencontainers/runtime-spec/blob/main/spe...

Does that make more sense?

codethief · 1y ago

Hi Ruben!

> I suspect what you are actually asking to implement an OCI runtime with TinyKVM

Yes, that's what I meant! :) Apologies for the confusion!

On a related note, since you know the author ;), what capabilities[0] do I need to run TinyKVM?

The reason I'm asking is that I'm interested in nesting containers. E.g., I have a CI pipeline whose jobs run in containers and these jobs are in turn supposed to build container images. Today, this is very difficult to do securely (i.e. using rootless containers and no privileges, possibly with AppArmor & seccomp enabled) because the average OCI runtime requires capabilities that the parent OCI runtime doesn't grant by default (or that AppArmor disables by default).

Now, I know very little about virtualization, but I have been curious whether a virtualization-based sandbox might provide a way out here, since IIUC the capabilities of the guest process running inside the sandbox/VM get emulated to some degree and don't necessarily need to be backed by capabilities available to the VM process on the host.

[0]: https://www.man7.org/linux/man-pages/man7/capabilities.7.htm...

gunian · 1y ago · 2 in thread

man see virtualization man happy man see it no crossplatform man sad

yjftsjthsd-h · 1y ago

I mean. It's built on KVM and integrates deeply with how processes work; I'm not sure it's possible to make it portable without a lot of engineering time, performance hit, or both.

gunian · 1y ago

no i get it its amazing engineering same thing with firecracker wish there was something like that lighter than docker for all 3 major platforms

Tepix · 1y ago · 1 in thread

Interesting to see the performance gain. But without file I/O and network access, what are the use cases?

jedisct1 · 1y ago

You can call host functions doing whatever you want. Similar to what WebAssembly does.

jensneuse · 1y ago · 1 in thread

Is this a modern version of CGI with process isolation?

jedisct1 · 1y ago

It's rather something that sits between WebAssembly and containers, combining the sandboxing guarantees of the former with the performance of the latter. From a security perspective, the composition is also really good: WebAssembly enforces memory limits but doesn't have memory protection, NULL pointers are writable, etc., and this is solved here. But unlike WebAssembly, it is Linux-only. So, not something that can run in Web browsers.

chatmasta · 1y ago

I love this. Please never stop doing what you’re doing.

edit: Of course you’re the top contributor to IncludeOS. That was the first project I thought of while reading this blog post. I’ve been obsessed with the idea of Network Function Virtualization for a long time. It’s the most natural boundary for separating units of work in a distributed system and produces such clean abstractions and efficient scaling mechanisms.

(I’m also a very happy user of Varnish in production btw. It’s by far the most reliable part of the stack, even more than nginx. Usually I forget it’s even there. It’s never been the cause of a bug, once I got it configured properly.)

ruben_varnish · 1y ago

Original post: https://fwsgonzo.medium.com/tinykvm-the-fastest-sandbox-564a...

You can find a bunch of posts related to this topic there as well.

rwmj · 1y ago

Isn't this basically libkrun? https://github.com/containers/libkrun

oulipo · 1y ago

I'm new to this area, can someone ELI5 this? What are the differences, advantages, and disadvantages compared to other process isolation like containers?

Would I use this to run a distributed infra on a server, a bit like docker-compose? Or is it unrelated?

notpushkin · 1y ago

This is so cool.

I’m exploring micro-VMs for my self-hosted PaaS, https://lunni.dev/ – and something with such little overhead seems like a really interesting option!

winternewt · 1y ago

I'm curious: would it be a good idea to switch my desktop Linux PC to using huge pages across the board?

incanus77 · 1y ago

Not entirely what this is intended for, but does anyone have experience running an X server (or Wayland, I don't care)?

I'm doing some dev (on Mac) against an RDP server and occasionally have other needs like that for a client. Currently I use UTM (a nice QEMU Mac frontend) along with a DietPi (super stripped-down Debian) VM for these sorts of things.

I'm pretty familiar with Docker, but have a good idea of what sorts of hoop-jumping might be needed to get a graphics server to run there. Wondering if there's a simpler path.

jedisct1 · 1y ago

Quick, someone make Zig bindings.
