> But the practical limitation is language support. You cannot run arbitrary Python scripts in WASM today without compiling the Python interpreter itself to WASM along with all its C extensions. For sandboxing arbitrary code in arbitrary languages, WASM is not yet viable.
There are several versions of the Python interpreter already compiled to WASM - Pyodide has one, and WASM is a "Tier 2" supported target for CPython: https://peps.python.org/pep-0011/#tier-2 - unofficial builds here: https://github.com/brettcannon/cpython-wasi-build/releases
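For a taste, one of those WASI builds can be run under a generic WASM runtime like wasmtime. A rough sketch (exact flags vary between wasmtime versions; the module only sees the directories you preopen):

```shell
# Run a CPython WASI build, preopening only the current directory
# for filesystem access - nothing else on the host is visible.
wasmtime run --dir=. python.wasm script.py
```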
Likewise I've experimented with running various JavaScript interpreters compiled to WASM, the most popular of those is probably QuickJS. Here's one of my many demos: https://tools.simonwillison.net/quickjs (I have one for MicroQuickJS too https://tools.simonwillison.net/microquickjs )
So don't rule out WASM as a target for running non-compiled languages, it can work pretty well!
Wasmer can now run Python server-side without any restrictions (including gevent, SQLAlchemy, and native modules!) [1] [2]
Also, cool things are coming in JS land running on Wasmer :)
[1] https://wasmer.io/posts/greenlet-support-python-wasm
[2] https://wasmer.io/posts/python-on-the-edge-powered-by-webass...
From what I can tell, the point they're making is this: if you want a sandbox you can put whatever you want into, without explicit support for that language in the form of a recompiled runtime, it's not going to work. If someone is expecting to throw stuff they already have into a sandbox as-is and have it work, WASM is not what they're looking for (at least not today).
So while the statement is technically true that you can't run "arbitrary code in arbitrary languages", the practical reality is that for many languages WASM is a great solution despite that.
What I’ve seen suggests the most common answers are (a) “containers” and (b) “YOLO!” (maybe adding, “Please play nice, agent.”).
One approach that I’m about to try is Sandvault [0] (macOS only), which uses the good old Unix user system together with some added precautions. Basically, give an agent its own unprivileged user account and interact with it via sudo, SSH, and shared directories.
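The underlying mechanics here are plain Unix. On a generic Linux box the same pattern might look like this (user name and paths invented for illustration; Sandvault automates the macOS equivalent):

```shell
# 1. A dedicated unprivileged account with its own home:
sudo useradd --create-home --shell /bin/bash agent

# 2. A shared directory both sides can use to exchange files
#    (setgid so new files inherit the group):
sudo install -d -m 2770 -g agent /srv/agent-exchange

# 3. Interact with the agent as that user, with its HOME, not yours:
sudo -u agent -H bash -l
```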
I try not to run LLMs directly on my own host. The only exception is that I do use https://github.com/karthink/gptel on my own machine, because it is just too damn useful. I hope I don't self-own with that someday.
It helps that most of my projects are open source, so I don't need to worry about prompt-injection code-stealing vulnerabilities. That way the worst that can happen is an attacker adding a vulnerability to my code that I don't spot when I review the PR.
And turning off outbound networking should protect against code stealing too... but I allow access to everything because I don't need to worry about code stealing and that way Claude can install things and run benchmarks and generally do all sorts of other useful bits and pieces.
I already have a couple folks using it for Claude: https://github.com/smol-machines/smolvm/discussions/3
These containers only have the worker agent's workspace and some caching dirs (e.g. GOMODCACHE) mounted, and by default have `--network none` set. (Some commands, like `go mod download`, can be explicitly exempted to have network access.)
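A container run along those lines might look like this (paths and image name are illustrative, not the actual config):

```shell
# Only the workspace and the Go module cache are mounted;
# networking is off unless a command is explicitly exempted.
docker run --rm --network none \
  -v "$PWD:/workspace" \
  -v "$HOME/go/pkg/mod:/go/pkg/mod" \
  -e GOMODCACHE=/go/pkg/mod \
  -w /workspace \
  golang:1.22 go build ./...
```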
I also use per-skill hooks to enforce more filesystem isolation and check if an agent attempts to run e.g. `go build`, and tell it to run `aww exec go build` instead. (AWW is the name of the agent workflow system I've been developing over the past month—"Agent Workflow Wrangler.")
This feels like a pragmatic setup. I'm sure it's not riskless, but hopefully it does enough to mitigate the worst risks. I may yet go back to running Claude Code in a dedicated VM, along with the containerized commands, to add yet another layer of isolation.
https://github.com/Kiln-AI/Kilntainers
Can run anything from a busybox in WASM to a full cloud VM. Agent just sees a shell.
"Make me a sandbox for yourself! Make sure it's really secure!"
I would add that in addition to Unix permissions, sandvault also utilizes macOS sandbox-exec to further limit the blast radius.
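Under the hood, a sandbox-exec profile is a small Scheme-like policy. A deny-by-default sketch (paths illustrative, not sandvault's actual profile) looks roughly like:

```scheme
;; agent.sb - start from nothing, then open specific holes
(version 1)
(deny default)
(allow process-exec (subpath "/usr/bin") (subpath "/bin"))
(allow file-read* (subpath "/usr/lib") (subpath "/System"))
(allow file-write* (subpath "/Users/agent/work"))
(deny network*)
```

Applied with `sandbox-exec -f agent.sb <command>`.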
That's not to say I don't use bwrap.
But I use that specifically to run 'user-emulation' stories where an agent starts in their own `~/` environment with my tarball at ~/Downloads/app.tar.gz, and has to find its way through the docs / code / cli's and report on the experience.
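For reference, that kind of fresh-`~/` story needs only a handful of bwrap flags (paths and names invented for illustration):

```shell
# Unshare every namespace (so no network), give the agent a disposable
# home containing just the tarball, and hide the real filesystem.
bwrap --unshare-all \
  --ro-bind /usr /usr --symlink usr/bin /bin --symlink usr/lib /lib64 \
  --proc /proc --dev /dev --tmpfs /tmp \
  --bind "$STORY_HOME" /home/tester \
  --setenv HOME /home/tester --chdir /home/tester \
  bash
```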
The attack surface of Xen, the current hypervisor of Qubes, is small compared to browsers and OSes, which have 0-days patched several times a year. And even most Xen vulns don't affect Qubes.
I just can't imagine putting my whole digital life in one "normal" OS and hoping that OS or browser security will keep me safe. I'm mentioning the browser because a lot of what used to be in the OS is now in the browser, so it's functionally like another OS.
From a usability point of view it's also useful as I can have different environments. Not only different tools in each VM which means I can pretty much forget about dependency issues, but also different data in each VM. If I wanted, I could run any agent or malware on a VM and the exposure would only be whatever data I chose to put in that VM.
Of course, if you're not passing data between certain VMs, you could use different computers for even better security.
This is the approach I’m using for my open source project qip that lets you pipeline wasm modules together to process text, images & data: https://github.com/royalicing/qip
qip modules follow a really simple contract: there’s some input provided to the WebAssembly module, and there’s some output it produces. They can’t access fs/net/time. You can pipe in from your other CLIs though, e.g. from curl.
I have example modules for markdown-to-html, bmp-to-ico (great for favicons), ical events, a basic svg rasterizer, and a static site builder. You compose them together and then can run them on the command line, in the browser, or in the provided dev server. Because the module contract is so simple they’ll work on native too.
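The contract amounts to "bytes in, bytes out" with no ambient capabilities, which makes a pipeline plain function composition. A toy Python stand-in (not qip's actual API) showing the shape:

```python
from functools import reduce

def pipeline(*stages):
    """Compose bytes -> bytes stages left to right, like piping modules."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

# Two toy stages standing in for sandboxed wasm modules:
shout = lambda b: b.upper()
wrap_pre = lambda b: b"<pre>" + b + b"</pre>"

run = pipeline(shout, wrap_pre)
print(run(b"hello"))  # b'<pre>HELLO</pre>'
```

Because each stage can only transform its input, swapping in a real wasm module changes nothing about how the pipeline is wired.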
gVisor can even use KVM.
What gVisor doesn't have is the big Linux kernel; it rolls its own subset of it in Go. And while doing so, it allows for more convenient (from the host side) resource management.
Imagine taking the Linux kernel and starting to modify it to have a guest VM mode (memory management merged with the host, sockets passed through, file systems coupled closer etc). As you progress along that axis you will eventually end up as a gVisor clone.
Ultimately what all these approaches attempt to do is narrow the interface between the jailed process and the host kernel, because the default interface is vast. Makes you wonder if we will ever have a kernel with a narrow interface by default - a RISC-like syscall movement for kernels.
It’s not surprising that most people don’t know about it, because QubesOS as a daily driver can be painful. But with some improvements, I think it’s the right way to do it.
When I'm trying to get some software up and running, I've had issues with Debian many times, as well as with Fedora - but rarely with both. With Qubes, after a few minutes of trying on Debian and running into some obscure errors, I can just say "fuck it" and try with Fedora, or vice versa. Over the years this has saved me more time than I've invested in learning how Qubes works or dealing with Qubes-specific issues.
I also don't have to care about polluting my OS with various software and running into a dependency hell.
If a VM crashes or hangs, it's usually OK, as it's just a VM.
It's much easier to run Whonix or VPNs without worrying about IP leaks.
I'm CTO at Buildkite, and have been noodling on one with a view to having an environment that can run both CI workloads and agentic ones: https://github.com/buildkite/cleanroom
The credential problem is handled through proxy middleware - agents never see real tokens, requests get routed through policy-checked proxies that inject credentials only for approved operations.
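A hypothetical sketch of that idea (policy shape, names, and token are all made up): the agent's request carries no secret; the proxy checks a per-operation policy and only then injects the real credential.

```python
# Illustrative allowlist: (method, host, path prefix) tuples.
POLICY = [("GET", "api.github.com", "/repos/")]
# Real secrets live only on the proxy side, never in the agent's env.
SECRETS = {"api.github.com": "real-token"}

def inject_credentials(method, host, path, headers):
    """Return headers with the real token added, or refuse the request."""
    if not any(method == m and host == h and path.startswith(p)
               for (m, h, p) in POLICY):
        raise PermissionError(f"{method} {host}{path} not in policy")
    out = dict(headers)
    out["Authorization"] = f"Bearer {SECRETS[host]}"
    return out

hdrs = inject_credentials("GET", "api.github.com", "/repos/x/y", {})
# A POST, or any other host, raises PermissionError instead.
```

Even if the agent is fully compromised, the worst it can do is make requests the policy already permits.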
Happy to share more: https://islo.dev
Having worked on kernel and hypervisor code, I really don't see much of a difference in terms of isolation. Could you elaborate on this?
Whereas yeah, you can run gVisor in KVM mode, where it does use hardware virtualization, and at that point the isolation boundary is much closer to a microVM's. I believe the real difference then becomes what's on either side of that boundary: gVisor gives you a memory-safe Go kernel making ~70 host syscalls, while a microVM gives you a full guest Linux kernel behind a minimal VMM. So at least in my mind it comes down to different trust chains, not necessarily one strictly stronger than the other.
Just like containers, VMs are very loosely defined and, under the hood, composed of mechanisms that can be used in isolation (paging, trapping, IOMMU vs individual cgroups and namespaces). It's those mechanisms that give you the actual security benefits.
And most of them are used outside of VMs, to isolate processes on a bare kernel. The system call/software interrupt trapping and "regular" virtual memory of gVisor (or even a bare Linux kernel) are just as much of a "hardware boundary" as the hypercalls and SLAT virtual memory are in the case of VMs, just without the hacks needed to make the isolated side believe it's in control of real hardware. One traps into Sentry, the other traps into QEMU, but ultimately both are user-space processes running on the host kernel. And they themselves are isolated, using the very same primitives, by the host kernel.
As you clarified here, the real difference lies in what's on the other side of these boundaries. gVisor will probably have more overhead, at least in systrap mode, as every trapped call has to go through the host kernel's dispatcher before landing in Sentry. QEMU/KVM has the benefit of letting the guest's user space call the guest kernel directly; typically only the kernel then calls out to QEMU. The attack surface differs a lot in the two cases, too: gVisor is a niche Google project, while KVM is a business-critical component of many public cloud providers.
It may sound like I'm nitpicking, but I believe it's important to understand this to make an informed decision and avoid stacking up useless layers - a mistake that plagues today's software engineering.
Thanks for your reply and post by the way! I was looking for something like gVisor.
Giving agents their own user account is my go-to solution, and it solves all my practical problems with by far the oldest, best-documented, and simplest isolation mechanism.
2) can access/write a specific folder?
3) can access network?
4) can access gateway/internet?
5) can access local network? (vlans would help here)
6) give access to USB devices
7) Needs access to the screen? -> give framebuffer access / drawing primitives
8) Need to write? Use an overlay FS that can be checked by the host and approved
9) sub processes can never escalate permissions
By default: nothing. But unfortunately, it's always default-allow.
Also, make it simple to remove the permissions again.
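A default-deny version of that checklist can be a tiny manifest where absence means denied and revoking a grant is just deleting a line. A sketch (field names invented):

```python
# Everything not explicitly granted is denied.
DEFAULT = {
    "write_paths": [],        # 2) writable folders
    "network": False,         # 3) any sockets at all
    "internet": False,        # 4) gateway/internet
    "local_network": False,   # 5) LAN / other VLANs
    "usb": False,             # 6) USB devices
    "framebuffer": False,     # 7) screen access
}

def effective(manifest):
    """Merge explicit grants over the deny-everything baseline."""
    perms = dict(DEFAULT)
    perms.update(manifest)    # only explicit grants flip anything on
    return perms

def may_write(perms, path):
    return any(path.startswith(p) for p in perms["write_paths"])

perms = effective({"write_paths": ["/home/agent/work"]})
# may_write(perms, "/home/agent/work/out.txt") is True; internet stays False.
```

Point 9 (no privilege escalation by subprocesses) then follows from the manifest being enforced outside the sandbox, not inside it.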
Here's a project I've been working on to address the network risk. It uses an nftables firewall that allows outbound traffic only to an explicit, pinned domain allowlist (and continuously refreshes DNS resolutions in the background).
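The core of such a refresh loop is small: resolve each allowlisted domain, then rewrite a named set with the current addresses. A sketch (the table/set names here are invented, not the project's actual ones):

```python
import socket

def resolve_v4(domain):
    """Current A records for a domain - the part refreshed in the background."""
    infos = socket.getaddrinfo(domain, 443, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def nft_commands(domain, ips):
    """Commands to pin the resolved IPs into a per-domain nftables set."""
    set_name = "allow_" + domain.replace(".", "_").replace("-", "_")
    return [
        f"nft flush set inet agent_fw {set_name}",
        f"nft add element inet agent_fw {set_name} {{ {', '.join(ips)} }}",
    ]

cmds = nft_commands("example.com", ["93.184.216.34"])
```

The firewall itself would reference these sets from a default-drop output chain, so anything not freshly resolved from the allowlist is unreachable.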
I've seen smaller developers experimenting with this, but I haven't heard of larger orgs doing it, possibly because UGC has taken the place of modding. I come from an older world where, 20 years ago, what developers had their hands on was an actual SDK, not part of a long microtransaction pipeline.
In my org's case, where we built an entire game engine off Lua, and previously had done Lua integration in the Source Engine, I would have loved to have had sandboxing from the start rather than trying to think about security after the fact.
To the article's point: even if you were to add sandboxing today in those environments, I suspect you'd still be faster than some of the fastest embedded scripting languages, because they're just that slow.