    from deno_sandbox import DenoDeploy

    sdk = DenoDeploy()

    with sdk.sandbox.create() as sb:
        # Run a shell command
        process = sb.spawn("echo", args=["Hello from the sandbox!"])
        process.wait()

        # Write and read files
        sb.fs.write_text_file("/tmp/example.txt", "Hello, World!")
        content = sb.fs.read_text_file("/tmp/example.txt")
        print(content)
Looks like the API protocol itself uses websockets: https://tools.simonwillison.net/zip-wheel-explorer?package=d...

> Deno Sandbox gives you lightweight Linux microVMs (running in the Deno Deploy cloud) ...
Hit a snag: Sprites appear network-isolated from Fly's 6PN private mesh (fdf:: prefix inside the Sprite, not fdaa::; no .internal DNS). So a Tokenizer on a Fly Machine isn't directly reachable without public internet.
Asked on the Fly forum: https://community.fly.io/t/can-sprites-reach-internal-fly-se...
@tptacek's point upthread about controlling not just hosts but request structure is well taken - for AI agent sandboxing you'd want tight scoping on what the proxy will forward.
> The real key materializes only when the sandbox makes an outbound request to an approved host. If prompt-injected code tries to exfiltrate that placeholder to evil.com? Useless.
That seems clever.
It's a little HTTP proxy that your application can route requests through, and the proxy is what handles adding the API keys or whatnot to the request to the service, rather than your application, something like this for example:
Application -> tokenizer -> Stripe
The secrets for the third party service should in theory then be safe should there be some leak or compromise of the application since it doesn't know the actual secrets itself.
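A minimal sketch of that tokenizer pattern (all names hypothetical; a real tokenizer like Fly's sits in the request path and handles TLS and sealed secrets): the proxy alone holds the credential and injects it, and only for an allowlisted upstream, so the application never sees the secret.

```python
# Hypothetical sketch of the tokenizer idea: the proxy alone holds the
# real secret and injects it, but only for an approved upstream host.
REAL_SECRET = "sk_live_real_key"          # known only to the proxy
APPROVED_HOSTS = {"api.stripe.com"}       # tight host allowlist

def inject_secret(host: str, headers: dict) -> dict:
    """Return headers for the upstream request, adding the credential
    only when the destination host is on the allowlist."""
    if host not in APPROVED_HOSTS:
        raise PermissionError(f"host not allowed: {host}")
    return {**headers, "Authorization": f"Bearer {REAL_SECRET}"}
```

If the application is compromised, the attacker can still make requests through the proxy, but only to approved hosts, and the key itself never leaves the proxy.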
Cool idea!
(The credential thing I'm actually proud of is non-exfiltratable machine-bound Macaroons).
Remember that the security promises of this scheme depend on tight control over not only what hosts you'll send requests to, but what parts of the requests themselves.
I cannot remember what the platform was called, let me know if you do.
It's a sandbox that uses envoy as a transparent proxy locally, and then an external authz server that can swap the creds.
The idea is extended further in that the goal is to allow an org to basically create their own authz system for arbitrary upstreams, and then for users to leverage macaroons to attenuate the tokens at runtime.
It isn't finished but I'm trying to make it work with ssh/yubikeys as an identity layer. The authz macaroon can have a "hole" that is filled by the user/device attestation.
The sandbox has some nice features like browser forwarding for Claude oauth and a CDP proxy for working with Chrome/Electron (I'm building an Obsidian plugin).
I'm inspired by a lot of the fly.io stuff in tokenizer and sprites. Exciting times.
Presumably the proxy replaces any occurrence of the placeholder with the real key, without knowing anything about the context in which the key is used, right? Because if it knew that the key was to be used for e.g. HTTP basic auth, it could just be added by the proxy without using a placeholder.
So all the attacker would have to do then is find an endpoint (on one of the approved hosts, granted) that echoes back the value, e.g. "What is your name?" -> "Hello $name!", right?
But probably the proxy replaces the real key when it comes back in the other direction, so the attacker would have to find an endpoint that does some kind of reversible transformation on the value in the response to disguise it.
It seems safer and simpler to, as others have mentioned, have a proxy that knows more about the context add the secrets to the requests. But maybe I've misunderstood their placeholder solution or maybe it's more clever than I'm giving it credit for.
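The two directions described above can be sketched in a few lines (my guess at how such a scheme could work, not Deno's actual implementation): substitute the real key on the way out, and scrub it back to the placeholder if an approved endpoint echoes it back.

```python
# Hypothetical placeholder scheme: the sandbox only ever sees PLACEHOLDER;
# the proxy swaps it for the real key outbound and scrubs it inbound.
PLACEHOLDER = "DENO_SECRET_PLACEHOLDER_b14043a2"  # what the sandbox sees
REAL_KEY = "sk-real-key"                          # held only by the proxy

def rewrite_outbound(body: str) -> str:
    # Replace the placeholder with the real key before forwarding upstream.
    return body.replace(PLACEHOLDER, REAL_KEY)

def scrub_inbound(body: str) -> str:
    # Replace the real key with the placeholder if the upstream echoes it.
    return body.replace(REAL_KEY, PLACEHOLDER)
```

Note that the inbound scrub only catches a verbatim echo; as pointed out above, an endpoint that returns any reversible transformation of the value (base64, reversed string, etc.) would defeat it.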
    await using sandbox = await Sandbox.create({
      secrets: {
        OPENAI_API_KEY: {
          hosts: ["api.openai.com"],
          value: process.env.OPENAI_API_KEY,
        },
      },
    });

    await sandbox.sh`echo $OPENAI_API_KEY`;
    // DENO_SECRET_PLACEHOLDER_b14043a2f578cba75ebe04791e8e2c7d4002fd0c1f825e19...
It doesn't prevent bad code from USING those secrets to do nasty things, but it does at least make it impossible for them to steal the secret permanently. Kind of like how XSS attacks can't read httpOnly cookies but they can generally still cause fetch() requests that take actions using those cookies.
Doesn't help much if the secret can appear anywhere in the request, presumably; if its use could be restricted to specific headers only, it would be much more powerful.
Agreed, and this points to two deeper issues:

1. Fine-grained data access (e.g., sandboxed code can only issue SQL queries scoped to particular tenants)

2. Policy enforced on data (e.g., sandboxed code shouldn't be able to send PII even to APIs it has access to)
Object-capabilities can help directly with both #1 and #2.
I've been working on this problem -- happy to discuss if anyone is interested in the approach.
Same idea with more languages on OCI. I believe they have something even better in the works, that bundles a bunch of things you want in an "env" and lets you pass that around as a single "pointer"
I use this here, which eventually becomes the sandbox my agent operates in: https://github.com/hofstadter-io/hof/blob/_next/.veg/contain...
Had some previous discussion that may be interesting on https://news.ycombinator.com/item?id=46595393
> via an outbound proxy similar to coder/httpjail
looks like AI slop ware :( I hope they didn't actually run it.
This isn’t the traditional “run untrusted plugins” problem. It’s deeper: LLM-generated code, calling external APIs with real credentials, without human review. Sandboxing the compute isn’t enough. You need to control network egress and protect secrets from exfiltration.
Deno Sandbox provides both. And when the code is ready, you can deploy it directly to Deno Deploy without rebuilding.
I don't know personally how to even type ’ on my keyboard. According to find in chrome, they are both considered the same character, which is interesting.
I suspect some word processors default to one or the other, but it's becoming all too common in places like Reddit and emails.
Also, “em-dashes are something only LLMs use” comes perilously close to “huh, proper grammar, must’ve run this by a grammar checker”.
(we do this all the time; eg. a new popular saying lands in an episode of a tv show, and then other people start adopting it, even subconsciously)
(that's what Gemini would say)
Now that I think further, doesn't this also potentially break HTTP semantics? E.g. if the key is part of the payload, then a data.replace(fake_key, real_key) can change the body length without actually updating the Content-Length header, right?
Lastly, this still doesn't protect you from other sorts of malicious attacks (e.g. 'DROP TABLE Users;')... right? This seems like a mitigation, but hardly enough to feel comfortable giving an LLM direct access to prod, no?
A 2 vCPU, 4 GB RAM, and 40 GB disk instance on Hetzner costs 4.13 USD per month.
The same here is:
$127.72 without pro plan, and $108.72 with pro plan.
This means to break even, I can only use this for 4.13/127.72*730 = 23.6 hours every month, or, less than an hour daily.
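Spelled out, the break-even arithmetic (using 730 hours as an average month, and the prices quoted above):

```python
hetzner_monthly = 4.13          # USD/month, 2 vCPU / 4 GB / 40 GB on Hetzner
sandbox_monthly = 127.72        # USD/month, same size here without the pro plan
hours_per_month = 730

break_even_hours = hetzner_monthly / sandbox_monthly * hours_per_month
print(round(break_even_hours, 1))  # ≈ 23.6 hours of runtime per month
```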
The real question is whether the microVMs can run on just plain old Linux, self-hosted.
Unfortunately there's no other way to make money. If you're 100% liberally licensed, you just get copied. AWS/GCP clone your product, offer the same offering, and they take all the money.
It sucks that there isn't a middle ground. I don't want to have to build castles in another person's sandbox. I'd trust it if they gave me the keys to do the same. I know I don't have time to do that, but I want the peace of mind.
So many sandbox products these days though. What are people using in production and what should one know about this space? There's Modal, Daytona, Fly, Cloudflare, Deno, etc
Nearly all players in this space use Gvisor or Firecracker.
Many a time I have tried to figure out a self-scaling EC2-based CI system but could never get everything scaled and warm in less than 45 seconds, which is sucky when you're waiting on a job to launch. These microVM-as-a-service thingies do solve a problem.
(You could use lambda, but that’s limited in other ways).
Looks like the main innovation here is linking outbound traffic to a host with dynamic variables - could that be added to deno itself?
Why limit the lifetime to 30 mins?
I really like it. Startup times are now better than node (if not as good as bun). And being able to put your whole "project" in a single file that grabs dependencies from URLs reduces friction a surprising amount compared to having to have a whole directory with package.json, package-lock.json, etc.
It's basically my "need to whip up a small thing" environment of choice now.
How do you know what domains to allow? The agent's behavior is not predefined.
[0] https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
Will give these a try. These are exciting times, it's never been a better time to build side projects :)
We recently built our own sandbox environment backed by firecracker and go. It works great.
For data residency, i.e. making sure the service is EU bound, there is basically no other way. We can move the service anywhere we can get hardware virtualisation.
As for the situation with credentials, our method is to generate CLIs on the fly and expose them to the LLMs and then they can shell script them whichever way they want. The CLIs only contain scoped credentials to our API which handles oauth and other forms of authentication transparently. The agent does not need to know anything about this. All they know is that they can do
$ some-skillset search-gmail-messages -q "emails from Adrian"
In our own experiments we find that this approach works better and it just makes sense given most of the latest models are trained as coding assistants. They just love bash, so give them the tools.
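A hypothetical sketch of that pattern (all names invented for illustration): the generated CLI carries only a narrowly scoped token for your own gateway API, which performs the real OAuth upstream, so the agent shells out to the CLI without ever touching a real credential.

```python
import argparse

# Hypothetical generated CLI: the agent runs
#   some-skillset search-gmail-messages -q "emails from Adrian"
# and the only credential baked in is a scoped token for our own
# gateway API, which handles the real Gmail OAuth server-side.
SCOPED_TOKEN = "tok_search_gmail_readonly"   # scoped to one capability
API_BASE = "https://api.example.internal"    # invented gateway endpoint

def parse_args(argv):
    parser = argparse.ArgumentParser(prog="some-skillset")
    parser.add_argument("command")                    # e.g. search-gmail-messages
    parser.add_argument("-q", "--query", required=True)
    return parser.parse_args(argv)

def build_request(command: str, query: str) -> dict:
    """Assemble the request the CLI would send to the gateway API."""
    return {
        "url": f"{API_BASE}/skills/{command}",
        "headers": {"Authorization": f"Bearer {SCOPED_TOKEN}"},
        "params": {"q": query},
    }
```

The agent only ever sees the CLI's name and flags; revoking or re-scoping the token happens at the gateway without touching the sandbox.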
I know that each of these things is subtly different, but they're similar enough that the bootable snapshot creation workflow (which I expect is a common one) has some sharp edges, since you have to interact with all three APIs at the same time.
Also, the CLI doesn't give a useful error when you try to create a snapshot from a currently attached volume.
Finally, updating a snapshot is more steps than I'd ideally like. I would much rather be able to make changes in a sandbox with a snapshot root and have them persist as a new snapshot. I kind of get why this isn't currently the case, but the volume/snapshot dance feels (for my use case) like it's missing some abstraction.
That said, now that I've got a snapshot set up it's a nice experience. I've got an alias for `deno sandbox create --root dev --ssh` and I can `claude` in yolo mode without much fear.
Congratulations to the team :)
I realize this is using other interactions, but I'd like a bit more observability than just the isolated environment... I'm not even saying VS Code specifically, but something similar at the least.
Just an idea…
It uses web workers on a web browser. So is this Deno Sandbox like that, but for server? I think Node has worker threads.
It's about 10x what a normal VM would cost at a more affordable hoster. So you better have it run only 10% of the time or you're just paying more for something more constrained.
A full month of runtime would be about $50 for a 2 vCPU / 1 GB RAM / 10 GB SSD mini-VM that you can easily get for $5 elsewhere.
Mentioned the same in this comment as well: https://news.ycombinator.com/item?id=46881920
Those limitations from other tools was exactly why I made https://github.com/danthegoodman1/netfence for our agents
Even if this were true, "everyone building X independently" is evidence that one company should definitely build X and sell it to everyone.
It's really useful to just turn a computer on, use a disk, and then plop its url in the browser.
I currently do one computer per project. I don't even put them in git anymore. I have an MDM server running to manage my kids' phones, a "help me reply to all the people" computer that reads everything I'm supposed to read, a dumb game I play with my son, a family todo list no one uses but me, etc, etc.
Immediate computers have made side projects a lot more fun again. And the nice thing is, they cost nothing when I forget about them.
SSH in, it resumes where you left off, auto-suspends on disconnect. $0.50/month stopped.
I have the same pattern - one box per project, never think about them until I need them.
The short answer is no. And more so, I think that "Everyone I know in my milieu already built this for themselves, but the wider industry isn't talking about it" is actually an excellent idea generator for a new product.
Here's my list of code execution sandboxing agents launched in the last year alone: E2B, AIO Sandbox, Sandboxer, AgentSphere, Yolobox, Exe.dev, yolo-cage, SkillFS, ERA Jazzberry Computer, Vibekit, Daytona, Modal, Cognitora, YepCode, Run Compute, CLI Fence, Landrun, Sprites, pctx-sandbox, pctx Sandbox, Agent SDK, Lima-devbox, OpenServ, Browser Agent Playground, Flintlock Agent, Quickstart, Bouvet Sandbox, Arrakis, Cellmate (ceLLMate), AgentFence, Tasker, DenoSandbox, Capsule (WASM-based), Volant, Nono, NetFence
A quick search popped this up:
https://news.ycombinator.com/item?id=45486006
If we can spin up microVMs so quickly, why bother with Docker or other containers at all?
pretty smart. why isn't this the norm?
Next step for me is creating a secrets proxy like credit card numbers are tokenized to remove risk of exfiltrating credentials.
Edit: It’s nice that Deno Sandbox already does this. Will check it out.
Can you configure Deno Sandbox to run on a self-hosted installation of Deno Deploy (deployd), or is this a SaaS-only offering?
That website does exist. It may hurt your eyes.