1. I built this because I like my agents to be local. Not in a container, not in a remote server, but running on my finely-tuned machine. This helps me run all agents on full-auto, in peace.
2. Yes, it's just a policy-generator for sandbox-exec. IMO, that's the best part about the project - no dependencies, no fancy tech, no virtualization. But I did put in many hours to identify the minimum required permissions for agents to continue working with auto-updates, keychain integration, and pasting images, etc. There are notes about my investigations into what each agent needs https://agent-safehouse.dev/docs/agent-investigations/ (AI-generated)
3. You don't even need the rest of the project and use just the Policy Builder to generate a single sandbox-exec policy you can put into your dotfiles https://agent-safehouse.dev/policy-builder.html
I've seen sandbox policy documents for agents before, but this is the first ready-to-use app I've come across.
I've only had a couple of points of friction so far:
- Files like .gitconfig and .gitignore in the home folder aren't accessible, and can't be made accessible without granting read only access to the home folder, I think?
- Process access is limited, so I can't ask Claude to run lldb or pkill or other commands that can help me debug local processes.
More fine-grained control would be really nice.
For handling global rules (like ~/.gitconfig and ~/.gitignore), I keep a local policy file that whitelists my "shared globals" paths, and I tell Safehouse to include that policy by default. I just updated the README with an example that might be useful[1]. I also enabled access to ~/.gitignore by default as it's a common enough default.
For process management, there is a blurry line about how much to allow without undermining the sandboxing concept. I just added new integrations[2] to allow more process control and lldb, but I don't know this area well. You can try cloning the repo, asking your agents to tweak the rules in the repo until your use-case works, and send a PR - I'll merge it!
Alternatively, using the "custom policy" feature above, you can selectively grant broad access to your tools (you can use log monitoring to see rejections, and then add more permisions into the policy file)
[1] https://github.com/eugene1g/agent-safehouse?tab=readme-ov-fi...
- I asked the agent to change my global git username, Codex asked my permission to execute `git config --global user.name "Botje"` and after I granted permission, it was able to change this global configuration.
- I asked it to list my home directory and it was able to (this time without Codex asking for permission).
I've been trying to get microsandbox to play nicely. But this is much closer to what I actually need.
I glimpsed through the site and the script. But couldn't really see any obvious gotchas.
Any you've found so far which hasn't been documented yet?
But lately I’ve been using agents to test via browsers, and starting headless browsers from the agent is flakey. I’m working on that but it’s hard to find a secure default to run Chrome.
In the repo, I have policies for running the Claude desktop app and VSCode inside the same sandbox (so can do yolo mode there too), so there is hope for sandboxing headless Chrome as well.
Thanks for making it!
Yet the first thing I find in your README is that to install your tool I need to trust some random server serve me an .sh file that I will execute in my computer (not sure if with sudo... but still).
Come on man, give me a tarball :)
EDIT: PS: before someone gives me the typical "but you could have malware in that tarball too!!!", well, it's easier to inspect what's inside the tarball and compare it to the sources of the repo, maybe also take a look at the CI of the repo to see if the tarball is really generated automatically from the contents of the repo ;)
Alternatively, you can feed these instructions to your LLM and have it generate you a minimal policy file and a shell wrapper https://agent-safehouse.dev/llm-instructions.txt
How is this any different than running some random .sh script?
The assumption is that package-manager code is reviewed - that same assumption can be applied just as equitably to wget'ed .sh files.
tl;dr - you are reviewing everything you ever run on your system, right?
The downside is that it requires access to more than it technically needs (Claude keys for example). I’m working on a version where you sandbox the agent’s Bash tool, not the agent itself. https://github.com/Kiln-AI/Kilntainers
How about using bash-tool to intercept the commands and then passing them onto the containers?
Claude Code seemed to be able to reach outside its own sandbox sometimes, so I lost trust in it. Manually wrapping it in sandbox-exec solved the issue.
This may also have bugs but having a single setup for your sandbox reduces the surface area & frankly I have significantly more trust in literally any individual dev creating a free generic open-source tool than I do in a company selling frontier models as a service & vibe-coding[0] an afterthought companion cli at 80k loc/mo.
I honestly think that sandboxing is currently THE major challenge that needs to be solved for the tech to fully realise its potential. Yes the early adopters will YOLO it and run agents natively. It won't fly at all longer term or in regulated or more conservative corporate environments, let alone production systems where critical operations or data are in play.
The challenge is that we need a much more sophisticated version of sandboxing than anybody has made before. We can start with network, file system and execute permissions - but we need way more than that. For example, if you really need an agent to use a browser to test your application in a live environment, capture screenshots and debug them - you have to give it all kinds of permissions that go beyond what can be constrained with a traditional sandboxing model. If it has to interact with resources that cost money (say, create cloud resources) then you need an agent aware cloud cost / billing constraint.
Somehow all this needs to be pulled together into an actual cohesive approach that people can work with in a practical way.
Have you considered that it's unsolvable? Or - at least - there is an irreconcilable tension between capability and safety. And people will always choose the former if given the choice.
The most unsolvable part is prompt injection. For that you need full tracking of the trust level of content the agent is exposed to and a method of linking that to what actions it has accessible to it. I actually think this needs to be fully integrated to the sandboxing solution. Once an agent is "tainted" its sandbox should inherently shrink down to the radius where risk is balanced with value. For example, my fully trusted agent might have a balance of $1000 in my AWS account, while a tainted one might have that reduced to $50.
So another aspect of sanboxing is to make the security model dynamic.
One idea is to have the coding agent write a security policy in plan mode before reading any untrusted files:
[0] https://deepclause.substack.com/p/static-taint-analysis-for-...
That's not the case with Agent Safehouse - you can give your agent access to select ~/.dotfiles and env, but by default it gets nothing (outside of CWD)
[1] https://www.tomshardware.com/tech-industry/artificial-intell...
I like that it's just a shell script.
I do wish that there was a simple way to sandbox programs with an overlay or copy-on-write semantics (or better yet bind mounts). I don't care if, in the process of doing some work, an LLM agent modifies .bashrc -- I only care if it modifies _my_ .bashrc
Re “overlay FS” - I too wish this was possible on Macs, but the closest I got was restricting agents to be read-only outside of CWD which, after a few turns, bullies them into working in $TMP. Not the same though.
┌─ YOLO shell ──────────────────────┬─ Outer shell ─────────────────────┐
│ │ │
│ yoloai new myproject . -a │ │
│ │ │
│ # Tell the agent what to do, │ │
│ # have it commit when done. │ │
│ │ yoloai diff myproject │
│ │ yoloai apply myproject │
│ │ # Review and accept the commits. │
│ │ │
│ # ... next task, next commit ... │ │
│ │ yoloai apply myproject │
│ │ │
│ │ # When you have a good set of │
│ │ # commits, push: │
│ │ git push │
│ │ │
│ │ # Done? Tear it down: │
│ │ yoloai destroy myproject │
└───────────────────────────────────┴───────────────────────────────────┘
Works with Docker, Seatbelt, and Tart backends (I've even had it build an iOS app inside a seatbelt container).It's tailored to play nicely with Git: spin up sandboxes form CLI, expose TCP/UDP ports of apps to check your work, and if running hosted sandboxes, share the sandbox URLs with teammates. I basically want running sandboxed agents to be as easy as `git clone ...`.
Docs are early and edges are rough. This week I'm starting to dogfood all my dev using Amika. Feedback is super appreciated!
FYI: we are also a startup, but local sandbox mgmt will stay OSS.
Just use Docker, or a VM.
The other issue is that this does not facilitate unpredictable file access -- I have to mount everything up front. Sometimes you don't know what you need. And even then copying in and out is very different from a true overlay.
The main issue I want to solve is unexpected writes to arbitrary paths should be allowed but ultimately discarded. macOS simply doesn't offer a way to namespace the filesystem in that way.
This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months.
What I really need is help figuring out which ones are trustworthy.
I think this needs to take the form of documentation combined with clearly explained and readable automated tests.
Most sandboxes - including sandbox-exec itself - are massively under-documented.
I am going to trust them I need both detailed documentation and proof that they work as advertised.
Your point is totally fair for evaluating security tooling. A few notes -
1. I implemented this in Bash to avoid having an opaque binary in the way.
2. All sandbox-exec profiles are split up into individual files by specific agent/integration, and are easily auditable (https://github.com/eugene1g/agent-safehouse/tree/main/profil...)
3. There are E2E tests validating sandboxing behavior under real agents
4. You don't even need the Safehouse Bash wrapper, and can use the Policy Builder to generate a static policy file with minimal permissions that you can feed to sandbox-exec directly (https://agent-safehouse.dev/policy-builder). Or feed the repo to your LLMs and have them write your own policy from the many examples.
5. This whole repo should be a StrongDM-style readme to copy&paste to your clanker. I might just do that "refactor", but for now added LLM instructions to create your own sandbox-exec profiles https://agent-safehouse.dev/llm-instructions.txt
Would xcodebuild work in this context? Presumably I'd watch a log (or have an agent) and add permissions until it works?
Basically, give an agent its own unprivileged user account (interacting with it via sudo, SSH, and shared directories), then add sandbox-exe on top for finer-grained control of access to system resources.
I also found the author to be helpful and responsive and the tool to be nicely minimalistic rather than the usual vibe coded ever expanding mess.
‘brew install sandvault’ and running ‘sv’ should get you going.
(full disclosure: I created the Homebrew formula and submitted a few PRs to the project)
$ container system start
$ container run -d --name myubuntu ubuntu:latest sleep infinity
$ container exec myubuntu bash -c "apt-get update -qq && apt-get install -y openssh-server"
$ container exec myubuntu bash -c "
apt-get install -y curl &&
curl -fsSL https://deb.nodesource.com/setup_lts.x |
bash - &&
apt-get install -y nodejs
"
$ container exec myubuntu npm install -g @anthropic-ai/claude-code
$ container exec myubuntu claude --versionIts manpage has been saying it's deprecated for a decade now, yet we're continuing to find great uses for it. And the 'App Sandbox' replacement doesn't work at all for use cases like this where end users define their own sandbox rules. Hope Apple sees this usage and stops any plans to actually deprecate sandbox-exec. I recall a bunch of macOS internal services also rely on it.
In particular, has the profile language ever been documented by anything other than the examples used by the OS and third parties reverse engineering it?
Around last summer (July–August 2025), I desperately needed a sandbox like this. I had multiple disasters with Claude Code and other early AI models. The worst was when Claude Code did a hard git revert to restore a single file, which wiped out ~1000 lines of development work across multiple files.
But now, as of March 2026, at least in my experience, agents have become more reliable. With proper guardrails in claude.md and built-in safety measures, I haven't had a major incident in about 3 months.
That said, layering multiple safeguards is always recommended—your software assets are your assets. I'd still recommend using something like this. But things are changing, bit by bit.
Also, I don’t want it to be even theoretically possible for some file in node_modules to inject instructions to send my dotfiles to China.
Solution could be two parts:
OS bringing better and easier to use OS limitations (more granular permissions; install time options and defaults which will be visible to user right there and user can reject that with choices like:
- “ask later”
- “no”
- “fuck no”
with eli5 level GUIs (and well documented). Hell, a lot of these are already solved for mobile OS. While not taking away tools away from hands of the user who wants to go inside and open things up (with clear intention and effort; without having to notarise some shit or pay someone).
2. Then apps[1] having to, forced to, adhere to use those or never getting installed.
[1] So no treating of agents as some “other” kinds of apps. Just limit it for every app (unless user explicitly decides to open things up).
It will also be a great time to nuke the despicable mess like Electron Helpers and shit and app devs considering it completely fine to install a trillion other “things” when user installed just one app without explaining it in the beginning (and hence forced to keep their apps’ tentacles simple and limited)
Having real macOS Docker would solve the problem this project solves, and 1001 other problems.
You can use it to completely sandbox claude code too.
1. Coderunner - https://github.com/instavm/coderunner
I'm very slowly working on a mock docker implementation for macOS that uses ephemeral VM to launch a true guest macOS and perform commands as per Dockerfile/copies files/etc. I use it internally for builds. No public repo yet though. Not sure if there is demand.
A practical path is ephemeral macOS VMs using Apple's Virtualization.framework coupled with APFS copy-on-write clones for fast provisioning, or limited per-process isolation via seatbelt and the hardened runtime, which respects Apple's licensing that restricts macOS VMs to Apple hardware and gives strong isolation at the cost of higher RAM and storage overhead compared with Linux containers.
It's not a pleasure to run them in a mutable environment where everything has a floating state as I do now. Native Docker for macOS would totally solve that.
What would a Phillips screwdriver bring over a flathead screwdriver? Sometimes you don't want/need the flathead screwdriver, simple as that. There are macOS-specific jobs you need to run in macOS, such as xcode toolchains etc. You can try cross compiling, but it's a pain and ridiculous given that 100% of every other OS supports containers natively (including windows). It's clear to me that Apple is trying to make the ratio jobs/#MacMinis as small as possible
We built Cyqle (https://cyqle.in) partly around this idea — each session is a full Linux desktop that's cryptographically wiped on close (AES-256 key destroyed, data unrecoverable). Agents can do whatever they want inside, and the blast radius is zero by design. No residual state, no host OS exposure.
The tradeoff is latency and connectivity requirements. For teams already doing cloud-based dev work, it's a natural fit. For local-first workflows where you need offline capability or sub-50ms responsiveness, something like Agent Safehouse makes more sense.
Both approaches are worth having — the threat model differs depending on whether you're more worried about data exfiltration or local system compromise.
https://github.com/hsaliak/sacre_bleu very rough around the edges, but it works. In the past there were apps that either behaved well, or had malicious intent, but with these LLM backed apps, you are going to see apps that want to behave well, but cannot guarantee it. We are going to see a lot of experimentation in this space until the UX settles!
If/since AI agents work continuously, it seems like running macOS in a VM (via the virtualization framework directly) is the most secure solution and requires a lot less verification than any sandboxing script. (Critical feature: no access to my keychain.)
AI agents are not at all like container deploys which come and go with sub-second speed, and need to be small enough that you can run many at a time. (If you're running local inference, that's the primary resource hog.)
I'm not too worried about multiple agents in the same vm stepping on each other. I give them different work-trees or directory trees; if they step over 1% of the time, it's not a risk to the bare-metal system.
Not sure if I'm missing something...
I’m assuming it’s similar to why people run plex, web servers, file sharing, etc
Also personally I’d rather not pay monthly fees for stuff if it can be avoided.
It supports running on a TrueNAS SCALE server, or via Incus (local or remote). I'm still working on tightening the security posture, but for many types of AI workflows it will be more than sufficient.
For those using pi, I've built something similar[1] that works on macOS+Linux, using sandbox-exec/bubblewrap. Only benefit over OP is that there's some UX for temporarilily/permanently bypassing blocks.
Big love for Pi - it was the first integration I added to Safehouse. I wanted something that offers strong guarantees across all agents (I test and write them nonstop), has no dependencies (e.g., the Node runtime), and is easy to customize, so I didn't use the Anthropic sandbox-runtime.
Yeah I think for general use the transparency of what your thing does is really great compared to a pile of TypeScript and whatnot.
Code here: https://github.com/gbrindisi/agentbox
I'm involved with a project building something very similar, which we literally open sourced an alpha version of last week:
https://github.com/GreyhavenHQ/greywall
It's a bit different in that:
- We started with Linux
- It is a binary that wraps the agent runtime
- It runs alongside a proxy which captures all traffic to provide a visibility layer
- Rules can be changed dynamically at runtime
I am so happy this problem is getting the attention it deserves!
One thing we've been seeing with production AI agents is that the real risk isn't just filesystem access, but the chain of actions agents can take once they have tool access.
Even a simple log-reading capability can escalate if the agent starts triggering automated workflows or calling internal APIs.
We've been experimenting with incident-aware agents that detect abnormal behavior and automatically generate incident reports with suggested fixes.
Curious if you're thinking about integrating behavioral monitoring or anomaly detection on top of the sandbox layer.
However the challenge is, sandbox profiles (rules) are always workload specific. How do you define “least privilege” for a workload and then enforce it through the sandbox.
Which is why general sandboxes wont be useful or even feasible. The value is observing and probably auto-generating baseline policy for a given workload.
Wrong or overly relaxed policies would make sandbox ineffective against real threats it is expected to protect against.
p.s. thanks for making this; timely as I am playing whackamole with sandboxing right now.
All the issues we get from AI today (hallucinations, goal shift, context decay, etc) get amplified unbelievably fast once you begin scaling agents out due to cascading. The risk being you go to bed and when you wake up your entire infrastructure is gone lol.
I built yolobox to solve this using docker/apple containers: https://github.com/finbarr/yolobox
But given how fast agents are moving, I would be shocked if such tools were not already being built
I use clippy with rust and the only thing I had to add was:
(subpath "/Library/Developer/CommandLineTools")Then I realized the only thing I care about on my local machine is "don't touch my files", and Unix users solved that in 1970. So I just run agents as "agent" user.
I think running it on a separate machine is nicer though, because it's even simpler and safer than that. (My solution still requires careful setup and regular overhead when you get permission issues. "It's on another laptop, and my stuff isn't" has neither of those problems.)
Why always the fixation on the hardware?
How does this compare with Codex's and Claude's built-in sandboxing?
Codex: IIRC, only shell commands are sandboxed; the actual agent runtime is not.
The alternative would be “no site”, which is still somehow worse.
Most setups handle this awkwardly: fire a webhook, write to a log, hope the human is watching. The sandbox keeps the agent contained, but doesn't give it a clean "pause and ask" primitive. The agent either guesses (risky) or silently fails (frustrating).
Seems like there are two layers: the security boundary (sandbox-exec, containers, etc.) and the communication boundary (how does a contained agent reach the human?). This project nails the first. The second is still awkward for most setups.
sandbox-profiles is a solid primitive for local agents. The missing piece in production is the tool layer — even a sandboxed agent can still make dangerous API calls if the MCP tools it has access to aren't individually authed and scoped.
The real stack is: sandbox the runtime (what Agent Safehouse does) + scope the tools (what we do with JIT OAuth at the MCP layer). Neither alone is enough.
Nice work shipping this.
https://www.arcade.dev/blog/ai-agent-auth-challenges-develop...