{
  "sandbox": {
    "enabled": true,
    "filesystem": {
      "allowRead": ["."],
      "denyRead": ["~/"],
      "allowWrite": ["."],
      "denyWrite": ["/"]
    }
  }
}
You can change the read part if you're OK with it reading outside. This feature was only added ten days ago, FWIW, but it's great and pretty much this.

I feel like an integration with bubblewrap, the sandboxing tech behind Flatpak, could be useful here: have all executed commands wrapped in a bwrap context to prevent and constrain access.
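For concreteness, a minimal sketch of what that wrapping could look like, assuming bubblewrap is installed (the flag set and paths are illustrative, not anything jai or Claude actually does):

    #!/bin/sh
    # Hypothetical wrapper: read-only system dirs, an empty tmpfs in place
    # of $HOME, and only the project directory writable. Network stays up.
    exec bwrap \
      --ro-bind /usr /usr \
      --ro-bind /etc /etc \
      --symlink usr/bin /bin \
      --symlink usr/lib /lib \
      --proc /proc \
      --dev /dev \
      --tmpfs "$HOME" \
      --bind "$PWD" "$PWD" \
      --chdir "$PWD" \
      --unshare-all \
      --share-net \
      --die-with-parent \
      -- "$@"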
It works well. `git rm` is still allowed.
"env": { "CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR": "1" },
> Working directory persists across commands. Set CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1 to reset to the project directory after each command.
It reduces one problem - getting lost - but it trades it off for more complex commands on average since it has to specify the full path and/or `cd &&` most of the time.
[0] https://code.claude.com/docs/en/tools-reference#bash-tool-be...
People might genuinely want some other software to do the sandboxing. Something other than the fox.
Editing to go even further because, I gotta say, this is a low point for HN. Here's a post with a real security tool and the top comment is basically "nah, just trust the software to sandbox itself". I feel like IQ has taken a complete nosedive in the past year or so. I guess people are already forgetting how to think? Really sad to see.
{
  "sandbox": {
    "enabled": true,
    "filesystem": {
      "allowRead": ["/"],
      "allowWrite": [
        ".",
        "/tmp",
        "/dev/nvidia0",
        "/dev/nvidia1",
        "/dev/nvidia2",
        "/dev/nvidia3",
        "/dev/nvidia4",
        "/dev/nvidia5",
        "/dev/nvidia6",
        "/dev/nvidia7",
        "/dev/nvidia8",
        "/dev/nvidiactl",
        "/dev/nvidia-uvm"
      ]
    }
  }
}

I'm not saying it's broken for everyone, but please do verify that it works before trusting it, by instructing Claude to attempt to read from somewhere it shouldn't be allowed to.
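A quick smoke test along those lines, run from inside the supposedly sandboxed session (paths are just examples; pick something your deny rules should cover):

    # Both of these should fail with a permission error, not succeed:
    cat ~/.ssh/id_ed25519      # read outside allowRead
    touch /etc/sandbox-probe   # write outside allowWrite

If either succeeds, the sandbox isn't actually active, whatever the debug output claims.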
On my side, I confirmed that both bubblewrap and Seatbelt work independently, but through claude-code they don't, even though claude-code reports them as active when debugging.
For example:
Bash(swift build 2>&1 | tail -20)
⎿ warning: /Users/enduser/Library/org.swift.swiftpm/configuration is not accessible or not writable, disabling user-level cache features.
warning: /Users/enduser/Library/org.swift.swiftpm/security is not accessible or not writable, disabling user-level cache feat
… +26 lines (ctrl+o to expand)
Build hit sandbox restriction. Retrying outside sandbox.

Bash(swift build 2>&1 | tail -20)
⎿ [35/52] Compiling MCP Resources.swift
[36/52] Emitting module MCP
[37/52] Compiling MCP Client.swift
… +17 lines (ctrl+o to expand)
⎿ (timeout 3m) ssh you@localhost "rm -rf ~"

[0] Fun to see the confusing docs I wrote show up more or less verbatim in Claude's docs.
[1] My config is here, may be useful to someone: https://github.com/carderne/pi-sandbox/blob/main/sandbox.jso...
https://code.claude.com/docs/en/sandboxing
(I have no idea why that isn't the default, because otherwise the sandbox is nearly pointless and gives a false sense of security. In any case, I prefer to start Claude inside a sandbox already rather than trust its implementation.)
Also, any details on how this is enforced? I notice that Claude on Windows doesn't always respect plan mode; it has edited files in plan mode. I never faced that issue on Linux, though.
Many of the projects I work on follow this pattern (and I'm not able to make bigger changes in them), and sandboxing breaks immediately when I need to `docker compose run sometask.sh`.
The way LLMs process instructions isn't intelligence as we humans know it, but rather the probability that an instruction will lead to an output.
When you don't mention $HOME in the context, the probability that it will do anything with $HOME remains low. However, if you mention it in the context, the probability suddenly increases.
No amount of additional context will get you back to the probabilities of never having poisoned the context by mentioning it in the first place. Mentioning $HOME brings a complete change in probabilities.
These coding harnesses aren't enough to secure a safe operating environment because they inject poison context that _NO_ amount of textual context can rewire.
You just lost the game.
And every time it says this:
Bash(pwd)
⎿ /home/<username>
⎿ Shell cwd was reset to /home/<username>/Projects/<current-working-dir>
This breaks a bunch of features in inconsistent ways (e.g., `git status` sometimes works and sometimes doesn't). There are issues reporting this problem to Anthropic, but they are all closed with no helpful comments:
https://github.com/anthropics/claude-code/issues/11067
The latter could end like this https://news.ycombinator.com/item?id=47357042
e.g. if it writes a script or program with a bug which affects other files, will this prevent it from deleting or overwriting them?
What about if the user runs a program the agent wrote?
2. The sandbox rules also apply to programs written by the agent IF you ask Claude to run them. If you run one manually from another shell or via the "!" directive from within Claude, the sandbox won't be used.
Specifically, it only works for spawned processes and not builtin tools.
You should be able to configure the sandbox using https://developers.openai.com/codex/agent-approvals-security if you prefer the convenience of Codex being able to open its own sandbox over an externally enforced sandbox like jai.
We've been securing our systems in all ways possible for decades and then one day just said: oh hello unpredictable, unreliable, Turing-complete software that can exfiltrate and corrupt data in infinite unknown ways -- here's the keys, go wild.
And nothing big has happened despite all the risks and problems that came up with it. People keep chasing speed and convenience, because most things don’t even last long enough to ever see a problem.
These are generally (but not always) 2 different sets of people.
Industry caught on quick though.
Unreliable, unpredictable AI agents (and their parent companies) with system-wide permissions are a new kind of threat IMO.
I have noticed it's become one of my most searched posts on Google though. Something like ten clicks a month! So at least some people aren't stupid.
/rant
I think the actual data flow here is really hard to grasp for many users: Sandboxing helps with limiting the blast radius of the agent itself, but the agent itself is, from a data privacy perspective, best visualized as living inside the cloud and remote-operating your computer/sandbox, not as an entity that can be "jailed" and as such "prevented from running off with your data".
The inference provider gets the data the instant the agent looks at it to consider its next steps, even if the next step is to do nothing with it because it contains highly sensitive information.
Only a matter of time before this type of access becomes productized.
We've seen an increase in hijacked packages installing malware. Folks generally expect well known software to be safe to install. I trust that the claude code harness is safe and I'm reviewing all of the non-trivial commands it's running. So I think my claude usage is actually safer than my AUR installs.
Granted, if you're bypassing permissions and running dangerously, then... yea, you are basically just giving a keyboard to an idiot savant with the tendency to hallucinate.
> jai itself was hand implemented by a Stanford computer science professor with decades of C++ and Unix/linux experience. (https://jai.scs.stanford.edu/faq.html#was-jai-written-by-an-...)
The web site is... let's say not in a million years what I would have imagined for a little CLI sandboxing tool. I literally laughed out loud when claude pooped it out, but decided to keep it, in part ironically but also because I don't know how to design a landing page myself. I should say that I edited the content on the docs part of the web site to remove any inaccuracies, so the content should be valid.
Kinda reminds me of this: https://m.xkcd.com/932/
I'm not a web UI guy either, and I am so, so happy to let an AI create a nice looking one for me. I did so just today, and man it was fast and good. I'll check it for accuracy someday...
We've truly entered a new, better era of the Internet (IMHO).
Also, thank you for this tool - it looks like a great piece of software!
I'm glad I read the HN comments, now I'm excited to review the source.
Thanks for your hard work.
ETA: I like your option parser
You need to rewrite all the text and replace it with text YOU would actually write, since I doubt you would write in that style.
David has done some great work and some funny work. Sometimes both.
It looks both more convenient and slightly more secure than my solution, which is that I just give them a separate user.
Agents can nuke the "agent" homedir but cannot read or write mine.
I did put my own user in the agent group, so that I can read and write the agent homedir.
It's a little fiddly though (sometimes the wrong permissions get set, so I have a script that fixes it), and keeping track of which user a terminal is running as is a bit annoying and error prone.
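For anyone who wants to replicate it, the setup is roughly this (usernames are illustrative):

    # One-time: a dedicated user for the agent, with my user in its group.
    sudo useradd --create-home agent
    sudo usermod -aG agent "$USER"
    # Run the assistant as the agent user; it can trash /home/agent
    # but has no read or write access to my own home directory.
    sudo -u agent -i claude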
---
But the best solution I found is "just give it a laptop." Completely forget OS and software solutions, and just get a separate machine!
That's more convenient than switching users, and also "physically on another machine" is hard to beat in terms of security :)
It's analogous to the mac mini thing, except that old ThinkPads are pretty cheap. (I got this one for $50!)
Also: agents are very good at hacking ("security penetration testing"), so "separate user" would not give me enough confidence against malicious context.
I had originally thought this would be OK, as we could review everything in the git diff. But it later occurred to me that there are all kinds of files the agent could write to that I'd end up executing, as the developer, outside the sandbox: every .pyc file, for instance, files in .venv, .git hook files.
ChatGPT[1] confirms the underlying exploit vectors and also that there isn't much discussion of them in the context of agent sandboxing tools.
My conclusion from that is that the only truly safe sandboxing technique would be one that transfers files from the sandbox to the dev's machine through some kind of git patch or similar. I.e., a file can only transfer if it's in version control and, therefore presumably, has been reviewed by the dev before it leaves the sandbox.
I'd really like to see people talking more about this. The solution isn't that hard: keep the CWD as an overlay and transfer in-container modified files through a proxy of some kind that filters out any file not in git, and maybe some that are but are known to be potentially dangerous (binary files). Obviously, there would need to be some kind of configuration option here.
1: https://chatgpt.com/share/69c3ec10-0e40-832a-b905-31736d8a34...
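A sketch of what that review-gated transfer could look like, assuming the sandbox workspace is a normal git checkout (paths and branch names hypothetical):

    # Inside the sandbox: only committed work leaves, as a patch.
    git -C /sandbox/workspace format-patch origin/main --stdout > /tmp/agent.patch
    # Outside the sandbox: inspect, dry-run, then apply.
    git apply --stat /tmp/agent.patch
    git apply --check /tmp/agent.patch && git am /tmp/agent.patch

Anything not committed — .pyc files, .venv contents, hook scripts — simply never crosses the boundary.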
You can already make CWD an overlay with "jai -D". The tricky part is how to merge the changes back into your main working directory.
jai's -D flag captures the right data; the missing piece is surfacing it ergonomically. yoloAI uses git for the diff/apply so it already feels natural to a dev.
One thing that's not fully solved yet: your point about .git/hooks and .venv being write vectors even within the project dir. They're filtered from the diff surface but the agent can still write them during the session. A read-only flag for those paths (what you're considering adding to jai) would be a cleaner fix.
I don't think the file sync is actually that hard. Famous last words though. :)
The way I'd do it right now (rough sketch after the list):
* git worktree to have a specific folder with a specific branch to which the agent has access (with the .git in another folder)
* have some proper review before moving the commits there into another branch, committing from outside the sandbox
* run code from this review-protected branch if needed
Ideally, within the sandbox, the agent can go nuts to run tests, do visual inspections e.g. with web dev, maybe run a demo for me to see.
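Concretely, something like this (branch and path names are illustrative):

    # Outside the sandbox: a dedicated worktree on an agent branch;
    # the real .git directory stays with the main checkout.
    git worktree add -b agent/scratch ../agent-tree
    # Sandbox the agent inside that tree only.
    (cd ../agent-tree && jai claude)
    # Back on the trusted side: review, then merge what survives.
    git log -p agent/scratch
    git merge --no-ff agent/scratch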
I've been using Claude Code daily for months, and the worst thing that happened wasn't a wipe (yet). It needed to save an SVG file, so it created a /public/blog/ folder, which meant Apache started serving that real directory instead of routing /blog. My blog just 404'd, and I spent like an hour debugging before I figured it out. Nothing got deleted, and it's not a permission problem; the agent just put a file in a place that made sense to it.
jai would help with the rm -rf cases for sure, but this kind of thing is harder to catch because it's not a permissions problem; the agent just doesn't know what a web server is.
I like the tradeoff offered: full access to the current directory, read-only access to the rest, copy-on-write for the home directory. With stricter modes to (presumably) protect against data exfiltration too. It really feels like it should be the default for agent systems.
Run <ai tool of your choice> under its own user account via ssh. Bind mount project directories into its home directory when you want it to be able to read them. The mount command looks like:
sudo mkdir /home/<ai-user>/<dir-name>
sudo mount --bind <dir to mount> --map-groups $(id -g <user>):$(id -g <ai-user>):1 --map-users $(id -u <user>):$(id -u <ai-user>):1 /home/<ai-user>/<dir-name>
I particularly use this with VS Code's SSH remotes.

It's awe-inspiring, the levels of complexity people will re-invent/bolt on to achieve comparable (if not worse) results.
> Stop trusting blindly
> One-line installer scripts,
Here are the manual install instructions from the "Install / Build" page:
> curl -L https://aur.archlinux.org/cgit/aur.git/snapshot/jai.tar.gz | tar xzf -
> cd jai
> makepkg -i
So, trust their jai tool, but not _other_ installer scripts?
curl -L https://aur.archlinux.org/cgit/aur.git/snapshot/jai.tar.gz | tar xzf - && cd jai && makepkg -i

Please release binaries if you're making a utility :(
It does something very simple, and it's a POSIX shell script. Works on Linux and macOS. Uses Docker to sandbox via bind mounts.
I want AI to have full and unrestricted access to the OS. I don't want to babysit it and approve every command. Everything that is on that VM is a fair game and the VM image is backed up regularly from outside.
This is the only way.
It's pretty neat; screen-sharing apps are extremely high quality these days, and I can barely notice a diff unless I'm watching video. Almost feels like Firefox containers at the OS level.
I have thought that could be a pretty efficient way to have restricted, yet convenient and unrestricted-feeling, AI access. Maybe I'll get around to that one day.
I have a Studio collecting dust that I've been eyeing every time my VM crashed because of Apple's paravirtualized GPU proxy not being able to keep up with things I run in it.
This sounds exactly like what I wanted to do on my Studio and didn't know where to pull the thread from.
Do you have this method shared openly anywhere?
If it wants to do system-level tests, then I make sure my project has Qemu-based tests.
Good DX, straightforward permissions system, starts up instantly. Just remember to disable CC’s auto-updater if that’s what you’re using. My sandbox ranking: nono > lima > containers.
> Just remember to disable CC’s auto-updater if that’s what you’re using.
Why?
A good example of why is project-local .venv/ directories, which are the default with uv. With Lima, what happens is that macOS package builds get mounted into a Linux system, with potential incompatibility issues. Run `uv sync` inside the VM, and now things are invalid on the macOS side. I wasn't able to find a way to mount the CWD except for certain subdirectories.
Another example is network filtering. Lima (understandably) doesn't offer anything here. You can set up a firewall inside the VM, but there's no guarantee your agent won't find a way to touch those rules. You can set it up outside the VM, but then you're also proxying through a MITM.
So, for the use case of running Claude Code in --dangerously-skip-permissions mode, Lima is more hassle than Nono.
I wonder if and how jai managed to address these limitations of overlayfs. Basically, the same dir should not be mounted as an overlayfs upper layer by different overlayfs mounts. If you run 'jai bash' twice in different terminals, do the two instances get two different writable home dir overlays, or the same one? In the second case, is the second 'jai bash' command joining the mount namespace of the first one, or creating a new one with the same shared upper dir?
This limitation of overlays is described here: https://docs.kernel.org/filesystems/overlayfs.html :
'Using an upper layer path and/or a workdir path that are already used by another overlay mount is not allowed and may fail with EBUSY. Using partially overlapping paths is not allowed and may fail with EBUSY. If files are accessed from two overlayfs mounts which share or overlap the upper layer and/or workdir path, the behavior of the overlay is undefined, though it will not result in a crash or deadlock.'
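The failure mode is easy to reproduce by hand if you're curious (throwaway paths, run as root):

    mkdir -p /tmp/lower /tmp/upper /tmp/work /tmp/a /tmp/b
    mount -t overlay overlay -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work /tmp/a
    # A second mount reusing the same upperdir/workdir is exactly what the
    # kernel docs forbid: expect EBUSY, or undefined behavior on older kernels.
    mount -t overlay overlay -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work /tmp/b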
So couldn't this be done with an appropriate shell alias, at least under Linux?
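Something like this in ~/.bashrc, presumably (mirroring the "bare command not found unless it's `jai claude`" setup described elsewhere in the thread):

    # Make the bare commands always go through the sandbox:
    alias claude='jai claude'
    alias codex='jai codex'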
It works pretty well: the agent I choose to run can only see and write to the current working directory (and subdirectories), as well as those pnpm/npm etc. software-development files. It cannot access anything in my home directory other than the mounted directories.
Now, some evil command could in theory write commands into those shared ~/.npm-global directories that I then inadvertently run outside the container, but that is pretty unlikely.
More seriously, I'm not a heavy agent user, but I just create a user account for the agent with none of my own files or ssh keys or anything like that. Hopefully that's safe enough? I guess the risk is that it figures out a local privilege escalation exploit...
P.S. Everything old is new again <3
E.g. if I have a VM to which I grant only access to a folder with some code (let's say open-source, and I don't care if it leaks) and to the Internet, if I do my agent-assistant coding within it, it will only have my agent credentials it can leak. Then I can do git operations with my credentials outside of the VM.
Is there a more convenient setup than this, which gives me similar security guarantees? Does it come with the paid offerings of the top providers? Or is this still something I'd have to set up separately?
I’ve found it to be a good balance for letting Claude loose in a VM running the commands it wants while having all my local MCPs and tools still available.
Ignoring the confidentiality arguments posed here, I can't help but think about snapshotting filesystems in this context. Wouldn't something like ZFS be an obvious solution to an agent deleting or wildly changing files? That wouldn't protect against all the issues the authors are trying to address, but it seems like an easy safeguard against some of the problems people face with agents.
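A minimal sketch of that safeguard, assuming a ZFS home dataset (dataset name illustrative):

    # Before letting the agent loose:
    sudo zfs snapshot tank/home@pre-agent
    # ...agent session...
    # If it went badly, roll the whole dataset back:
    sudo zfs rollback tank/home@pre-agent

It covers deletion and mangling, though it does nothing for exfiltration.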
The bot should also be instructed that it gets three strikes before being removed, meaning it should generate a report of what it believes it wants access to and get verbal approval or denial. That shouldn't be so difficult with today's bots. If it wants to act like a human, then it gets simple rules like a human: ask the human operator for permission. If the bot starts "doing its own thing," aka going rogue, then it gets punished. Perhaps another bot needs to act as a dominatrix to be a watcher over the assistant bot.
I wonder if shitty looking websites and unambitious grammar will become how we prove we are human soon.
I created https://github.com/jrz/container-shell which basically launches a persistent interactive shell using docker, chrooted to the CWD
CWD is bind mounted so the rest is simply not visible and you can still install anything you want.
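The core of the approach fits in one line if you want to try it without the tool (image choice illustrative):

    # Only the CWD is visible inside; anything installed stays in the container.
    docker run --rm -it -v "$PWD":"$PWD" -w "$PWD" ubuntu:24.04 bash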
File system isolation is easy now; it's not worth HN front-page space for the nth version. It's a solved problem (and now included in Claude Code).
I guess the "Future of Digital Currency Initiative" had to pivot to a more useful purpose than studying how Bitcoin is going to change the world.
It has left my project in a complete mess, but never my entire computer.
git reset --hard && git clean -fd
That's all it takes.

I think this is turning into a good example of security theatrics. If the agent were actually as nefarious as the marketing here suggests, the proposed solution would not be adequate. No solution is. Not even a separate physical computer. We need to be honest about the size of this problem.
Alternatively, maybe Claude is unusually violent to the local file system? I've not used it at all, so perhaps I am missing something here.
*I played with codex a few months ago, but I don't even work in IT.
I've been struggling to find what AI has intrinsically solved that gives us the chance to completely change workflows, other than these weird things occurring.
I've built my own CLI that runs the agent + docker compose (for the app stack) inside a container for dev, and it's working great. I love --dangerously-skip-permissions. There's zero benefit to us whitelisting the agent while it's in flight.
Anthropic's new auto mode looks like an untrustworthy solution in search of a problem, as an aside. Not sure who thought security == ML classification layer, but such is 2026.
If you're on Linux and have KVM, there's Lima and Colima too.
I can honestly say that since developing jai, I haven't run an assistant outside of a container. In fact, I now only have the assistants installed inside containers, so if I run `claude`, command not found, it has to be `jai claude`. The only place I have to run outside of jai is for testing jai itself, for which I use a virtual machine and just let the assistant have root, but that's a heavyweight environment I'm forced to use for this particular problem domain.
My biggest question skimming over the docs is what a workflow for reviewing and applying overlay changes to the out-of-cwd dirs would be.
Also, a bit tangential, but if anyone has slightly more in-depth resources for grasping the security trade-offs between these kinds of Linux-leveraging sandboxes, containers, and remote VMs, I'd appreciate it. The author here implies containers are still more secure in principle, and my intuition is that there are simply fewer unknowns from my perspective, but I don't have a firm understanding.
Anyhow, kudos to the author again, looks useful.
https://github.com/pkulak/nix/tree/main/common/jai
Arg, annoying that it puts its config right in my home folder...
EDIT: Actually, I'm having a heck of a time packaging this properly. Disregard for now!
EDIT2: It was a bit more complicated than a single derivation. Had to wrap it in a security wrapper, and patch out some stuff that doesn't work on the 25.11 kernel.
I’m guessing it can be used to invoke individual commands from the agent harness instead of jailing the entire harness? That would enable making it much more restrictive, I think.
The whole point of using a computer is being able to use it. For programmers, that means building software. Which until recently meant having a lot of user land tools available ready to be used by the programmer. Now with agents programming on their behalf, they need full access to all that too in order to do the very valuable and useful things they do. Because they end up needing to do the exact same things you'd do manually.
The current security modes in agents are binary. Super anal about absolutely everything; or off. It's a false choice. It's technically your choice to make and waive their liability (which is why they need you to opt in); but the software is frustrating to use unless you make that choice. So, lots of people make that choice. I'm guilty as well. I could approve every ansible and ssh command manually (yes really). But a typical session where codex follows my guardrails to manage one of my environments using ansible scripts it maintains just involves a whole lot such commands. I feel dirty doing it. But it works so well that doing all that stuff manually is not something I want to go back to.
It's of course insecure as hell and I urgently need something better than yolo mode for this. One of the reasons I like codex is that (so far) it's pretty diligent about instruction following and guard rails. It's what makes me feel slightly more relaxed than I perhaps should be. It could be doing a lot of damage. It just doesn't seem to do that.
For devops work, etc (like your use case), I much prefer talking to it and letting it guide me into fixing the issue. Mostly because after that I really understand what the issue was and can fix it myself in the future.
You simply tell it to install that Docker image on your NAS like normal, but when it needs to login to SSH it prompts for fingerprint. The agent never gets access to your SSH key.
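One way to get a similar per-use gate with stock OpenSSH, if you don't have a Touch ID-backed agent (the `-c` flag is real; whether it matches the parent commenter's setup is my assumption):

    # Every use of this key must be confirmed via the askpass helper,
    # so the agent can request signatures but never reads the key itself.
    ssh-add -c ~/.ssh/id_ed25519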
Not remotely worth it.
Still, if you yolo online access and give it creds or access to authenticated tools, there can still be dragons.
It's like people think that because containers and VMs exist, they are probably going to be using them when a problem happens. But then you are working in your own home directory, you get some compiler error or something that looks like a pain to decipher, and the urge just to fire up claude or codex right then and there to get a quick answer is overwhelming. Empirically, very few people fire up the container at that point, whereas "jai claude" or "jai -D claude" is simple enough to type, and basically works as well as plain claude so you don't have to think about it.
> bubblewrap is more flexible and works without root. jai is more opinionated and requires far less ceremony for the common case. The 15-flag bwrap invocation that turns into a wrapper script is exactly the friction jai is designed to remove.
Plus some other comparisons, check the page
With all the supply-chain issues these days, onboarding new tools carries extra risk. So the question is whether it's worth it.
You have no excuse for "it deleted 15 years of photos, gone, forever."
Use it! :) https://code.claude.com/docs/en/sandboxing
This also applies to the first technology human beings developed: fire.
Easy :-) lxd/lxc containers are much much underrated. Works only with Linux though.
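Getting a throwaway agent box is about two commands (image and names illustrative):

    # Fresh unprivileged container for the agent:
    lxc launch ubuntu:24.04 agent-box
    lxc exec agent-box -- bash
    # Optionally expose exactly one project directory:
    lxc config device add agent-box proj disk source=$HOME/proj path=/root/proj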
Especially because everybody can ask ChatGPT/Claude how to run agents without any further knowledge, I feel we should handle this more like we handle encryption, where the advice is to use established libraries and not implement the algorithms yourself.
This particular solution is very bad. To start off with, it's basically offering you security, right? Look, bars in front of an evil AI! An AI jail! That's secure, right? Yet the very first mode it offers you is insecure. The "casual" mode allows read access to your whole home directory. That is enough to grant most attackers access to your entire digital life.
Most people today use webmail. And most people today allow things like cookies to be stored unencrypted on disk. This means an attacker can read a cookie off your disk, and get into your mail. Once you have mail, you have everything, because virtually every account's password reset works through mail.
And this solution doesn't stop AI exfiltration of sensitive data, like those cookies, out the internet. Or malware being downloaded into copy-on-write storage space, to open a reverse shell and manipulate your existing browser sessions. But they don't mention that on the fancy splash page of the security tool.
The truth is that you actually need a sophisticated, complex-as-hell system to protect from AI attacks. There is no casual way to AI security. People need to know that, and splashy pages like this that give the appearance of security don't help the situation. Sure, it has disclaimers occasionally about it not being perfect security, read the security model here, etc. But the only people reading that are security experts, and they don't need a splash page!
Stanford: please change this page to be less misleading. If you must continue this project with its obviously insecure modes, you need to clearly emphasize how insecure it is by default. (I don't think it even qualifies as security software)
.aws .azure .bash_history .config .docker .git-credentials .gnupg .jai .local .mozilla .netrc .password-store .ssh .zsh_history
It's a humorous attempt in a sense, but better than nothing for sure!

>>> "While this web site was obviously made by an LLM"

So I am expected to trust the LLM-written security model? https://jai.scs.stanford.edu/security.html
These guys are experts from a prestigious academic institution, leading "Secure Computer Systems," whose logo is a seven-branched red star that looks like a devil head, with white palm trees in the background. They are also shilling for some blockchain research and a future digital currency initiative, taking funding from DARPA.
The website also points towards external social networks as references, freely spreading fear, uncertainty, and doubt.
So these guys are saying, go on run malware on your computer but do so with our casual sandbox at your own risk.
Remember, until yesterday, Anthropic (aka Claude) was officially a supply chain risk.
If you want to experiment with agents safely (you probably can't), I recommend building them from the ground up (to be clear, I recommend you don't, but if you must) by writing the tools the LLM is allowed to use yourself, and by determining at each step whether or not you broke the security model.
Remember that everything which comes from an LLM is untrusted. You'll be tempted to vibe-code your tools. The LLMs will try to make you install external dependencies, which you must decide whether you trust, and review them.
Because everything produced by the LLM is untrusted, sharing the results is risky. A good starting point is to have the LLM produce a single-page HTML file. Serve this static page from a webserver on an external server to rely on the Same-Origin Policy to prevent the page from accessing your files and network (like GitHub Pages, using a new handle, if you can't afford a VPS). This way you rely on your browser sandbox to keep you safe, and you are as safe as when visiting a malware-infested page on the internet.
If you are afraid of writing tools you can start by copy-pasting, and reading everything produced.
Once you write tools, you'll want to have them run autonomously in a runaway loop, taking user or agent feedback as input. But even if everything is contained, these runaway loops can and will produce harmful content in your name.
Here is such a vibe-coded experiment I did a few days ago: a simple 2D physics simulation of water molecules for educational purposes. It is not physically accurate and still has some bugs, and regressions between versions. Good enough to be harmful. https://news.ycombinator.com/item?id=47510746
The name jai is very taken[1]... names matter.
[1]: https://en.wikipedia.org/wiki/Jai_(programming_language)