Sandboxes are a good measure for things like Claude Code or Amp. I use a bubblewrap wrapper to make sure it can't read $HOME or access my ssh keys. And even there, you have to make sure you don't give the bot write access to files you'll be executing outside the sandbox.
It wouldn't be inherently. Is this something that Docker does? Or perhaps something that was done by the code that was run? (Shouldn't it have stayed within that container?)
But also, if it's not okay for the agent to know the API key permanently, why is it okay for the agent to have one-off use of something that requires the same key? Did it actually craft a Bash command line with the API key set and ask to run it, or was it just using a tool that happens to run such a command?
You can vibe-code a standalone repository, but for any sort of serious work with real people working alongside bots, every last PR has to be reviewed, moderated, curated, etc.
Everything AI does that's not specifically intended to be a standalone, separate project requires that sort of intervention.
The safe way to do this is having a sandboxed test environment, high level visibility and a way to quickly and effectively review queued up actions, and then push those to a production environment. You need the interstitial buffer and a way of reverting back to the last known working state, and to keep the bot from having any control over what gets pushed to production.
Giving them realtime access to production is a recipe for disaster. Whether it's your personal computer or a set of accounts built specifically for them or whatever, without your human-in-the-loop buffer, bad things will happen.
A lot of that can be automated, so you can operate confidently with high level summaries. If you can run a competent local AI and develop strict processes for review and summaries and so forth, kind of a defense in depth approach for agents, you can still get a lot out of ClawBot. It takes work and care.
Hopefully frameworks for these things start developing all of the safety, security, and procedure scaffolding we need, because OpenClaw and AI bots have gone viral. I'm getting all sorts of questions about how to set them up from completely non-technical people who would have trouble installing a sound system. Very cool to see, I'm excited for it, but there will definitely be some disasters this year.
s/Even/Especially/, I would think. Everyone's idea of how to get any decent performance out of an LLM for coding entails allowing the code to be run automatically. Nominally so that the LLM can see the results and iterate towards a user-provided goal; but it's still untrusted code.
Sandboxes will be left in 2026. We don't need to reinvent isolated environments; they aren't even the main issue with OpenClaw. Literally go deploy it in a VM* on any cloud and you've achieved all the same benefits. We need to know whether the email being sent by an agent is supposed to be sent, whether an agent is actually supposed to be making that transaction on my behalf, etc.
——-
Unfortunately it’s been a pretty bad week for alignment optimists (Meta lead fail, Google award show fail, Anthropic safety pledge). Otherwise… cybersecurity LinkedIn is all shuffling the same “prevent rm -rf” narrative, and researchers are focused on LLM-as-a-guard, but that is operationally not great, theoretically redundant, and susceptible to the same issues.
The strongest solution right now is human in the loop - and we should be enhancing the UX and capabilities here. This can extend to eventual intelligent delegation and authorization.
[1] https://news.ycombinator.com/threads?id=ramoz&next=47006445
* VM is just an example. I personally have it running on a local Mac Mini in a Docker sandbox (obviously aware that this isn’t a perfect security measure, but I couldn’t install it on my laptop, which has sensitive work access).
Isn’t this the whole point of the Claw experiment? They gave the LLMs permission to send emails on their behalf.
LLMs can not be responsibility-bearing structures, because they are impossible to actually hold accountable. The responsibility must fall through to the user because there is no other sentient entity to absorb it.
The email was supposed to be sent because the user created it on purpose (via a very convoluted process but one they kicked off intentionally).
It works where we can verify the lineage of the user's intent: originally captured, then validated throughout the execution process, and eventually used as an authorization mechanism.
Google has a good thought model around this for payments (see verifiable mandates): https://cloud.google.com/blog/products/ai-machine-learning/a...
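A minimal sketch of that mandate idea (toy HMAC signing standing in for real verifiable credentials; all names here are made up): the user's intent is captured and signed once, and the executor authorizes an action only if it matches the signed intent exactly.

```python
import hmac, hashlib, json

SECRET = b"user-held-signing-key"  # in practice: a key the agent never sees

def sign_mandate(intent: dict) -> dict:
    """User side: capture the intent and sign it so lineage can be verified."""
    payload = json.dumps(intent, sort_keys=True).encode()
    return {"intent": intent,
            "sig": hmac.new(SECRET, payload, hashlib.sha256).hexdigest()}

def authorize(mandate: dict, action: dict) -> bool:
    """Executor side: act only if the action matches a validly signed intent."""
    payload = json.dumps(mandate["intent"], sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, mandate["sig"]):
        return False                      # mandate was tampered with
    return action == mandate["intent"]    # action must match original intent

m = sign_mandate({"action": "pay", "payee": "electric-co", "max_usd": 120})
print(authorize(m, {"action": "pay", "payee": "electric-co", "max_usd": 120}))
print(authorize(m, {"action": "pay", "payee": "attacker", "max_usd": 120}))
```

A prompt-injected "pay the attacker" action fails the equality check even though the mandate itself is valid, which is the whole point of carrying the signed intent through execution.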
Me too, at [1].
We need fine-grained permissions at online services, especially ones that handle money. It's going to be tough. An agent which can buy stuff has to have some constraints on the buy side, because the agent itself can't be trusted. The human constraints don't work - they're not afraid of being fired and you can't prosecute them for theft.
In the B2B environment, it's a budgeting problem. People who can spend money have a budget, an approval limit, and a list of approved vendors. That can probably be made to work. In the consumer environment, few people have enough of a detailed budget, with spending categories, to make that work.
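The B2B version is easy to sketch. Assuming a hypothetical policy object (all names invented), the budget, approval limit, and vendor list might be enforced like this:

```python
from dataclasses import dataclass

@dataclass
class SpendPolicy:
    """Hypothetical B2B-style constraints for an agent that can buy things."""
    approval_limit: float      # max single purchase without human sign-off
    monthly_budget: float      # hard cap across all purchases
    approved_vendors: set
    spent: float = 0.0

    def check(self, vendor: str, amount: float) -> str:
        if vendor not in self.approved_vendors:
            return "reject: unapproved vendor"
        if self.spent + amount > self.monthly_budget:
            return "reject: over budget"
        if amount > self.approval_limit:
            return "hold: needs human approval"
        self.spent += amount               # record only auto-approved spend
        return "allow"

policy = SpendPolicy(approval_limit=50, monthly_budget=200,
                     approved_vendors={"grocer", "pharmacy"})
print(policy.check("grocer", 30))       # allow
print(policy.check("grocer", 120))      # hold: needs human approval
print(policy.check("sketchy.biz", 5))   # reject: unapproved vendor
```

The consumer problem is exactly that most people can't fill in `monthly_budget` and `approved_vendors` per category up front.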
Next upcoming business area: marketing to LLMs to get them to buy stuff.
At the same time, let's not let the perfect be the enemy of good.
If you're piloting an aircraft, yeah, you should have perfection.
But if you're sending 34 e-mails and spending 7 hours of phone calls back and forth to fight a $5500 medical bill that insurance was supposed to pay for, I'd love for an AI bot to represent me. I'd absolutely LOVE for the AI bot to create such piles of paperwork for these evil medical organizations that they learn that I will fight, that I'm hard to deal with, and that they should pay for my stuff as they're supposed to. Threaten with lawyers, file complaints with the state medical board; everything needs to be done. Create a mountain of paperwork for them until they pay that $5500. The next time, maybe they'll pay to begin with.
An AI bot can’t be held accountable, so isn’t able to be a responsibility-absorbing entity. The responsibility automatically falls through to the person running it.
Can I get some links / context on this please
meta lead fail: https://techcrunch.com/2026/02/23/a-meta-ai-security-researc...
Goog: https://deadline.com/2026/02/google-apologizes-bafta-news-al... *
Ant: https://time.com/7380854/exclusive-anthropic-drops-flagship-...
* There is now a clarification in the press saying it was not ai-generated.
My point is that alignment, as a solution to all of this, has a long, rough road ahead.
Sure, but now you're adding extra cost, vs just running it locally. RAM is also heavily inflated thanks to Sam Altman investment magic.
1. Don't let it send emails from your personal account, only let it draft email and share the link with you.
2. Use incremental snapshots, and if the agent bricks itself (it often does with OpenClaw if you give it access to change its config), just /revert to the last snapshot. I use VolumeSnapshot for lobu.ai.
3. Don't let your agents see any secret. Swap the placeholder secrets at your gateway and put human in the loop for secrets you care about.
4. Don't let your agents have direct outbound network access. They should only talk to your proxy, which has strictly whitelisted domains. There will be cases where the agent needs to talk to different domains, and I use time-boxed limits. (Allow certain domains for the current session for 5 minutes, and at the end of the session look up all the URLs it accessed.) You can also use tool hooks to audit the calls with an LLM to make sure they weren't triggered via a prompt injection attack.
Last but not least, use proper VMs like Kata Containers and Firecracker.
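Step 3 can be sketched as a tiny gateway (hypothetical names throughout): the agent only ever composes requests with placeholders, and the gateway, which the agent cannot read, swaps in the real values and gates the sensitive ones on human approval.

```python
import re

VAULT = {"STRIPE_KEY": "sk_live_real_value"}  # real secrets; agent never sees
SENSITIVE = {"STRIPE_KEY"}                    # these also need human approval

def gateway(headers: dict, approved_by_human: bool = False) -> dict:
    """Rewrite outbound headers, replacing {{NAME}} placeholders with secrets."""
    def swap(match):
        name = match.group(1)
        if name in SENSITIVE and not approved_by_human:
            raise PermissionError(f"human approval required for {name}")
        return VAULT[name]
    return {k: re.sub(r"\{\{(\w+)\}\}", swap, v) for k, v in headers.items()}

# The agent only ever writes the placeholder, never the key itself:
agent_request = {"Authorization": "Bearer {{STRIPE_KEY}}"}
real_request = gateway(agent_request, approved_by_human=True)
```

Even if the agent is fully prompt-injected, the worst it can exfiltrate is the string `{{STRIPE_KEY}}`.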
If you let OpenClaw access the daemon, sure it could still get prompt injected to add a bunch of things to your cart, but if the daemon is properly segmented from the OpenClaw user, you should be pretty safe from getting prompt injected to purchase something.
I guess agent isn't the best term here since the LLM wouldn't be driving the logic in the daemon. Using an LLM to select which item to add to the cart would mimic the behavior of full agentic loop without the risk of it going off the rails and completing the purchase.
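That split might look like the following sketch (all names hypothetical): the LLM can only suggest items, the daemon validates suggestions against a catalog, and the purchase path is simply never exposed to the model.

```python
ALLOWED_CATALOG = {"milk": 3.50, "eggs": 4.25, "bread": 2.80}

class GroceryDaemon:
    def __init__(self):
        self.cart = []

    def add_to_cart(self, item: str) -> bool:
        """The only operation exposed to the LLM."""
        if item not in ALLOWED_CATALOG:   # injected junk never enters the cart
            return False
        self.cart.append(item)
        return True

    def checkout(self, human_confirmation: str) -> float:
        """Never exposed to the LLM; requires an out-of-band human token."""
        if human_confirmation != "CONFIRM":
            raise PermissionError("checkout requires human confirmation")
        return sum(ALLOWED_CATALOG[i] for i in self.cart)

daemon = GroceryDaemon()
# suggestion = llm("we're low on dairy; pick one item")  # e.g. "milk"
daemon.add_to_cart("milk")        # accepted
daemon.add_to_cart("DROP TABLE")  # rejected, not in catalog
```

Worst case under injection: a cart full of catalog items awaiting your review, never a completed purchase.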
I can't say this loudly enough, "an LLM with untrusted input produces untrusted output (especially tool calls)." Tracking sources of untrusted input with LLMs will be much harder than traditional [SQL] injection. Read the logs of something exposed to a malicious user and you're toast.
"Find emails that are okay to delete, and check with me before deleting them" can easily turn into "okay deleting all your emails", as so many examples posted online are showing.
I have found this myself with coding agents. I can put "don't auto commit any changes" in the readme, in model instructions files, at the start of every prompt, but as soon as the context window gets large enough the directive will be forgotten, and there's a high chance the agent will push the commit without my explicit permission.
Put an openclaw like thing in your environment, and it’ll paperclip your business-critical database without any malicious intent involved.
(I hope people don't do that, but I expect they probably do.)
How about the corporate vice president of Microsoft Word?
https://www.omarknows.ai/p/meet-lobster-my-personal-ai-assis...
https://www.linkedin.com/in/omarshahine
It’s not going to be amusing when he gets hacked. Zero sense of responsibility.
I have no sympathy for that!!
People have been warned over and over not to grant full access to these AIs, and yet they do the complete opposite.
>Similarly, you shouldn't give OpenClaw access to money. But I want an agent that takes photos of my pantry, sees what I'm running low on, and orders new groceries for me, and that requires my credit card
It should never have access to your main account in the first place anyway.
Have a dedicated AI account with limited money in it, and even then, have a process in place that will only complete a financial request if you have explicitly approved it.
The same logic must be followed for everything, people prefer to just give full access without guardrails and hope nothing bad will happen.
To me, virtualization is just a very crude version of capabilities. I thought we'd have collectively realized our mistake by now, and have actually secure, and actually useful, general purpose computing solved.
Now we're on the edge of AGI, not super-intelligence, but something competent, as long as it doesn't hallucinate, or get confused. This is exactly the thing that could have been handled if we weren't on the worst timeline possible. Most of the solutions presented in the article are capabilities based.
Perhaps this will finally get us on the right track, but I doubt it. I'll see if I can use all this AI magic to cough up some reasonable tools fit for purpose, but I'm just one old guy who gets tired far too quickly these days.
[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
That said, we (exe.dev) have a couple more things planned on the VM side that we think agents need that no cloud provider is currently providing. Just don't call it a sandbox.
I'm sure we will get them but only for use with in-house agents, i.e. GMail and Google Pay will get agentic capabilities but they'll only work with Gemini, and only Siri will be able to access your Apple cloud stuff without handing over access to everything, and if you want your grocery shopping handled for you, Rufus is there.
Maybe you will be able to link Copilot to Gemini for an extra $2.99 a month.
The first two are common. With transaction approval the agent would operate on shadow pages / files and any writes would batch in a transaction pending owner approval.
For example, sending emails would batch up drafts and the owner would have to trigger the approval flow to send. Modifying files would copy on write and the owner would approve the overwrite. Updating social activity would queue the posts and the owner would approve the publish.
It's about the same amount of work as implementing undo or a transaction log; it's not too complex, and given that AI agents are 10000x faster than humans, the big companies should have this ready in a few days.
The problem with scoped roles and PAM is that no reasonable user can know the future and be smart about managing scoped access. But everyone is capable of reading a list of things to do and signing off on them.
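The batch-and-approve pattern really is not much code. A minimal sketch (hypothetical API): the agent can only queue actions, and the owner's approval is the single path to any real side effect.

```python
class ApprovalQueue:
    """Agent writes land here; nothing executes until the owner signs off."""
    def __init__(self, send_fn):
        self.pending = []
        self.send_fn = send_fn          # the real side effect, e.g. SMTP send

    def draft(self, action: dict):      # agent-facing: always allowed
        self.pending.append(action)

    def review(self) -> list:           # owner-facing: the readable list
        return list(self.pending)

    def approve_all(self):              # owner-facing: the only write path
        for action in self.pending:
            self.send_fn(action)
        self.pending.clear()

sent = []
q = ApprovalQueue(send_fn=sent.append)
q.draft({"to": "boss@example.com", "subject": "status update"})
assert sent == []                       # drafted, but nothing sent yet
q.approve_all()                         # owner signs off; now it sends
```

`review()` is exactly the "list of things to do" a user can sign off on without having to predict scopes in advance.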
I’m assuming the claw might eventually be compromised. If that happens, the damage is limited: they could steal the GLM coding API key (which has a fixed monthly cost, so no risk of huge bills), spam the endpoints (which are rate-limited), or access a Telegram bot I use specifically for this project.
Of course OpenClaw is not secure, but to be honest I believe most of the 'stories' where it went wild are just made up. Especially the crypto one.
It's like everyone seeing the comic book ad and wanting to mail-order an alligator. "It's fine. We can keep it in the bathtub—away from the kids and pets."
Am I the only one that finds this mind bogglingly dumb?
By the way, what was that movie where a boy plays a game with an A.I. and the same A.I. nearly starts a thermonuclear war or something like that? I think I watched the start when I was a kid but never really finished it.
I've got my popcorn ready.
And if you don’t connect it to stuff, it can’t connect.
You give it its own accounts, say email and calendar, and have it send you drafts and invite you to stuff. It doesn’t need your email and calendar.
Actually, I just asked my guy and he suggests just generating local ICS files. Even safer.
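For what it's worth, the local-ICS approach is only a few lines. A sketch (made-up helper) of what the agent could emit instead of ever touching a real calendar account:

```python
from datetime import datetime, timedelta

def make_ics(summary: str, start: datetime, minutes: int = 60) -> str:
    """Build a minimal single-event iCalendar file the user imports manually."""
    fmt = "%Y%m%dT%H%M%S"
    end = start + timedelta(minutes=minutes)
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//local-agent//EN",
        "BEGIN:VEVENT",
        f"UID:{start.strftime(fmt)}@local",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ])

ics = make_ics("Dentist", datetime(2026, 3, 2, 14, 0))
# open("dentist.ics", "w").write(ics)  # double-click to import; no API access
```

The agent produces a file; you decide whether it ever reaches your calendar.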
Checkmate atheists
Sandboxing alone isn’t the right approach… a multi-faceted approach is what works.
What we’ve found does work is automating the approval process, but only with very strong guards in place… approval fatigue is another growing problem: users simply clicking approve on all requests.
Everything is done locally via our grith cli tool.
Happy to answer any questions on hello@grith.ai too