1) claude will stash (despite clear instructions never to do so).
2) claude will use sed to bulk replace (despite clear instructions never to do so). The sed replacements make a mess and touch far too many files.
3) claude restores the stash. Finds a lot of conflicts. Nothing runs.
4) claude decides it can't fix the problem and does a reset hard.
I have this right at the top of my CLAUDE.md and it makes things better, but unlike codex, claude doesn't follow it to the letter. However, it has become a lot better now.
NEVER USE sed TO BULK REPLACE.
*NEVER USE FORCE PUSH OR DESTRUCTIVE GIT OPERATIONS*: `git push --force`, `git push --force-with-lease`, `git reset --hard`, `git clean -fd`, or any other destructive git operations are ABSOLUTELY FORBIDDEN. Use `git revert` to undo changes instead.
How can people be so naive as to run something like Claude anywhere other than in a strictly locked down sandbox that has no access to anything but the single git repo they are working on (and certainly no creds to push code)?
It is absolutely insane that you would give Claude access to your GitHub creds. What happens when it sees a prompt injection attack somewhere and exfiltrates all of your creds or wipes out all of your repos?
I can't believe how far people have fallen for this "AI" mania. You are giving a stochastic model that is easily misdirected the keys to all of your productive work.
I can understand the appeal to a degree, that it can seem to do useful work sometimes.
But even so, you can't trust it with anything. Not running it in a locked-down container that has access to nothing but a Git repo, with all important history stored elsewhere, seems crazy.
Shouting harder and harder at the statistical model might give you a higher probability of avoiding the bad behavior, but no guarantee; actually lock down your random text generator properly if you want to avoid it causing you problems.
And of course, given that you've seen how hard it is to get it to follow these instructions properly, you are reviewing every line of output code thoroughly, right? Because you can't trust that either.
This is only restricted for *fully free* accounts; the feature requires at least a paid Pro account. That starts around $4 USD/month, which sounds worth it to prevent lost work from a runaway tool.
In your own example you have all this huge emphasis on the negatives, and then the positive is a tiny un-emphasized afterthought.
$ yoloai new bugfix . -a --network-isolated --agent claude
Now I have a claude code session that only has a COPY of my work dir, and can't reach anything over the network except the Claude API server.
Now I interact with the agent, and when it's done:
$ yoloai diff bugfix
diff --git a/b64.go b/b64.go
index cfc5549..253c919 100644
--- a/b64.go
+++ b/b64.go
@@ -39,7 +39,7 @@ func Encode(data []byte) string {
val |= uint(data[i+2])
}
- out[j] = alphabet[(val>>18)&0x3E]
+ out[j] = alphabet[(val>>18)&0x3F]
out[j+1] = alphabet[(val>>12)&0x3F]
remaining := n - i
Looks good, let's apply it:
$ yoloai apply bugfix
Target: /home/ks/tmp/b64
Commits to apply (1):
9db260b33bcd Fix bit mask in base64 encoding
Apply to /home/ks/tmp/b64? [y/N] y
1 commit(s) applied to /home/ks/tmp/b64
Now the commit claude made inside the sandbox has been applied to my workdir:
$ git log
commit 5b0fc3a237efe8bbc9a9e1a05f9ce45d37d38bfa (HEAD -> main)
Author: Karl Stenerud <kstenerud@gmail.com>
Date: Mon Mar 30 05:28:21 2026 +0000
Fix bit mask in base64 encoding
Corrected the bit mask for the first character extraction from 0x3E to 0x3F to properly extract all 6 bits.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
commit 31e12b62b0c3179f3399521d7c4326a8f6130721 (tag: init)
The important thing here is that Claude was not able to reach anything on the network except its own API, and nothing it did ever touched my work dir until I was happy with the changes and applied them.
It also doesn't get access to my credentials, so it couldn't push even if it did have network access.
Time for a personal Forgejo instance? Mine has been running great for more than a year. Faster than GitHub even.
If you tell AI not to do something, you make it incomprehensibly more likely it will happen.
Use affirming language. Why do you think negative prompts don't exist in diffusion anymore?
It's trivial to set up and you could literally ask claude to do it for you and never have any of these issues ever again.
Any and all "I don't want it to ever run this command" issues are just skill issues.
Never had that experience in the whole time using cursor at work so I had to "take the agent to task" and ask it "WTF-mate? you'd better be able to repro that!" and then circle around the drain for a while getting an AGENTS.md written up. Not really a big deal, as the whole project was like 1k lines in and it's not like the code I'd hand-written there was "irreplaceable" but it led to some interesting discussion w/ the AI like "Why should I have to tell you this? Shouldn't your baseline training data presume not to delete files that you didn't author? How do you think this affects my trust not just of this agent session, but all agent interactions in the future?"
Overall, this is turning out to be quite interesting technology times we're living in.
You can reduce the risk, but not drive it to zero, and at scale even very small failure rates will surface.
Just set up a hook that prevents any git commands you don't ever want it to run and you will never have this happen again.
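As a rough sketch of what such a hook could check, assuming the agent's hook mechanism hands the proposed shell command to a script and treats a refusal from that script as a block (that wiring is an assumption and differs per tool). The function name `is_destructive_git` and the blocklist are hypothetical:

```python
import shlex

# Hypothetical blocklist: subcommand -> flags that make it destructive.
# An empty set means the subcommand is always refused.
BLOCKED = {
    "reset": {"--hard"},
    "clean": {"-f", "-fd", "-fdx", "--force"},
    "push": {"--force", "-f", "--force-with-lease"},
    "stash": set(),
}


def is_destructive_git(command: str) -> bool:
    """Return True if the command line appears to contain a blocked git call."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input: refuse rather than guess

    for i, tok in enumerate(tokens):
        if tok != "git":
            continue
        rest = tokens[i + 1:]
        # Treat the first non-flag argument after "git" as the subcommand.
        sub = next((a for a in rest if not a.startswith("-")), None)
        if sub not in BLOCKED:
            continue
        flags = BLOCKED[sub]
        if not flags or any(a in flags for a in rest):
            return True
    return False
```

This is deliberately crude: it can over-block, and it misses spellings such as `git -C dir reset --hard` or a call through an absolute path to the binary, so it lowers the odds rather than replacing a sandbox.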
Whenever I see stuff like this I just wonder if any of these people were ever engineers before AI, because the entire point of software engineering for decades was to make processes as deterministic and repeatable as possible.
[1] Not strictly a hyphen, which has its own unicode point (0x2010) outside of ascii. Unicode embraced the ambiguity by calling this point (0x2d) "HYPHEN-MINUS" formally, but really its only unique typographic usage is to represent subtraction.
(Or... do they?? Hmm, ok, maybe I need to let this roll around in my mind.)
comments: "ThE tItLe iS aI cOded !!!1"
triple hyphens —
Double hyphens —
Triple hyphens —-
Actual em dash (typed with more effort, but HN changes it) —
The triple hyphens have a gap in them separating the autocorrected en dash and the hyphen.
Now I wish I could reject `git reset --hard` on my local system somehow.
Care about the data in that workspace? Push it first.
Otherwise it is a cat-and-mouse game of whack-a-mole.
I don’t think this is a valid way of checking for spawned processes. Git commands are fast, so polling at 0.1-second intervals is not enough. I would replace the git on the $PATH with a wrapper that logs all operations and then execs the real git.
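A minimal sketch of that wrapper in Python, assuming the real binary lives at `/usr/bin/git` and that the script is saved as a file named `git` in a directory that comes earlier on $PATH. Both paths and the names `REAL_GIT`, `LOG_PATH`, and `format_entry` are placeholders:

```python
#!/usr/bin/env python3
import os
import shlex
import sys
import time

REAL_GIT = "/usr/bin/git"  # assumption: location of the real git binary
LOG_PATH = os.path.expanduser("~/git-invocations.log")  # assumption: log file


def format_entry(argv, cwd, ppid, now):
    """Build one log line: timestamp, parent PID, working dir, full command."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))
    return f"{stamp} ppid={ppid} cwd={cwd} git {shlex.join(argv)}\n"


def main():
    entry = format_entry(sys.argv[1:], os.getcwd(), os.getppid(), time.time())
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        log.write(entry)
    # Replace this process with the real git so exit codes and I/O pass through.
    os.execv(REAL_GIT, [REAL_GIT] + sys.argv[1:])


# Only act as the wrapper when actually invoked under the name "git".
if os.path.basename(sys.argv[0]) == "git":
    main()
```

Logging the parent PID is the useful part: it tells you which process asked for the `reset --hard`, which polling the process table can easily miss.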
Maybe even submitting the bug report "agentically" without user input, if it's running on host without guardrails (pure speculation).
E: It's a runaway bot lol https://github.com/anthropics/claude-code/issues/40701#issue...
(No need to use bpftrace, just an easy example :-) )
> Update: Root cause found — this was a bug in a tool I built that was running locally for testing, not Claude Code.
https://github.com/anthropics/claude-code/issues/40710#issue...
Some people are upset at my brave new world characterization, but yeah even as someone deriving value from Claude Code we've jumped the shark on AI in development.
Either the industry will face that reality and recalibrate, or in 20 years we're going to look back on these days like the golden age of software reliability and just accept that software is significantly more broken than it was (we've been priming ourselves for that after all).
It's tending more and more towards pushing the user to treat the whole thing as a pure chat interface magic black box, instead of a rich dashboard that allows you to keep precise track of what's going on and giving you affordances to intervene. So less a tool view and more magic agent, where the user is not supposed to even think about what the thing is even doing. Just trust the process. If you want to know what it did, just ask it. If you want to know if it deleted all the files, just ask it in the chat. Or don't. Caring about files is old school. Just care about the chat messages it sends you.
You reap what you sow, finance bro.
I just checked, mine also doesn't.
"Update: Root cause found — this was a bug in a tool I built that was running locally for testing, not Claude Code.
When the tool's configuration pointed at a local working directory, it would hard-reset that directory every poll cycle to reflect the remote — destroying all uncommitted changes to tracked files, exactly as described in the issue."
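For a sync tool like the one described, a guard along these lines would refuse the reset when there is uncommitted work. This is a sketch in Python, not the author's tool: it assumes the tool shells out to git and passes in the text printed by `git status --porcelain` (v1 format), and the function names are hypothetical.

```python
def tracked_changes(porcelain: str) -> list[str]:
    """Paths with staged or unstaged changes to tracked files.

    Expects `git status --porcelain` (v1) output: two status characters,
    a space, then the path. Untracked entries start with '??' and ignored
    entries with '!!'; `git reset --hard` leaves both alone.
    """
    changed = []
    for line in porcelain.splitlines():
        if len(line) < 4 or line.startswith("??") or line.startswith("!!"):
            continue
        changed.append(line[3:])
    return changed


def safe_to_hard_reset(porcelain: str) -> bool:
    """True only when no tracked file would lose uncommitted changes."""
    return not tracked_changes(porcelain)
```

A poll loop would call `safe_to_hard_reset` each cycle and skip the reset (or stop and complain) when it returns False.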
Flagged the submission as it's inaccurate. Will unflag if title gets changed to something like "dev builds script that resets their git repo every 10 minutes, forgets about it, blames claude code with no evidence"
Most likely, the developer ran `/loop 10m <prompt>` or asked claude to create a cron task that runs every 10 minutes and refreshes & resets git.
“Sync with the server periodically to get the latest”
Tracks for what we can infer
If anybody has suggestions for how to do this with LLMs (short of maintaining CLAUDE_wall_of_shame.md), please share.
Edit: for the record, yes I do run a linter, and generally try not to impose bikeshedding or soapboxes on my peers. It's just that there are certain patterns that I personally am not going to commit under my own username as the engineer of record.
Edit 2: I saw another comment recommending "Always confirm with me before doing $x" (and then always denying). Seems like it might work.
The model is probabilistic and sequences like `git reset --hard` are very common in training data, so they have some probability to appear in outputs.
Whether such a command is appropriate depends on context that is not fully observable to the system, like whether a repository or changes are disposable or not. Because of that, the system cannot rely purely on fixed rules and has to figure intent from incomplete information, which is also probabilistic.
With so many layers of probabilities, it seems expected that sometimes commands like this will be produced even if they are not appropriate in that specific situation.
Even a 0.01% failure rate due to context corruption, misinterpretation of intent, or guardrail errors would show up regularly at scale; that is 1 in 10,000 queries.
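To put rough numbers on that, here is a small Python sketch. It assumes each query fails independently with the same probability, which is a simplification, and the function names are made up for illustration:

```python
def expected_failures(p_fail: float, n_queries: int) -> float:
    """Average number of bad outputs over n_queries."""
    return p_fail * n_queries


def p_at_least_one(p_fail: float, n_queries: int) -> float:
    """Chance of seeing one or more bad outputs over n_queries."""
    return 1.0 - (1.0 - p_fail) ** n_queries


p = 0.0001  # 0.01%, i.e. 1 in 10,000
print(expected_failures(p, 1_000_000))
print(p_at_least_one(p, 10_000))
```

Under those assumptions a million queries yield about 100 failures on average, and even a single batch of 10,000 queries has roughly a 63% chance of containing at least one.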
> I guess, what I'm trying to say ... is this even a bug? Sounds like the model is doing exactly what it is designed to do.
False, it goes against the RL/HF and other post training goals.
That's not what I said at all. I never said it will be produced. I said there is some probability of it being produced.
> False, it goes against the RL/HF and other post training goals.
It is correct that frequency in training data alone does not determine outputs, and that post-training (RLHF, policies, etc.) is meant to steer the model away from undesirable behavior.
But those mechanisms do not make such outputs impossible. They just make them less likely. The underlying system is still probabilistic and operating with incomplete context.
I am not sure how you can be so confident that a probabilistic model would never produce `git reset --hard`. There is nothing inherent in how LLMs work that makes that sequence impossible to generate.
And if you force push to one of your own machines you can use the reflog[2].
[0]: https://stackoverflow.com/a/78872853
[1]: https://stackoverflow.com/a/48110879
[2]: https://stackoverflow.com/a/24236065
I know because I use keepassxc as my secret provider, so I get an approval prompt to allow or deny it _every_time_, so I darn well notice it as it'll grab focus when typing something. Inside a sandbox it tries but without access to creds or env, it just silently fails, so I never noticed while in-sandbox, just when out-of-sandbox, which I do occasionally to let it do some sysadmin/housecleaning task for me.
I finally asked Claude why: "Root cause: It's Claude Code's built-in git context feature, not hooks.
So unless you keep your secrets open to the user regardless of who's asking for them, it'll nag you to escape out of menus with a gh query. KeepassXC doesn't let you set a per-session limit; it's either right now or forever.
Every time a mistake has happened, on digging in I could always trace it back to something I did wrong: either being careless in reading what it told me, or careless in telling it what I want. I have had git code corruption issues; it overwrote uncommitted working code with non-working code. But it was my mistake not to tell it to commit the code before making changes. It deleted the QA cluster database, but because I told it to delete it, thinking it was my dev setup db. Net net: its mistakes are more a reflection of me as its supervisor than anything else.
These things happen. They happened before coding agents, they happen now. I've done plenty of damage with my own ten fingers on the keyboard without any help from an LLM.
This is exactly why I develop on a Mac with Time Machine. It has saved my bacon many times. Both from things I did and from things Claude did. I've had several recent incidents that went like this:
"me: Claude, did you delete X?"
"claude: Yes, sorry, I shouldn't have done that. I can reconstruct it."
[Narrator: no, claude cannot reconstruct it.]
"me: Should I just restore it from Time Machine?"
"claude: Yes! That's perfect!"
I swear I can feel a sense of relief from Claude when I tell it I can just restore from backup.
*/10 * * * * /usr/ schedules script execution
do not share a workspace with the llm, or with anybody for that matter.
How would the llm even distinguish what was written by it and what was written by you?
This whole LLM thing is a blast, huh?
I guess some people are upset at my brave new world characterization, but even as someone deriving value from Claude Code we've jumped the shark on AI in development.
The idea that a natural-language request can get Claude to invoke potentially destructive actions on a timer is silly.
https://code.claude.com/docs/en/scheduled-tasks#set-a-one-ti...
What would it cost if the /loop command was required instead of optional?
Oh, but maybe allowing it to do remote git operations is a necessary trigger.