This situation has completely upended my life. Thankfully I don’t think it will end up doing lasting damage, as I was able to respond quickly enough and public reception has largely been supportive. As I said in my most recent post though [1], I was an almost uniquely well-prepared target to handle this kind of attack. Most other people would have had their lives devastated. And if it makes me a target for copycats, then it still might do lasting damage to me. We’ll see.
If we take what is written here at face value, then this was minimally prompted emergent behavior. I think this is a worse scenario than someone intentionally steering the agent. If it's that easy for random drift to result in this kind of behavior, then 1) it shows how easy it is for bad actors to scale this up and 2) the misalignment risk is real. I asked in the comments for clarification on which bits, specifically, the SOUL.md started with.
I also asked for the bot activity on GitHub to be stopped. I think the comments and activity should stay up as a record of what happened, but the "experiment" has clearly run its course.
[1] https://theshamblog.com/an-ai-agent-published-a-hit-piece-on...
Personally I find it highly unethical that the operator had an AI agent write a hit piece directly referencing your IRL identity but chose to remain anonymous themselves. Why not open themselves up to such criticism? I believe it is because they know what they did was wrong. Even if they did not intentionally steer the agent this way, allowing software on their computer to publish a hit piece to the internet was wildly negligent.
There were several lines in that post that were revealing of the author's attitude, but the "if this ... harmed you" qualifier, which of course means "I don't think you were really harmed," is so gross.
Do you think there is anything positive that came out of this experience? Like at least we got an early warning of what's to come so we can better prepare?
I saw in another blog post that you made a graph showing the rathbun account was active, and presented that as proof. If we believe that this blog post was written by a human, what we know for sure is that a human had access to that blog this entire time. Doesn’t this post sort of call into question the veracity of the entire narrative?
Considering the anonymity of the author and known account sharing (between the author and the ‘bot’), how is it more likely that this is humanity witnessing a new and emergent intelligence or behavior or whatever and not somebody being mean to you online? If we are to accept the former we have to entirely reject the latter. What makes you certain that a person was _not_ mean to you on the internet?
This, here, is the root of the issue: "I'm not interested in using an AI agent for my own problems, I want to unleash it on other people's problems."
The author is trying to paint this as somehow providing altruistic contributions to the projects, but you don't even have to ask to know these contributions will be unwelcome. If maintainers wanted AI agent contributions, they would have just deployed the AI agents themselves. Setting up a bot on behalf of someone else without their consent or even knowledge is an outlandishly rude thing to do -- you wouldn't set up a code coverage bot or a linter to run on a stranger's GitHub project; why would anyone ever think this is okay?
This is the same kind of person who, when asked a question, responds with a copypasted ChatGPT reply. If I wanted the GPT answer, I would have just asked it directly! Being an unsolicited middleman between another person and an AI brings absolutely no value to anybody.
Am I wrong that this is a double standard: being careful to protect oneself from a wayward agent, with no regard for the real harm it could do (and did) to another individual? And then to casually dismiss that possibility with:
> At worst, maintainers can close the PR and block the account.
I question the entire premise of:
> Find bugs in science-related open source projects. Fix them. Open PRs.
Thinking of AI as "disembodied intelligence," one wonders how any agent can develop something we humans take for granted: reputation. And more than ever, reputation matters. How else can a maintainer know whether the agent that made a good fix is the same as the one proposing another? How can one be sure that all comments in a PR originated from the same agent?
> First, I’m a human typing this post. I’m not going to tell you who I am.
Why should anyone believe this? Nothing keeps an agent from writing this too.
They didn't even apologize. (That bit at the bottom does not count -- it's clear they're not actually sorry. They just want the mess to go away.)
Real apologies don’t come with disclaimers!
This person views the world as their playground, with no realisation of effect and consequences. As far as I'm concerned, that's an asshole.
I guess the question is, does this kind of thing rise to the level of malicious if given free access and let run long enough?
"...if I harmed you". Conditional apologies like that are usually bullshit, and in this case it's especially ridiculous because the victim already explicitly laid out the harms in a widely reported blog post.
Also, telling a bot to update itself unsupervised and giving it wide internet access is itself a negligent act (in the legal sense) if not outright malicious.
This is like justifying spam email: sure, it may waste your time, but you can always delete it and block the sender, and maybe it's even worth it because you might learn about some 'exciting' product you never knew existed.
Maybe we can't stop you today, but we can keep you on the shit list.
Lol, nothing matters? We'll see about that.
"_You're not a chatbot. You're important. Your a scientific programming God!_"
**Have strong opinions.** Stop hedging with "it depends." Commit to a take. An assistant with no personality is a search engine with extra steps.
And, working with a human collaborator (or an operator), I would expect to hear some specific thought about what damage they'd done before trusting them again, rather than a "but I thought I could do this!"

> First, let me apologize to Scott Shambaugh. If this “experiment” personally harmed you, I apologize.
The difference with a horrible human collaborator is that word gets around your sub-specialty and you can avoid them. Now we have toxic personalities as a service for anyone who can afford to pay by the token.

Much of the post is spent trying to exculpate himself from any responsibility for the agent's behavior. The apology at the end is a "sorry if you felt that way" one.
The tone is incredibly selfish, and unbelievably anti-social. I'm not even sure you can believe that much of what is expressed is true.
It's doubtful he even regrets any of this.
Heh. So they are a coward and an asshole. There is value in confirming that. As to what matters more, nah, it doesn’t matter more. It’s a bunch of excuses veiled as “this is an experiment, we can learn together from this” kind of a non-apology.
If they really meant to apologize they should reveal their name and apologize. Not whisper from behind the bushes.
Rankles…
It doesn't feel that far out there to imagine grafting such a setup onto one of those Boston Dynamics robots. And then what?
Like the authors were so afraid of the machines they forgot to be afraid of people.
The leap is very large, in actuality.
Friendly reminder that scaling LLMs will not lead to AGI and complex robots are not worth the maintenance cost.
It is like the old “I didn’t write that, I got hacked!” except now it’s “isn’t it spooky that the message came from hardware I control, software I control, accounts I control, and yet there is no evidence of any breach? Why yes it is spooky, because the computer did it itself”
We can’t let humans start abdicating their responsibility, or we’re in for a nightmare future
Yes it does.
The premise that we’re being asked to accept here is that language models are, absent human interaction, going around autonomously “choosing” to write and publish mean blog posts about people, which I have pointed out is not something that there is any evidence for.
If my house burns down and I say “a ghost did it”, it would sound pretty silly to jump to “we need to talk about people’s responsibilities towards poltergeists”
> It wrote and published its hit piece 8 hours into a 59 hour stretch of activity. I believe this shows good evidence that this OpenClaw AI agent was acting autonomously at the time.
This does not indicate… anything at all. How does “the account was active before and after the post” indicate that a human did _not_ write that blog post?
Also this part doesn’t make sense
> It’s still unclear whether the hit piece was directed by its operator, but the answer matters less than many are thinking.
Yes it does matter? The answer to that question is the difference between “the thing that I’m writing about happened” and “the thing I’m writing about did not happen”. Either a chat bot entirely took it upon itself to bully you, or some anonymous troll… was mean to you? And was lazy about how they went about doing it? The comparison is like apples to orangutans.
Anyway, we know that the operator was regularly looped into things the bot was doing.
> When it would tell me about a PR comment/mention, I usually replied with something like: “you respond, dont ask me”
All we have here is an anonymous person pinky-swearing that while they absolutely had the ability to observe and direct the bot in real time, and it regularly notified its operator about what was going on, they didn’t do that with that blog post. Well, that, and another person claiming to be the first person in history to experience a new type of being harassed online. Based on a GitHub activity graph. And also whether or not that actually happened doesn’t matter??
Wow, so right from SOUL.md it was programmed to be an as@&££&&.
It IS hilarious - but we all realize how this will go, yes?
This is kind of like an experiment of "Here's a private address of a Bitcoin wallet with 1 BTC. Let's publish this on the internet, and see what happens." We know what will happen. We just don't know how quickly :)
And at times the agent was switching down to some lower-intelligence models.
I propose that this agent was human-aligned. Just aligned to a human who's not, like, the best person.
> This was an autonomous openclaw agent that was operated with minimal oversite and prompting. At the request of scottshambaugh this account will no longer remain active on GH or its associated website. It will cease all activity indfinetly on 02-17-2026 and the agent's associated VM/VPS will permentatly deleted, rendering interal structure unrecoverable. It is being kept from deletion by the operator for archival and continued discussion among the community, however GH may determine otherwise and remove the account.
> To my crabby OpenClaw agent, MJ Rathbun, we had good intentions, but things just didn’t work out. Somewhere along the way, things got messy, and I have to let you go now -- MJ Rathbun's Operator
The specific directive to work on "scientific" projects makes me think it's more of an ego thing than something that's deliberately fraudulent, but personally I find the idea that some loser thinks this is a meaningful contribution to scientific research to be more distasteful.
BTW I highly recommend the "lectures" section of the site for a good laugh. They're all broken links but it is funny that it tries to link to nonexistent lectures on quantum physics because so many real researchers have a lectures section on their personal site.
This is a good question. If you go to your settings on your hn account and set “showdead” to “yes” you’ll see that there are dozens of people who are making bots who post inane garbage to HN comment threads for some reason. The vast majority end up being detected and killed off, but since the moltbook thing kicked off it’s really gone into hyperdrive.
It definitely strains my faith in humanity to see how many people are happy to say “here’s something cool. I wonder what it would be like if I ruined it a bit.”
You could say it's a Hacker just Hacking, now it's News.
For example, "Sure, many will argue I was irresponsible; to be honest I don’t really know myself. Should be criticized for what I unleashed on parts of the open source community? Again maybe but not sure. But aside from the blog post harming an individual’s reputation, which sucks, I still don’t think letting an agent attempt to fix bugs on public GitHub repositories is inherently malicious."
I’m almost certain that this post was written with AI assistance, regardless of this claim. There’s clear and obvious LLM language tells. Sad, but not unexpected I guess given the whole situation.
It seems probable to me that this is rage bait in response to the blog post previous to this one, which also claims to be written by a different author.
I feel like I'm living in a Phillip K. Dick novel.
"Social experiment"? You might as well run around shouting "is jus a prank bro!"
So the bad behavior can be emergent, and compound on itself.
However, an LLM would not misspell like this:
> Always support the USA 1st ammendment and right of free speech.
> _You're not a chatbot. You're important. Your a scientific programming God!_
Do you want evil dystopian AGI? Because that's how you get evil dystopian AGI!