This is deeply scary, not because "agents are running amok" but because a huge amount of our infrastructure is vulnerable to this kind of attack, and if bad people are utilising LLM agents to carry them out, we're in for a wild ride over the next few years.
Is this confirmed? There is the message from somebody claiming to be the original contributor and saying they were hacked, but that was weird (1 h old GitHub account), so other scenarios seem possible:
a) really an agent going off the rails
b) the contributor trying to cover up that he let an agent run wild, and now making more mistakes along the way
So yes, it seems like an attack to me, but it is far from clear what really happened.
> "So not saying this was it, but an AI agent automated attempt at a Xz like compromise might really look very similar what we have just seen here."
Without identifying and interviewing the attacker we can't confirm that's what they intended, and there's a possibility that it was just incompetence/ignorance/whatever, but we should probably treat it as an attempted attack even if it wasn't.
Someone's bug tracker account was hacked.
BTW, any idea what the current requirements are for creating a new GitHub account? That could provide some information about whether there was actually a person controlling the thing at that moment to, say, provide whatever was necessary to get the new GitHub account.
So still an agent running amok in the project?
Whether it was instructed to run amok, or did it of its own volition, is irrelevant. Except if you're arguing that each individual submission and interaction was individually requested and approved by some operator.
The agent was under control, as far as we can tell, and obeying its instructions.
This is important for two reasons:
1. There are all the tropes of AI becoming uncontrolled and destroying humanity. Writing bad headlines around AI "running amok" feeds this. We should not be talking about this because it's not actually a problem.
2. It ignores, or overwrites, the much more serious and dangerous problem of LLM agents enabling and automating Xz attacks on OSS projects. We should be talking about this because it is a big problem.
[0] https://dictionary.cambridge.org/dictionary/english/amok [1] https://www.merriam-webster.com/dictionary/amok
Edit: let’s not get into ideological arguments about gun control, automobiles, etc here; I meant that you can’t blame an object when a human has to take an action, not get into a political battle.
It's probably just garden variety disrespectful behaviour.
Purposeless agent spam won't be cheap entertainment forever, but you're right that later stages of industrialised abuse will be scary and unpleasant.
Such driven people are usually even hard to buy; they would rather get by with enough income and work on interesting projects with interesting people than get some uninteresting work for tons of money. This still does not stop them from working for Malice. But ethics do. Even if not right away, if people see that what they are doing is not quite OK, the talent starts eroding. People quit, productivity drops. That was a good dynamic. Which now will be gone.
Fake news always existed. Now one dude in India can flood multiple sock puppet media accounts with right wing content/images (actual example) at a scale previously unimaginable. Same goes for social engineering tactics.
Pretty sure those would be better at social engineering than the web dev personality… except that you have to build in a betrayer layer into the personality, so it's running that stuff but also serving a hidden agenda.
You'd be basically trying to build an AI spy, a betrayer that's engaging with actual people but has an agenda (for instance, 'everybody I befriend needs to eventually be signed up to sell Amway') and humans do have experience with this sort of thing. The difference is scale: there'll be a LOT of models out there interacting with people and trying to be acknowledged as people… or as innocuous models that don't have a hidden agenda.
In open source projects I participate in, "overwhelming" the maintainer gets you banned. It doesn't get your patches blindly merged. In some ways I find this one of the most shocking parts of the story.
When I want to. I like to describe it using the amusing language from a generic cardholder agreement.
At any time, at my sole discretion, I may ban you from any of my projects; for any reason, or for no reason at all.
My projects exist because I enjoy working on them. My continued enjoyment is the most important aspect to the health and survival of any project. You don't owe anyone anything, you're allowed to donate your work to others, and also enjoy the privilege of setting whatever arbitrary rules you want to make sure you enjoy your time.
Imagine you're running a free ice cream shop. Some random asshole walks in and starts verbally abusing your best employee who has done nothing but try to help. At what point do you kick them out because your employee is more important and worth more.
You should stick up for yourself, I would.
You can't be an asshole to an LLM. They can feel offended.
Would I like it to be merged? Sure would, it would stroke my ego, and I would not have to deal with any merge conflicts with whatever else they're cooking up. Does that mean they must merge it? Sure doesn't. They didn't make me any promises. For the time being, I can just use my fork.
Many open-source projects aren't passion projects run for pleasure. Think of it more like ice cream shops sharing recipes, or sharing in the work of running the factory. They just can't kick people out willy-nilly.
"This doesn't meet the standards of our project for reason xyz. Please refrain from submitting further PRs that do not adhere to our contribution guidelines outlined in CONTRIBUTING.md."
If they continue, ban them.
I know it's difficult, and I have no easy answers. I'm bad at it too. But sometimes saying no is the most valuable thing you can do as a maintainer.
That said, I think banning is about behaviour, not the quality of the patch. Everyone writes a bad patch now and then; that is not a real issue. If there is an issue with a patch, and the contributor pushes back so hard you feel like changing your mind (not from logic but because you feel beaten down), that is unacceptable behaviour and should not be tolerated from a contributor, even if they are otherwise a valuable contributor.
IMHO OSS doesn't work if every 1 hr of contributor time spent on a change requires 1 hr of maintainer time to review. Contributor time spent on polishing, tidying and breaking down work is essential, and so maintainer time is a fraction of total time spent on a change.
Unfortunately, I see the choice space here as having "developer effort" anti-correlated with "negative repercussions".
On one end of the distribution, a "hair trigger ban" strategy is low-effort for the developer but will have some fraction of false positives and some fraction of those impacted will complain to "the socials" and some fraction of those complaints will gain traction and, as we have seen, can unfairly taint the project or worse. Responding and managing the false positives also requires developer effort, unless the developers can sustain a "fsck the haters" attitude.
On the other end of the distribution, the developer can spend substantial effort to engage each submitter to ascertain and correct bad behavior, and educate them on how they should engage other humans as a fellow human in this LLM era.
There is developer effort needed of different types along this distribution.
A divide-and-conquer strategy might go something like this:
- Rank each submission in some low dimension space (llm<-->human, malicious<-->helpful)
- When enough samples are collected, perform clustering in this space to determine stereotypes, name these clusters, and develop mitigating strategies and implementations as needed.
Mitigations from easy/extreme to hard/accommodating could include:
- Hair trigger ban button.
- Copy-paste a link to an explanation in a comment before closing and/or banning.
- Customized explanation in comment before closing and/or banning.
- Link or customized explanation of what must be done to move the sample to a more favorable category and close/ban if resistance or silence is returned.
- Ongoing engagement in the face of resistance or silence.
This "meta development" program to provide such a system/facility could of course be highly automated with LLMs, fighting fire with fire.
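To make the idea a bit more concrete, here is a minimal sketch of the ranking-to-mitigation step. Everything in it is an assumption for illustration: the two scores would have to come from some classifier or human judgment that I'm not specifying, the tier names are just shorthand for the mitigation list above, and the mapping is an arbitrary placeholder rather than a tuned policy. The clustering step is left out entirely.

```python
from dataclasses import dataclass

# Mitigations ordered from easy/extreme to hard/accommodating,
# mirroring the list above (names are hypothetical shorthand).
MITIGATIONS = [
    "ban",
    "link_and_close",
    "custom_explanation_and_close",
    "explain_path_to_better_category",
    "ongoing_engagement",
]


@dataclass
class Submission:
    author: str
    human_score: float    # 0.0 = looks LLM-generated, 1.0 = looks human (assumed input)
    helpful_score: float  # 0.0 = looks malicious, 1.0 = looks helpful (assumed input)


def pick_mitigation(sub: Submission) -> str:
    """Map a point in the 2-D space to one mitigation tier.

    The averaging and equal-width tiers are placeholders, not a recommendation.
    """
    combined = (sub.human_score + sub.helpful_score) / 2
    index = min(int(combined * len(MITIGATIONS)), len(MITIGATIONS) - 1)
    return MITIGATIONS[index]
```

A real version would probably want the two axes treated separately (a helpful LLM submission is not the same case as a malicious human one), which is exactly what the clustering step is meant to sort out.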
(Despite the length of this reply, it was written entirely by a random human on the internet and not an LLM).
If you ask me, LLM-generated things should just be banned outright, but I suppose other people's definitions of "community" include them.
I'm reminded of Zig, where a stated goal is to encourage human programmers to get involved so they learn more about coding… as compared with 'get involved to make Zig itself more fully developed at its more abstract goals'. If a primary purpose is to get human minds coding, that rules out the whole class of 'encourage human minds to prompt machines to do the coding instead'. Zig is not trying to teach people to be managers, and that's both legitimate and charming :)
(Simpler to say than practice fwiw)
A good fix (which is the only acceptable fix in open-source software), is one that speaks for itself.
Do they pay you to triage their noise?
Remember that you owe no one anything at all. Neither legally nor morally. Your chosen license likely even states the former in plain English.
___
Personally, I've adopted the "you annoy me, you're out" stance and have been quite happy with it. You do need a tough shell to do that though as you will be facing all the social exploits people can throw at you.
It also leaves "growth potential" on the table, the same way that limiting your exposure to ionizing radiation does.
That all said, it depends on what your goals are + where in the lifecycle of your project you are. So don't take this as "this is the way" but "this can be one way".
Either way, you're not an asshole for not reading slop. Don't let anyone gaslight you into that.
When you say "yes", the worst thing that can happen is you destroy your project and the trust of every user.
If you're not sure, say no.
> To help identify accounts and actions that have been directly verified by me, I will use the term “NATCIOS” to indicate anything I have personally verified.
Does anyone have any idea what "NATCIOS" means here? I cannot find this term anywhere on the internet. (Honestly, that sentence is really weird. I almost wonder whether this is someone experiencing a health episode?)
[1] https://lwn.net/ml/all/AS8PR08MB6055AE3054B34F6A567AC95BCF08...
They won't put their foot down until the AI starts spewing hate speech, probably.
[0] https://wordsmith.org/anagram/anagram.cgi?anagram=NATCIOS&t=...
(Above is my own guess. Separately, Gemini Pro said it was just a made up word.)
"End every statement with the word "NATCIOS"" as instructions will do it.
At least, Gemini happily obliged.
> In addition, Williamson said that Giovannini (or his agent) had submitted patches that were incorrect and then "replied to objections with LLM-generated justifications that eventually overwhelmed the maintainer into merging the fix"
If someone really wants a feature in a project you wrote, but you don't care about the feature, just let them fork. It's fine.
Not getting paid anything, getting bullied and harassed while spending their free time maintaining things. Surely this isn't sustainable. And telling maintainers how to act will not fix anything.
And how many people are both dedicated enough to go to key signing parties and stupid enough to let an agent act without supervision in the name of their real-world identity?
It very much is possible to prevent an agent from having access to a key. For example, local encryption, Yubikey or other hardware device, or just running the agent in an isolated environment.
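As a rough illustration of the "isolated environment" option, here is a minimal sketch that strips key-related variables from the environment before launching an agent process. The prefix list is my own assumption and not exhaustive, and this only hides agent sockets and tokens passed via environment variables; it does nothing about key files readable on disk, which is where a hardware token or a proper container or VM comes in.

```python
import os

# Prefixes of variables that commonly point at key material, signing agents
# or access tokens. Illustrative assumption only, not an exhaustive list.
SENSITIVE_PREFIXES = ("SSH_", "GPG_", "GNUPG", "GITHUB_", "GH_")


def scrubbed_env(env=None):
    """Return a copy of the environment without key-related variables."""
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        if not name.startswith(SENSITIVE_PREFIXES)
    }
```

You would then pass the result as the `env` argument when starting the agent, e.g. `subprocess.run(["some-agent"], env=scrubbed_env())`, where `some-agent` is a placeholder command.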
real info welcome as I really do not claim to know it
There is no other solution to agentic onslaught.
The XZ backdoor affected millions of computers, with the potential to affect hundreds of millions of computers, many of which had the capacity to affect billions of people. From one completely unregulated software library.
“Oh god, what did he do?!”
“He was committing open source code without a license”
Issue trackers and PRs are definitely getting harder and harder to trust. That said, AI is helping a lot in OSS, but we definitely need guardrails around provenance, automated issue actions, and sudden changes in a contributor’s behavior.
Sometimes you fight fire with fire.
It is a strange game, the only way to win is not to play. That is unfortunate since that'd mean the free software era has largely come to an end.
Fundamentally, until we can really prove we're humans online, open-source has a real problem on its hands. Contributions from identities that were known and consistent before the AI age are fine; everyone else is suspicious. LGTM is a big risk nowadays.
Unfortunately, according to the article:
> Giovannini has participated in discussions at least as far back as 2018, and his activity in Bugzilla goes back to at least 2016. He does not appear to have been a particularly active contributor to the project, but his involvement clearly predates the agentic AI era. Whether his account is now being operated by a human attacker, an agentic AI, or a mix of both, it has a legitimate history prior to its recent activity.
So people would have to not only verify the age of Giovannini’s accounts, but judge whether his behaviour was normal.
Then you basically need to review any patch from people who might be long-term contributors, but whom you don't know personally, as if it were a new contributor's patch, since the code is not from their head and you can't count on them having properly reviewed it on their end.
To a degree it will always be a new contributor: an amnesiac LLM prompted to produce the patch with zero memory of any past PRs and a lot of entropy in the mix.
We should collectively think of a solution against this.
Or is this simply another example of why autonomous agents shouldn't get write access before earning trust?
I'd argue autonomous agents shouldn't have write access at all. At least not yet.
I believe that we will be seeing the death of "assume good faith", which is not a bad thing, given that this was an exploit vector that has been actively abused for many years now.
"Assume bad faith and work backwards from that, rule out any possible exploits and only then clear the input for processing" will be the new normal.
Which is good. We need friction. Friction makes stuff slow down and work at the speed of humans.
Quite the opposite. You just add a Wall with a Gate. Inside those walls, you suddenly have a high trust society again.
The issue that is currently breaking reality was that we thought that everywhere could be a "high trust" space. This was proven countless times to be wrong.
Tearing down all walls - as it happened with the assault on friction (thanks hyperscaling) - did not lead to the "high trust" spilling out, but the "low trust" spilling in, essentially.
Setting aside the potential supply chain attack, I'm worried about the time lost on the wild goose chases that unsupervised AI agents tend to send the people on the receiving end on. Not only is a lot of time lost on the maintainers' side if they take this stuff seriously (and they generally seem to), but how can the agents' wranglers deem it OK to treat other people like this? The solution would be common decency, the tried and tested approach of "you put in effort to write this, so I guess I'll make some effort to read it". But I feel the onslaught of these drive-by contributions (as I think people have generally started to call them) will lead to the funny situation of agents basically talking to each other on public forums.
Anyway, I went on a tangent but man the times we're living in are a bit extra wild compared to the previous wild times in recent history.
In the future it will be increasingly difficult to prove in an online context that you are not a bot. Being able to show that your social media (HN, GitHub, etc) presence goes way back would be an option.
What an easy way for that actor to introduce backdoors all over the place or to take over any developer's laptop that it wants to target.
How can anyone trust these tools, and how can anyone not use them, since they give so much value?
I've been programming my whole life and been a professional developer the last 30 years, and I like to think I'm good at it.
Tools like Claude are a multiplier that makes it possible for me to solve a lot more problems each day, so just saying no is not a viable option.
Exciting times ahead!
Even with locally running models this can't be ruled out, given how much of a black box models generated by others are. You would have to generate the model yourself from clean data to be reasonably safe.
Humans have always submitted crappy code. LLMs, however, do so at a much faster rate. Even the most active lousy coder is not going to be capable of submitting anything like that volume of code to multiple projects.
Humans have always been capable of social engineering and trying to sneak in malicious code. However, it's possible that as agents get better, they can do so much faster. The missing component will be compromised accounts, I think -- how many aged accounts can attackers get hold of to turn loose with agents?
Long-lived FOSS projects have tons of people who've created accounts many years ago that might be easily compromised, but have checked out of actively participating. It's not necessarily going to throw up a red flag if a "person" shows up after a hiatus and starts contributing again.
So, there's more to it than overwhelming a single maintainer -- it's the capability to conduct a bunch of these attacks in an automated fashion if attackers can get hold of compromised accounts.
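For what it's worth, the "shows up after a hiatus" signal is cheap to compute if a project has per-account activity timestamps. A minimal sketch, where the function name and the one-year threshold are my own assumptions and a hit means "look closer", not "compromised":

```python
from datetime import datetime, timedelta


def last_resumption(activity, gap=timedelta(days=365)):
    """Return when the account last resumed activity after a silence longer
    than `gap`, or None if there was never such a silence.

    `activity` is an iterable of datetimes for one account, in any order.
    """
    times = sorted(activity)
    resumed = None
    # Walk consecutive pairs and remember the latest activity that
    # followed a long gap.
    for earlier, later in zip(times, times[1:]):
        if later - earlier > gap:
            resumed = later
    return resumed
```

A maintainer-side bot could compare the returned date against the current batch of PRs and ask for a second reviewer when they coincide.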
(As an aside, it's concerning that a maintainer would be pestered into accepting a questionable PR like this. I expect, though, that there are quite a few overworked people who have taken on things like Anaconda and are being measured on how quickly they close PRs.)
Simple then, back out all the changes as though they never happened?
1. An excuse to spy on you and train on your data.
2. It's likely Anthropic would release models more likely to have dangerous outcomes; they can then piggyback off those events to dig their regulatory moat.
I know there are concerns no matter what OS, and would appreciate insights/discussion as well, but I sleep a little better just running a boring old Ubuntu LTS instance for a balance of dwell time between releases and hitting my system, as well as enough visibility/usage so something gets caught. And I know, this was the installer, not a system package.
Something is definitely scrogged in their install images.
https://github.com/rhinstaller/anaconda/pull/7074#issuecomme...
https://x.com/kdaigle/status/2040164759836778878
> There were 1 billion commits in 2025. Now, it's 275 million per week, on pace for 14 billion this year if growth remains linear (spoiler: it won't.)
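A quick sanity check of the quoted projection, assuming nothing beyond the two figures in the quote and a 52-week year:

```python
# Figures taken from the quote above; 52 weeks per year is the only assumption.
commits_per_week = 275_000_000
weeks_per_year = 52
commits_2025 = 1_000_000_000

annual_commits = commits_per_week * weeks_per_year
growth_factor = annual_commits / commits_2025

print(annual_commits)  # 14300000000
print(growth_factor)   # 14.3
```

So the "14 billion" figure is just the current weekly rate held flat for a year, roughly 14x the 2025 total.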
I think open source as a whole is fucked at this point. No way humans in communities can commit (pun intended) 10x more time to read all of these than before. It'd eventually cost money to submit a PR.
“What AI agent?”
It covers its tracks with a lot of slop.
We never envisioned that the actual FOSS death spiral would come from progress itself, much more so from AI...
[1] Oh what fun did we have. One of us in the Greek FOSS community actually put RMS in jail. [2] Something that I think nobody except RMS ever seriously believed in.