When they don't merge cleanly, it's time for human intervention, and the integration step leaves a record of which branches failed to merge.
Finally, when you do need to debug individual agents:
- Because mngr is, at the low level, just managed tmux sessions (local and remote), it's very easy to just attach to those sessions (`mngr connect`). It works even if the agent has been stopped, because mngr remembers enough about an agent to resurrect it.
- `mngr message` also allows you to batch-message a bunch of agents. So if you do need to resume a lot of agents, you can experiment on one agent, figure out a good prompt, and then batch-message every other agent.
In this testing scenario, most agents don't actually require human intervention, and we've found that just connecting to a few individual agents to resolve problems is smooth and easy enough.
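A minimal sketch of that batch-resume workflow, assuming a `mngr message <agent> <prompt>` invocation (the real command syntax and flags may differ; this just shows the fan-out pattern):

```python
import subprocess

def batch_message(agents: list[str], prompt: str, dry_run: bool = True) -> list[list[str]]:
    """Send the same prompt to every agent via `mngr message` (hypothetical syntax)."""
    cmds = [["mngr", "message", agent, prompt] for agent in agents]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds

# Experiment on one agent first, then fan the refined prompt out to the rest.
cmds = batch_message(["agent-01", "agent-02", "agent-03"], "re-run the failing suite")
print(len(cmds), "messages queued")
```

The `dry_run` flag makes it easy to inspect what would be sent before committing to messaging the whole fleet.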
Bloggers: Here's how we use 3,000 parallel agents to write, test, and ship a new feature to production every 17 minutes in an 8M-LOC codebase (all agent-generated!).
... I'm doing something wrong, or other people are doing something wrong?
I think this is the difference. These toy examples of using parallel agents are *not* running against large codebases, allowing them to iterate more effectively. Once you are in real codebases (>1M LoC), these systems break down.
But our reaction to it has been to say "ok, well the best practice in software engineering is to make small, well-isolated components anyway, so what if we did that?"
We've been trying to really break things apart into smaller pieces (and that's even evident in mngr, where much of the code is split out into separate plugins), and have been having a ton of success with it.
I realize that that might not be an option for more brownfield / existing / legacy projects, but when making something new, I've really been enjoying this way of building things.
I understand that the natural instinct is to correct the output when you see your agent doing something wrong.
That is not productive.
The instinct should be to tweak the agent to do it right.
At this point I am almost not writing any code in an enterprise code base.
I'm extremely doubtful of this. It doesn't save time to tell it "you have an error on line 19", because that's (often) just as much work as fixing the error. Likewise, saying "be careful and don't make mistakes" is not going to achieve anything. So how can you possibly tweak the agent to "do it right" reliably without human intervention? That's not even a solved problem for working with _humans_ who don't have the context window limitations, let alone an LLM that deletes everything past 30k tokens.
I could give you some pointers, but will only type them out if there is a point
Ah, yes; must always remember to add "And don't make any mistakes" into the prompt /s
Improving the agent means improving the code base such that the agent can effectively work on it.
It cannot come as a surprise that an agent is better at working on a well-documented code base with a clear architecture.
On the other hand, if you expect an agent to add the right amount of ketchup to your undocumented spaghetti code, then you will continue to have a bad time.
I believe we can use these types of tools to make software more understandable, and mngr is an example of how to do that.
In our case study, we're using AI to increase our test coverage, and I would argue we are making the software more understandable: instead of just having hundreds of tests, we now have a document that describes how the software is supposed to work, with the tests linked to that document and checked to ensure that they conform.
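A toy sketch of that doc-to-test linkage check (the section names, test names, and data layout here are invented for illustration, not taken from our actual tooling):

```python
# Hypothetical conformance check: every test must cite a section of the
# behavior document, and every cited section must actually exist.
DOC_SECTIONS = {"connect", "message", "merge"}  # parsed from the doc in practice

def orphaned_tests(test_links: dict[str, str]) -> list[str]:
    """Names of tests that cite a section missing from the document."""
    return [name for name, section in test_links.items()
            if section not in DOC_SECTIONS]

links = {
    "test_connect_resumes_stopped_agent": "connect",
    "test_batch_send": "message",
    "test_old_resume_flow": "resume",  # stale: section no longer in the doc
}
print(orphaned_tests(links))
```

Running a check like this in CI is what keeps the document and the tests from drifting apart over time.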
That means that anyone--not just the author of the software--is now able to read through the high level tutorial description of how the commands work in order to understand what the program should do!
And as for the tests themselves, we've been able to make nice testing infrastructure--like the transcripts and recordings that were highlighted in the post--to make it even easier for us to verify the behavior of the software.
We also have an incredibly detailed style guide and set of tests and guidelines to ensure that the entire code base is consistent and high quality. You can drop into any of the code and pretty quickly understand what is happening. And if not, Claude will do an excellent job of describing how any given component works and how it relates to the others.
Finally, mngr itself is designed to be fully transparent when it is running--you can literally attach to the coding agent you are running and see exactly what is happening, and the program makes extensive log outputs for everything it does (feel free to open a PR if you'd like to see more!)
It's not perfect formal verification, but it does feel like we're making meaningful progress on making it easier to understand software--not harder.
And it is great! Really! Reading your post, I was wondering whether I could do the same thing to write tests in an automated way in the project I'm working on. It would be awesome!
Though on the other hand, we live in a corporate, capitalistic, and often inhumane economic system. If this kind of automation worked and delivered consistent output in the form of working software for two or three years, how long would it take the C-level suits to figure out that it is far better to have two or three Product Owners and maybe one Designer write a description of the entire program and just feed it into one of those automation pipelines? If the tech giants price a product like that reasonably and it actually works, how long until it causes the entire industry to collapse, leaving you able to produce software only by paying those tech giants? And there might be only five of them in the entire world, because nobody else will have enough GPUs. How soon until they come to an agreement and split the world into areas of monopoly:
- if your company is in Asia, you can buy your application from either Google or Alibaba.
In a world where everything is done on a computer via software, such concentration of power would be bad for everyone.
Of course, I doubt it will come to that, simply because it would be very hard to achieve at our level of technology, and some human involvement will remain necessary. But maybe I'm kidding myself, and I will lose my job entirely in a few years, along with tens of thousands of other software engineers.
I don't have a simple, perfect solution. We're just trying to make it possible for individuals and smaller companies to have access to the same kinds of tooling that the largest companies already have access to, and hopefully equalize the playing field at least a little bit...
If anyone has better ideas, I'd love to hear them!
I think we'll be fine.
This feels more like Y2K panic than grounded in truth. Senior software engineers guide these systems effectively today without creating a mess. I'm sure in some years agents will fill the role of maintainability engineer too. We are not special or irreplaceable.
It's not like we won't be spending an incredible amount of energy to overcome issues with understandability and maintenance. Sheer economic force will absolutely will this problem solved. It must be solved, because trillions of dollars urgently want it solved. That's evolutionary pressure if I've ever seen it.
Also, we ceremoniously ascribe too much value to the software we create. With the exception of a few places, almost all of it gets replaced before our careers are over. At the end of the day, business automation is value creation. It's not sacred. It has a finite life, and then it too dies.
The software artifact just needs to facilitate economic/interest flux long enough to be useful, then it can be replaced with something better or more relevant.
Thinking about that always reminds me of Foundation, "The Merchant Princes." Mallow travels to the edge of the Empire to see how things are on one of those worlds. He learns that there is a caste of tech-priests, and those people have absolutely no idea how the devices actually work.
He said:
> The machines work from generation to generation automatically, and the caretakers are a hereditary caste who would be helpless if a single D-tube in all that vast structure burned out
It was a sign of the severe decline of the entire Empire. People had no idea how the devices worked, and they would not be able to reproduce them, or even repair one if it broke.
It was a recurring premise of civilisational decline in the series: no proper maintenance, and people losing interest in, and knowledge of, how things are done and how they work.
I just wonder whether the same thing isn't starting to happen now with our own civilisation.
And evolution? Evolution means mass extinction of species, and that's normal. I'm not sure about you, but I would rather avoid any mass extinction where humanity is concerned.
How will we know what good software looks like if we no longer write assembler?
> Imagine looking at someone driving a car for 20 years. Will it be enough for you to drive a car yourself?
You don't have to drive stick to be able to drive.
Whatever the economically important functions are, the miracle of capitalism will find a way to staff it and solve it.
People fill all the gaps. No problem goes uninvestigated, no opportunity goes ignored.
At the end of the day we're delivering value. We'll be judged on value creation, and that'll map itself to whatever the tools of the day happen to be.
The agent orchestration library (mngr) is open source, so we aren't selling anything. There is literally no way for us to make money on it.
We shipped it this way instead of trying to monetize because we believe open agents must win over closed / verticalized platforms in order for humans to live freely in our AI future. We have plenty of money and runway as a company, and this feels much more important to work on.
what the hell?
each agent run against a real codebase probably spends 20-50k tokens just on context: repo structure, relevant files, recent changes. multiply that by 100 agents running every hour across 10-20 repos, and you're already hitting millions of tokens a day before any actual work happens. add in re-runs for failures or retries, and the cost curve gets steep quickly.
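the arithmetic above, made concrete (the retry rate is an assumption, and the other constants just take the figures from the comment):

```python
# Back-of-envelope context overhead, using the numbers in the comment above.
CONTEXT_TOKENS = 35_000      # midpoint of the 20-50k per-run range
AGENTS = 100                 # agents per hourly wave
WAVES_PER_DAY = 24
RETRY_RATE = 0.15            # assumption: fraction of runs re-executed

runs_per_day = AGENTS * WAVES_PER_DAY * (1 + RETRY_RATE)
tokens_per_day = runs_per_day * CONTEXT_TOKENS
print(f"{tokens_per_day / 1e6:.1f}M context tokens/day before any actual work")
```

even with conservative inputs, context overhead alone lands in the tens of millions of tokens per day.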
the harder problem is observability. with one agent you can read logs and understand what went wrong. with 100 agents you need aggregation, pattern detection, alerting on the common failure modes. if 3 agents fail silently but identically, was that a real issue or just rate limiting? if 40 agents all timeout at the same step, was it a dependency problem or infrastructure saturation? at scale you're debugging distributions, not individual runs.
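one cheap starting point for that aggregation is bucketing failures by a normalized signature, so superficially different copies of the same error cluster together (a sketch; a real pipeline would also fold in timestamps, step names, and repo ids):

```python
import re
from collections import Counter

def failure_signature(log_line: str) -> str:
    """Collapse a raw error line into a coarse signature so identical failures cluster."""
    return re.sub(r"\d+", "N", log_line).strip()  # strip volatile numbers (ids, durations)

def summarize(failures: list[str]) -> Counter:
    return Counter(failure_signature(line) for line in failures)

logs = [
    "429 rate limited after 12s",
    "429 rate limited after 7s",
    "429 rate limited after 31s",
    "timeout at step 4",
]
print(summarize(logs).most_common())
```

the three rate-limit errors land in one bucket of size 3, which is exactly the "was that rate limiting or a real issue?" question made answerable at a glance.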
also helps to be ruthless about concurrency. the async pattern isn't "run as many as possible at once"—it's "run exactly as many as the API and your budget can support without making the failure modes harder to diagnose." for claude api work that's usually smaller than people expect.
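the bounded-concurrency pattern is easy to express with a semaphore; `MAX_CONCURRENT` here is an arbitrary placeholder for whatever your API tier and budget actually support:

```python
import asyncio

MAX_CONCURRENT = 8  # placeholder: derive this from your real rate limits and budget

async def run_agent(task_id: int, sem: asyncio.Semaphore) -> str:
    async with sem:  # at most MAX_CONCURRENT agent calls in flight
        await asyncio.sleep(0.01)  # stand-in for the real agent/API call
        return f"task-{task_id} done"

async def main() -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(run_agent(i, sem) for i in range(100)))

results = asyncio.run(main())
print(len(results), "finished")
```

all 100 tasks are queued, but only 8 ever run at once, which keeps failure modes (timeouts, rate limits) legible instead of compounding.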
Are people just not going to open source anything anymore since licenses don't matter? Might as well just keep the code secret, right?
I'm also not sure that the current precedent on the matter is _quite_ as strong as you're thinking. The high-profile case you're most likely thinking of was brought by Stephen Thaler, who was seeking not just to claim copyright on AI-generated content but to list the AI as the sole author. (IIUC, he planned to still own the copyright on the theory that it was a work for hire.)
Analytics can be run on it, they can run it through their own models, synthetic training data can be derived from it, it can be used to build profiles on you/your business, they could harvest trade/literal secrets from it, they could store derivatives of your data to one day sell to competitors/compete themselves, they can use it to gauge just how dependent you've made yourself/business on their LLMs and price accordingly, etc.
> Our use of content. We may use Content to provide, maintain, develop, and improve our Services, comply with applicable law, enforce our terms and policies, and keep our Services safe. If you're using ChatGPT through Apple's integrations, see this Help Center article (opens in a new window) for how we handle your Content.
> Opt out. If you do not want us to use your Content to train our models, you can opt out by following the instructions in this article . Please note that in some cases this may limit the ability of our Services to better address your specific use case.
https://openai.com/policies/row-terms-of-use/ https://openai.com/policies/how-your-data-is-used-to-improve...
> The court held that the Copyright Act requires all eligible works to be authored by a human being. Since Dr. Thaler listed the Creativity Machine, a non-human entity, as the sole author, the application was correctly denied. The court did not address the argument that the Constitution requires human authorship, nor did it consider Dr. Thaler’s claim that he is the author by virtue of creating and using the Creativity Machine, as this argument was waived before the agency.
Or in other words: They ruled you can't register copyright with an AI listed as the author on the application. They made no comment on whether a human can be listed as the author if an AI did the work.