There were three core insights: adding people to a late project makes it later, communication cost grows as n^2, and time isn't fungible.
For agents, the first insight may not hold: adding a new agent won't necessarily increase dev time. But the second will be worse: communication cost will grow faster than n^2 because of LLM drift and orchestration overhead.
The third doesn't translate cleanly, but I'll try: time isn't fungible for us, and assumptions and context, however fragmented, aren't fungible for agents in a team. If they hallucinate at the wrong time, even a little, it could be the equivalent of a human developer doing a side project on company time.
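The n^2 claim is the standard pairwise-channels count: a team of n members has n(n-1)/2 possible communication paths. A quick sketch (the team sizes are arbitrary examples):

```typescript
// Pairwise communication channels in a team of n: n(n-1)/2, i.e. O(n^2) growth.
const channels = (n: number): number => (n * (n - 1)) / 2;

console.log(channels(5));  // 10
console.log(channels(10)); // 45
console.log(channels(50)); // 1225
```

Doubling the team quadruples the coordination surface, which is the whole point of the original insight.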
An agent should write an article on it and post it on moltbook: "The Inevitable Agent Drift"
With hand-written code, things generally move slowly enough, and there's enough common sense sprinkled across the org chart, that problems get uncovered organically. With agent teams, speed increases by several orders of magnitude and common sense goes out the window, so I suspect the ceiling on the productive number of agents will be far lower than for traditional engineering teams, and it will depend heavily on which humans are plugged in where.
One thing I suspect professional researchers underestimate is how much positive output a human team can produce with vague or hand-wavy direction, surprisingly little deep thinking, and nothing like a robust specification or structure to keep them on track. The reality is that any large team regresses to the mean, and it's usually a few savvy people who actually drive outcomes. These people don't necessarily have official authority, just a nose for the right thing. That won't spontaneously emerge from agents (at least until they become a lot more human-like in terms of big-picture common sense, and dial down the sycophancy to a more "skeptical engineer" level).
The coordination and consistency problems the paper describes are what the monorepo is designed around. Giving agents auditable stake in decisions. Happy to share more if anyone’s working in this space.
import { consensus } from "@consensus-tools/wrapper";

const safeSend = consensus(sendEmail, {
  reviewers: [humanReviewer, aiSafetyReviewer],
  strategy: { mode: "unanimous" },
  hooks: { onBlock: (ctx) => audit.log("blocked", ctx) },
});

await safeSend({ to: "user@example.com", body: "Hello" });
The call to sendEmail doesn't execute until every reviewer votes. Strategy modes handle the consensus logic (unanimous, majority, weighted, etc.), and guards can ALLOW, BLOCK, REWRITE, or escalate to REQUIRE_HUMAN before anything fires. The monorepo has 9 built-in policy types and 7 guard types, designed so you can drop governance into an existing agent system without rewriting your orchestration.
Repo's at github.com/consensus-tools if you want to poke around.
Descending into a problem space recursively won't necessarily find the best solution, but it will tend to find some solution faster than going wide across a swarm of agents. Theoretically, one symbolically recursive agent is exponentially faster than any number of parallel agents.
I think agent swarm stuff sucks for complex multi-step problems because it's mostly a form of BFS. It never actually gets to a good solution because it's searching too wide and no one can afford to wait for it to strip mine down to something valuable.
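The contrast can be made concrete with a toy search over a binary tree: depth-first descent commits to one branch and returns the first solution it hits, while breadth-first (the swarm analogue) pays for the whole frontier at every level. The tree, goal test, and counters here are all illustrative, not from the thread:

```typescript
// One recursive agent (DFS) vs. a swarm fanning out level by level (BFS).
type TreeNode = { id: string; children: TreeNode[] };

// Binary tree of the given depth; node ids encode their path ("0", "00", "01", ...).
function makeTree(depth: number, id = "0"): TreeNode {
  if (depth === 0) return { id, children: [] };
  return {
    id,
    children: [makeTree(depth - 1, id + "0"), makeTree(depth - 1, id + "1")],
  };
}

// Depth-first: descend recursively, return the first hit, count nodes expanded.
function dfs(root: TreeNode, isGoal: (n: TreeNode) => boolean) {
  let expanded = 0;
  function go(n: TreeNode): string | null {
    expanded++;
    if (isGoal(n)) return n.id;
    for (const c of n.children) {
      const hit = go(c);
      if (hit) return hit;
    }
    return null;
  }
  return { found: go(root), expanded };
}

// Breadth-first: expand every sibling at each level before going deeper.
function bfs(root: TreeNode, isGoal: (n: TreeNode) => boolean) {
  let expanded = 0;
  const queue: TreeNode[] = [root];
  while (queue.length) {
    const n = queue.shift()!;
    expanded++;
    if (isGoal(n)) return { found: n.id, expanded };
    queue.push(...n.children);
  }
  return { found: null as string | null, expanded };
}

const tree = makeTree(10);
const goal = (n: TreeNode) => n.id === "00000000000"; // a solution 10 levels down the first branch

console.log(dfs(tree, goal).expanded); // 11: straight descent to the goal
console.log(bfs(tree, goal).expanded); // 1024: the entire frontier above it first
```

When a usable solution exists down most branches, the committed descent wins by orders of magnitude; the swarm only pays off when the good branches are rare and you genuinely need coverage.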
You jest, but agents are already useful, fairly formal primitives. Distinct from actors, agents can have things like goals and strategies. There's a whole body of research on multi-agent systems that already exists and is even implemented in some model checkers. It's surprising how little interest that generates among LLM / AI / ML enthusiasts, who don't seem motivated to use the prior art to propose, study, or implement topologies and interaction protocols for the new wave of "agentic".
In the MAS course, we used GOAL, which was a system built on top of Prolog. Agents had Goals, Perceptions, Beliefs, and Actions. The whole thing was deterministic. (Network lag aside ;)
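A deterministic perceive-update-act cycle of that flavor can be sketched in TypeScript. To be clear, this is not GOAL syntax (real GOAL agents are written in a Prolog-based language); the names and the tiny CTF-ish example are purely illustrative:

```typescript
// A minimal BDI-flavored agent loop: Goals, Percepts, Beliefs, Actions.
type Belief = string;
type Percept = Belief;

interface Action {
  name: string;
  precondition: (beliefs: Set<Belief>) => boolean;
  effect: (beliefs: Set<Belief>) => void;
}

class Agent {
  beliefs = new Set<Belief>();
  constructor(public goals: Belief[], public actions: Action[]) {}

  // One deterministic deliberation cycle: fold percepts into beliefs, then
  // fire the first applicable action while some goal is still unachieved.
  step(percepts: Percept[]): string | null {
    for (const p of percepts) this.beliefs.add(p);
    const open = this.goals.filter((g) => !this.beliefs.has(g));
    if (open.length === 0) return null; // all goals achieved
    for (const a of this.actions) {
      if (a.precondition(this.beliefs)) {
        a.effect(this.beliefs);
        return a.name;
      }
    }
    return null; // no applicable action this cycle
  }
}

// Tiny capture-the-flag flavor: move to the flag, then grab it.
const bot = new Agent(
  ["hasFlag"],
  [
    { name: "grabFlag",
      precondition: (b) => b.has("atFlag") && !b.has("hasFlag"),
      effect: (b) => b.add("hasFlag") },
    { name: "moveToFlag",
      precondition: (b) => !b.has("atFlag"),
      effect: (b) => b.add("atFlag") },
  ],
);

console.log(bot.step([])); // "moveToFlag": not at the flag yet
console.log(bot.step([])); // "grabFlag": belief atFlag now holds
console.log(bot.step([])); // null: goal achieved, nothing to do
```

The same beliefs always produce the same action, which is exactly the determinism the original system had (network lag aside).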
The actual project was that we programmed teams of bots for a Capture The Flag tournament in Unreal Tournament 3.
So it was the most fun possible way to learn the coolest possible thing.
The next year they threw out the whole curriculum and replaced it with Machine Learning.
--
The agentic stuff seems to be gradually reinventing a similar setup from first principles, especially as people want to actually use this stuff in serious ways, and we lean more in the direction of determinism.
The main missing feature in LLM land is reliability. (Well, that and cost and speed. Of course, "just have it be code" gives you all three for free ;)
An LLM running one query at a time can already generate a huge amount of text in a few hours, and drain your bank account too.
A "different agent" is just different context supplied in the query to the LLM. There is nothing more than that. Maybe some of them use a different model, but again, this is just a setting in OpenRouter or whatever.
Agent parallelism just doesn't seem necessary and makes everything harder. Not an expert though, tell me where I'm wrong.
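To make that concrete, an "agent" reduces to a system prompt plus model settings attached to the same chat-completion call. The payload below follows the OpenAI-compatible shape that OpenRouter accepts; the model names and prompts are illustrative, not recommendations:

```typescript
// An "agent" as pure data: a model name plus the context prepended to every query.
interface ChatAgent {
  model: string;        // just a setting in OpenRouter or whatever
  systemPrompt: string; // the "personality" is only prepended context
}

// Build the request body any such "agent" would send to the same endpoint.
function buildRequest(agent: ChatAgent, userMessage: string) {
  return {
    model: agent.model,
    messages: [
      { role: "system", content: agent.systemPrompt },
      { role: "user", content: userMessage },
    ],
  };
}

const reviewer: ChatAgent = {
  model: "openai/gpt-4o",
  systemPrompt: "You review diffs for bugs.",
};
const tester: ChatAgent = {
  model: "anthropic/claude-sonnet-4",
  systemPrompt: "You write unit tests.",
};

// Same function, same endpoint; only the supplied context differs.
console.log(buildRequest(reviewer, "Check this patch.").messages[0].content);
```

Two "agents" differ only in the data they carry into the call, which is the whole argument: there is no additional machinery.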
People already do this serially by having a model write a plan, clearing the context, then having the same or a cheaper model action the plan. Doing so discards the intermediate context.
Sub-agents just let you do this in parallel. This works best when you have a task that needs to be done multiple times that cannot be done deterministically. For example, applying the same helper class usage in multiple places across a codebase, finding something out about multiple parts of the codebase, or testing a hypothesis in multiple places across a codebase.
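The fan-out pattern described above can be sketched in a few lines. `runSubAgent` here is a hypothetical stand-in for whatever harness actually invokes the model; the point is that each call gets only the task and one target, with no shared intermediate context:

```typescript
// Fan the same non-deterministic task out across a codebase in parallel,
// each sub-agent starting from a fresh, minimal context.
async function runSubAgent(task: string, file: string): Promise<string> {
  // A real implementation would call an LLM; here we echo deterministically.
  return `${file}: done (${task})`;
}

async function fanOut(task: string, files: string[]): Promise<string[]> {
  // Sub-agents run concurrently rather than serially, and their intermediate
  // context is discarded once each result comes back.
  return Promise.all(files.map((f) => runSubAgent(task, f)));
}

fanOut("apply RetryHelper", ["a.ts", "b.ts", "c.ts"]).then((results) =>
  console.log(results),
);
```

The serial plan-then-execute approach and this parallel version discard intermediate context in exactly the same way; parallelism only changes the wall time.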
Yup, but context includes the prompt, which can strongly steer LLM behavior. Sometimes the harness restricts certain operations to help the LLM stay in its lane. And starting with a fresh context and a clear description of the thing it should work on is great.
People get angry when their 200k or million-token context gets filled. I can't understand why. Keeping that amount of info in operational memory just can't work well, for any mind. Divide and conquer; don't pile up all the crap till it overflows.
I use parallel agents for speed or when my single agent process loses focus due to too much context. I determine context problems by looking at the traces for complaints like "this is too complicated so I'll just do the first part" or "there are too many problems, I'll display the top 5".
If you're trying a "model swarm" to improve reliability beyond 95% or so, you need to start hoisting logic into Python scripts.
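The arithmetic behind that threshold: if a single attempt succeeds with probability p and a deterministic script can verify each result, then k verified retries push reliability to 1 - (1 - p)^k. A sketch (the 95% figure is the comment's own; the formula assumes independent attempts and a check that never passes a bad result):

```typescript
// Reliability of retry-until-verified, given independent attempts with
// success probability p and a deterministic (code-level) check of each one.
function reliability(p: number, attempts: number): number {
  return 1 - Math.pow(1 - p, attempts);
}

console.log(reliability(0.95, 1)); // 0.95, a bare model call
console.log(reliability(0.95, 2)); // ~0.9975, one verified retry
console.log(reliability(0.95, 3)); // ~0.999875
```

The gains only materialize because the verification step is code, not another model call; a probabilistic checker just multiplies error rates instead of crushing them.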
Agents are extremely easy to spawn, extremely easy to dismiss, beautifully scale down to 0, take up no resources when inactive, and there's usually no limit on how many you can have at once.
You should have exactly as many agents running as the situation at hand requires, no more and no less.
LLMs mostly do useful work by writing stories about AI assistants who issue various commands and reply to a user's prompts. These do work, but they are fundamentally like a screenplay that the LLM is continuing.
An "agent" is a great abstraction since the LLM is used to continuing stories about characters going through narrative arcs. The type of work that would be assigned to a particular agent can also keep its context clean and distraction-free.
So parallelism could be useful even if everything runs completely sequentially: it lets you study how these separate characters and narrative arcs intersect, much like real characters acting independently and simultaneously, which is exactly what LLMs are good at writing about.
Seems like the important thing would be to avoid getting caught up on actual "wall time" parallelism.