Orchestrate teams of Claude Code sessions

(code.claude.com)

396 points | davidbarker | 3mo ago | 224 comments

mcintyre1994 3mo ago

I’ve been mostly holding off on learning any of the tools that do this because it seemed so obvious that it’ll be built natively. Will definitely give this a go at some point!

pronik 3mo ago

To the folks comparing this to GasTown: keep in mind that Steve Yegge explicitly pitched agent orchestrators to, among others, Anthropic months ago:

> I went to senior folks at companies like Temporal and Anthropic, telling them they should build an agent orchestrator, that Claude Code is just a building block, and it’s going to be all about AI workflows and “Kubernetes for agents”. I went up onstage at multiple events and described my vision for the orchestrator. I went everywhere, to everyone. (from "Welcome to Gas Town" https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...)

That Anthropic releases Agent Teams now (as rumored a couple of weeks back), after they've already adopted a tiny bit of beads in the form of Tasks, means that either they were already building them back when Steve pitched orchestrators, or they've decided that he was right and it's time to scale the agents. Or they've arrived at the same conclusions independently -- it won't matter in the larger scheme of things. I think Steve greatly appreciates it existing; if anything, this is a validation of his vision. We'll probably be herding polecats officially in a couple of months.

mohsen1 3mo ago

It's not like he was the only one who came up with this idea. I built something like that without knowing about GasTown or Beads. It's just an obvious next step.

https://github.com/mohsen1/claude-code-orchestrator

gbnwl 3mo ago

I also share your confusion about him somehow managing to dominate credit in this space, when it doesn't even seem like Gastown ended up being very effective as a tool relative to its insane token usage. Everyone who's used an agentic tool for longer than a day will have had the natural desire for them to communicate and coordinate across context windows effectively. I'm guessing he just wrote the punchiest article about it and left an impression on people who had hitherto been ignoring the space entirely.

MattPalmer1086 3mo ago

It was a fun article!

behnamoh 3mo ago

Exactly! I built something similar. These are such low hanging fruit ideas that no one company/person should be credited for coming up with them.

yks 3mo ago

Seriously, I thought that was what langchain was for back in 2023.

isoprophlex 3mo ago

There seems to be a lot of convergent evolution happening in the space. Days before the gas town hype hit, I made a (less baroque, less manic) "agent team" setup: a shell script to kick off a ralph wiggum loop, and CLAUDE-MESSAGE-BUS.md for inter-ralph communication (Thread safety was hacked into this with a .claude.lock file).

The main claude instance is instructed to launch as many ralph loops as it wants, in screen sessions. It is told to sleep for a certain amount of time to periodically keep track of their progress.
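A minimal sketch of how lock-file "thread safety" for such a message bus could look, assuming each loop appends one line at a time. The file names come from the comment above; the function name `post_message`, the message format, and the timeout are hypothetical and not the commenter's actual script.

```python
import os
import time


def post_message(bus_path, lock_path, sender, text, timeout=5.0):
    """Append one message to the shared bus file while holding a lock file."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process at a time can create the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(0.05)  # another writer holds the lock; retry shortly
    try:
        with open(bus_path, "a", encoding="utf-8") as bus:
            bus.write(f"- [{sender}] {text}\n")
    finally:
        os.close(fd)
        os.remove(lock_path)  # release the lock for the next writer
```

A call such as `post_message("CLAUDE-MESSAGE-BUS.md", ".claude.lock", "ralph-1", "tests green")` would append one bullet to the bus. Note that a crashed writer leaves a stale lock behind, which this sketch does not try to clean up.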

It worked reasonably well, but I don't prefer this way of working... yet. Right now I can't write spec (or meta-spec) files quick enough to saturate the agent loops, and I can't QA their output well enough... mostly a me thing, i guess?

CuriouslyC 3mo ago

Not a you thing. Fancy orchestration is mostly a waste, validation is the bottleneck. You can do E2E tests and all sorts of analytic guardrails but you need to make sure the functionality matches intent rather than just being "functional" which is still a slow analog process.

pronik 3mo ago

> Right now I can't write spec (or meta-spec) files quick enough to saturate the agent loops, and I can't QA their output well enough... mostly a me thing, i guess?

Same for me; however, the velocity of the whole field is astonishing, and things change as we get used to them. We are not talking that much about hallucinating anymore. Just 4-5 months ago you couldn't trust coding agents to extract functionality into a separate file without typos; now splitting Git commits works almost without a hitch. The more we get used to agents getting certain things right 100% of the time, the more we'll trust them. There are many, many things that I know I won't get right, but I'm absolutely sure my agent will. As soon as we start trusting e.g. a QA agent to do its job, our "project management" velocity will increase too.

Interestingly enough, the infamous "bowling score card" text on how XP works has demonstrated inherently agentic behaviour in more ways than one (they just didn't know what "extreme" was back then). You were supposed to implement a failing test and then implement just enough functionality for this test to not fail anymore, even if the intended functionality was broader -- which is exactly what agents reliably do in a loop. Also, you were supposed to be pair-driving a single machine, which has been incomprehensible to me for decades -- after all, every person has their own shortcuts, hardware, IDEs, window managers and whatnot. Turns out, all you need is a centralized server running a "team manager agent" and multiple developers talking to it to craft software fast (see the tmux requirement in Gas Town).

bonesss 3mo ago

Compare both approaches to mature actor frameworks and they don’t seem to be breaking much ice. These kinds of supervisor trees and hierarchies aren’t new for actor based systems and they’re obvious applications of LLM agents working in concert.

The fact that Anthropic and OpenAI have been going on this long without such orchestration, considering the unavoidable issues of context windows and unreliable self-validation, without matching the basic system maturity you get from a default Akka installation shows us that these leading LLM providers (with more money, tokens, deals, access, and better employees than any of us), are learning in real time. Big chunks of the next gen hype machine wunder-agents are fully realizable with cron and basic actor based scripting. Deterministically, write once run forever, no subscription needed.

Kubernetes for agents is, speaking as a krappy kubernetes admin, not some leap, it’s how I’ve been wiring my local doom-coding agents together. I have a hypothesis that people at Google (who are pretty ok with kubernetes and maybe some LLM stuff), have been there for a minute too.

Good to see them building this out, excited to see whether LLM cluster failures multiply (like repeating bad photocopies), or nullify (“sorry Dave, but we’re not going to help build another Facebook, we’re not supposed to harm humanity and also PHP, so… no.”).

ttoinou 3mo ago

If it was so obvious and easy, why didn't we have this a year ago? Models were mature enough back then to make this work.

bcrosby95 3mo ago

The high level idea is obvious but doing it is not easy. "Maybe agents should work in teams like humans with different roles and responsibilities and be optimized for those" isn't exactly mind bending. I experimented with it too when LLM coding became a thing.

As usual, the hard part is the actual doing and producing a usable product.

CuriouslyC 3mo ago

Orchestration definitely wasn't possible a year ago. The only tool that even produced decent results that far back was Aider, which wasn't fully agentic and didn't really shine until Gemini 2.5 03-25.

The truth is that people are doing experiments on most of this stuff, and a lot of them are even writing about it, but most of the time you don't see that writing (or the projects that get made) unless someone with an audience already (like Steve Yegge) makes it.

lossolo 3mo ago

Because gathering training data and doing post-training takes time. I agree with OP that this is the obvious next step given context length limitations. Humans work the same way in organizations, you have different people specializing in different things because everyone has a limited "context length".

troupo 3mo ago

Because they are not good engineers [1]

Also, because they are stuck in a language and an ecosystem that cannot reliably build supervisors, hierarchies of processes etc. You need Erlang/Elixir for that. Or similar implementations like Akka that they mention.

[1] Yes, they claim their AI-written slop in Claude Code is "a tiny game engine" that takes 16ms to output a couple of hundred of characters on screen: https://x.com/trq212/status/2014051501786931427

ruined 3mo ago

what mature actor frameworks do you recommend?

jghn 3mo ago

They did mention Akka in their post, so I would assume that's one of them.

troupo 3mo ago

Elixir/Erlang. It's table stakes for them.

tyre 3mo ago

Sorry, are you saying that engineers at Anthropic who work on coding models every day hadn’t thought of multiple of them working together until someone else suggested it?

I remember having conversations about this when the first ChatGPT launched and I don’t work at an AI company.

astrange 3mo ago

Claude Code has already had subagent support. Mostly because you have to do very aggressive context window management with Claude or it gets distracted.

yieldcrv 3mo ago

Why is Yegge so.... loud?

Like, who cares? Judging from his blog recount of this it doesn't seem like anybody actually does. He's an unnecessarily loud and enthused engineer inserting himself into AI conversations instead of just playing office politics to join the AI automation effort inside of a big corporation?

"wow he was yelling about agent orchestration in March 2025", I was about 5 months behind him, the company I was working for had its now seemingly obligatory "oh fuck, hackathon" back in August 2025

and we all came to the same conclusions. conferences had everyone having the same conclusion, I went to the local AWS Invent, all the panels from AWS employees and Developer Relations guys were about that

it stands to reason that any company working on foundational models and an agentic coding framework would also have talent thinking about that sooner than the rest of us

so why does Yegge want all of this attention and think it's important at all? It seems like it would have been a waste of energy to bother with; everyone should have been able to know that in advance. "Anthropic! what are you doing! listen to meeeehhhh let me innnn!"

doesn't make sense, and gastown's branding is further unhinged goofiness

yeah I can't really play the attribution games on this one, can't really get behind who cares. I'm glad it's available in a more benign format now

segmondy 3mo ago

This is nothing new; folks have been doing this since 2023. There are lots of papers on arXiv and lots of code on GitHub implementing multi-agent systems.

The "limit" was that agents were not as smart then, context windows were much smaller, and RLVR wasn't a thing, so agents were trained just for function calling, not for agent calling/coordination.

We have been doing it since then; the real difference is that the models have gotten smart enough to handle it.

aaaalone 3mo ago

Honestly, this is one of plenty of ideas I also had.

But this shows how much there still is to do in the AI space.

GoatOfAplomb 3mo ago

I wonder if my $20/mo subscription will last 10 minutes.

mohsen1 3mo ago

At this point, if you're paying out of pocket you should use Kimi or GLM for it to make sense

andai 3mo ago

GLM is OK (haven't used it heavily but seems alright so far), a bit slow with ZAI's coding plan, amazingly fast on Cerebras but their coding plan is sold out.

Haven't tried Kimi, hear good things.

bluerooibos 3mo ago

These are super slow to run locally, though, unless you've got some great hardware - right?

At least, my M1 Pro seems to struggle and take forever using them via Ollama.

corysama 3mo ago

Try this https://unsloth.ai/docs/models/qwen3-coder-next

tclancy 3mo ago

Ah ok, same. I keep wondering about how this would ever accomplish anything.

simlevesque 3mo ago

I've had good results with Haiku for certain tasks.

bluerooibos 3mo ago

This is great and all but, who can actually afford to let these agents run on tasks all day long? Is anyone here actually using this or are these rollouts aimed at large companies?

I'm burning through so many tokens on Cursor that I've had to upgrade to Ultra recently - and I'm convinced they're tweaking the burn rate behind the scenes - usage allowance doesn't seem proportional.

Thank god the open source/local LLM world isn't far behind.

MarkMarine 3mo ago

Get a Claude Max 20x plan and you’ll be fine. I’d been doing my normal process of running 4 Claude sessions in parallel because that was about the right number of concurrent sessions for me to watch what’s going on and approve/deny plans and code… and this blows it out of the water. With an agent swarm it’s so fast at executing and testing that I’m limited by my idea and review capabilities now. I tried running 2 and I can’t keep up; I’m defining specs and the other window is done, tested, validated and waiting for me.

rahimnathwani 3mo ago

Many many companies can afford to hire a junior engineer for $150k/year (plus employer payroll taxes, employee benefits etc.).

Are you spending more than $150k per year on AI?

(Also, you're talking about the cost of your Cursor subscription, when the article is about Claude Code. Maybe try Claude Max instead?)

freeone3000 3mo ago

If it could do anything that a junior dev could, that’d be a valid point of comparison. But it continually, wildly performs slower and falls short every time I’ve tried.

rahimnathwani 3mo ago

> But it continually, wildly performs slower and falls short every time I’ve tried.

If it falls short every time you've tried, it's likely that one or more of these is true:

A. You're working on some really deep thing that only world-class experts can do, like optimizing graphics engines for AAA games.

B. You're using a language that isn't in the top ~10 most popular in AI models' training sets.

C. You have an opportunity to improve your ability to use the tools effectively.

How many hours have you spent using Claude Code?

buzzerbetrayed 3mo ago

I am way more productive with $200/month of AI than I would be with $5,000/month of junior developer. And it isn’t close.

andkenneth 3mo ago

Companies are not comparing it straight to juniors. They're comparing a Senior with the assistance of one or more juniors against a Senior with the assistance of AI Agents.

I feel like comparison just to a junior developer is also becoming a fairly outdated comparison. Yes, it is worse in some ways, but also VASTLY superior in others.

logicx24 3mo ago

I can't even get through my Claude Max quota, and that's only $200/mo. And I code every day and use it for various other pretty intensive tasks.

dangus 3mo ago

Only $200/mo… $200 a month is a used car payment.

I guarantee you that price will double by 2027. Then it’ll be a new car payment!

I’m really not saying this to be snarky, I’m saying this to point out that we’re really already in the enshittification phase before the rapid growth phase has even ended. You’re paying $200 and acting like that’s a cheap SaaS product for an individual.

I pay less for AutoCAD products!

This whole product release is about maximizing your bill, not maximizing your productivity.

I don’t need agents to talk to each other. I need one agent to do the job right.

__turbobrew__ 3mo ago

$200/month is peanuts when you are a business paying your employees $200k/year. I think LLMs make me at least 10% more effective and therefore the cost to my employer is very worth it. Lots of trades have much more expensive tools (including cars).

kesslern 3mo ago

Not saying $200/mo isn't a lot, but I think you're underestimating used car payments these days. The average US used car payment is above $500 now.

yomismoaqui 3mo ago

As a company owner, the math is simple:

If I pay $3k/month to a developer and a $200/month tool makes them 10% more productive, I will pay it without thinking.
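The arithmetic behind that claim can be sketched as follows, under the simplifying assumption (mine, not the commenter's) that an X% productivity gain is worth X% of salary. The function names are hypothetical.

```python
def monthly_net_benefit(salary_usd, tool_cost_usd, gain_percent):
    """Value of the productivity gain minus the tool's price, per month.

    Assumes, simplistically, that a gain of X% is worth X% of salary.
    """
    return salary_usd * gain_percent / 100 - tool_cost_usd


def break_even_gain_percent(salary_usd, tool_cost_usd):
    """Smallest productivity gain (in percent) at which the tool pays for itself."""
    return 100 * tool_cost_usd / salary_usd
```

With the figures from the comment ($3k salary, $200 tool, 10% gain), the net benefit is $100 per month, and the tool breaks even at a gain of roughly 6.7%.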

nlh 3mo ago

I pay $200/month, don’t come near the limits (yet), and if they raised the price to $1000/month for the exact same product I’d gladly pay it this afternoon (Don’t quote me on this Anthropic!)

If you’re not able to get US$thousands out of these models right now either your expectations are too high or your usage is too low, but as a small business owner and part/most-time SWE, the pricing is a rounding error on value delivered.

bryanlarsen 3mo ago

That's one of 3 possible futures.

1. 1-3 LLM vendors are substantially higher quality than other vendors and none of those are open source. This is an oligopoly and the scenario you described will play out.

2. >3 LLM vendors are all high quality and suitable for the tasks. At least one of these is open source. This is the "commodity" scenario, and we'll end up paying roughly the cost of inference. This still might be hundreds per month, though.

3. Somewhere in between. We've got >3 vendors, but 1-3 of them are somewhat better than the others, so the leaders can charge more. But not as much more as they can in scenario #1.

Wowfunhappy 3mo ago

> I’m saying this to point out that we’re really already in the enshittification phase before the rapid growth phase has even ended. You’re paying $200 and acting like that’s a cheap SaaS product for an individual.

Traditional SaaS products don't write code for me. They also cost much less to run.

I'm having a lot of trouble seeing this as enshittification. I'm not saying it won't happen some day, but I don't think we're there. $200 per month is a lot, but it depends on what you're getting. In this case, I'm getting a service that writes code for me on demand.

buzzerbetrayed 3mo ago

If you can’t get $200 of value out of Claude Code Max, then you need to really step up your game. That’s user error.

meowface 3mo ago

I could write an essay about how almost everything you wrote either is extremely incorrect or is extremely likely to be incorrect. I am too lazy to, though, so I will just have to wait for another commenter to do the equivalent.

emp17344 3mo ago

Especially for what’s basically an experiment. Gas town didn’t really work, so there’s no guarantee this will even produce anything of value.

reactordev 3mo ago

You know those VC funded startups with just two founders… them.

jwpapi 3mo ago

I mean, what you get for Claude Code Max is insane; it's 30x on the token price. If you don’t spend it all, it’s your own fault. That must be below electricity cost.

bhasi 3mo ago

Seems similar to Gas Town

rafram 3mo ago

I'm not anti-whimsy, but if your project goes too hard on the whimsy (and weird AI-generated animal art), it's kind of inevitable that someone else is going to create a whimsy-free clone, and their version will win because it's significantly less embarrassing to explain to normal people.

reissbaker 3mo ago

Where are the polecats, though? What about the mayor's dog?

koakuma-chan 3mo ago

I don't know what Gas Town is, but Claude Code Agent Teams is what I was doing for a while now. You use your main conversation only to spawn sub agents to plan and execute, allowing you to work for a long time without losing context or compacting, because all token-heavy work is done by sub agents in their own context. Claude Code Agent Teams just streamlines this workflow as far as I can tell.

nprz 3mo ago

Gas Town --> https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...

nickorlow 3mo ago

yeah, seems like a much simpler design though (i.e. there seems to be only one 'special/leader' agent, and the rest are all workers, vs gastown having something like 8 different roles: mayor, polecat, witnesses, etc.).

Wonder how they compare?

greenfish6 3mo ago

i would have to imagine the gastown design isn't optimal though? why 8, and why do there need to be multiple hops of agent communication before two arbitrary agents can communicate with each other, as opposed to a single shared filespace?

Ethee 3mo ago

I've been using Gas Town a decent bit since it was released. I'd agree with you that its design is sub-optimal, but I believe that's more due to the way the actual agents/harnesses have been designed as opposed to optimal software design. The problem you often run into is that agents will sometimes hang thinking they need human input for a problem they are on, or they think they're at a natural stopping point. If you're trying to do fully orchestrated agentic coding where you don't look at the code at all (putting aside whether that's good or not for a second) then this is sub-optimal behavior, and so these extra roles have been designed to 'keep the machine going' as it were.

Often times if I'm only working on a single project or focus, then I'm not using most of those roles at all and it's as you describe, one agent divvying out tasks to other agents and compiling reports about them. But due to the fact that my velocity with this type of coding is now based on how fast I can tell that agent what I want, I'm often working on 3 or 4 projects simultaneously, and Gas Town provides the perfect orchestration framework for doing this.

nickorlow 3mo ago

yegge's article does come off as complicated design for the sake of complication

temuze 3mo ago

Yeah but worse

No polecats smh

ramesh31 3mo ago

> Seems similar to Gas Town

I love that we are in this world where the crazy mad scientists are out there showing the way that the rest of us will end up at, but ahead of time and a bit rough around the edges, because all of this is so new and unprecedented. Watching these wholly new abstractions be discovered and converged upon in real time is the most exciting thing I've seen in my career.

bredren 3mo ago

The action is hot, no doubt. This reminds me of Spacewar! -> Galaxy Game / Computer Space.

ottah 3mo ago

I absolutely cannot trust Claude Code to independently work on large tasks. Maybe other people work on software that's not significantly complex, but for me to maintain code quality I need to guide more of the design process. Teams of agents just sounds like adding a lot more review and refactoring that can be avoided by going slower and thinking carefully about the problem.

nickstinemates 3mo ago

You write a generic architecture document on how you want your code base to be organized, when to use pattern x vs pattern y, examples of what that looks like in your code base, and you encode this as a skill.

Then, in your prompt you tell it the task you want, then you say, supervise the implementation with a sub agent that follows the architecture skill. Evaluate any proposed changes.

There are people who maximize this, and this is how you get things like teams. You make agents for planning, design, qa, product, engineering, review, release management, etc. and you get them to operate and coordinate to produce an outcome.

That's what this is supposed to be, encoded as a feature instead of a best practice.

satellite2 3mo ago

Aren't you just moving the problem a little bit further? If you can't trust it will implement carefully specified features, why would you believe it would properly review those?

frde_me 3mo ago

It's hard to explain, but I've found LLMs to be significantly better in the "review" stage than the implementation stage.

So the LLM will do something and not catch at all that it did it badly. But the same LLM asked to review against the same starting requirement will almost always catch the problem.

The missing thing in these tools is that automatic feedback loop between the two LLMs: one in review mode, one in implementation mode.

tclancy 3mo ago

How does this not use up tokens incredibly fast though? I have a Pro subscription and bang up against the limits pretty regularly.

doctoboggan 3mo ago

It _does_ use up tokens incredibly fast, which is probably why Anthropic is developing this feature. This is mostly for corporations using the API, not individuals on a plan.

indemnity 3mo ago

I had to go to Max, Pro is more like a taster.

At work tho we use Claude Code thru a proxy that uses the model hosted on AWS Bedrock. It’s slower than consumer direct-to-Anthropic and you have to wait a bit for the latest models (Opus 4.5 took a while to get), but if our stats are to be believed it’s much much cheaper.

nickstinemates 3mo ago

I don't know, all I can say is with API-based billing, doing multi-thousand-line refactors that would take days to do costs like $4. In terms of value : effort, it's incredible.

andyferris 3mo ago

It does use tokens faster, yes.

aqme28 3mo ago

I agree, but I've found that making an "adversarial" model within claude helps with the quality a lot. One agent makes the change, the other picks holes in it, and cycle. In the end, I'm left with less to review.

This sounds more like an automation of that idea than just N-times the work.
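The change/critique cycle described above can be sketched as a plain loop. This is only an illustration: `implement` and `critique` are hypothetical callables standing in for whatever actually invokes the two agents, and the round limit is an assumption.

```python
def adversarial_loop(task, implement, critique, max_rounds=3):
    """Alternate an implementer and a critic until the critic has no objections.

    `implement(task, feedback)` returns a draft; `critique(task, draft)` returns
    a list of objections (empty when satisfied). Both are hypothetical stand-ins.
    Returns the last draft and the number of critique rounds that were run.
    """
    draft = implement(task, [])
    for round_number in range(1, max_rounds + 1):
        feedback = critique(task, draft)
        if not feedback:
            return draft, round_number
        draft = implement(task, feedback)
    # The round limit was hit: the final draft has not been re-reviewed.
    return draft, max_rounds
```

The human then reviews only the draft that survived the critic, or the last draft if the round limit was reached.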

Keyframe 3mo ago

Glad I'm not the only one. I do the same, but I tend to have gemini be the one that critiques.

diego898 3mo ago

Do you do this manually? Or some abstraction above that? skills, some light orchestration, etc?

aqme28 3mo ago

I just tell it to do so, but you could even add that as a requirement to CLAUDE.md.

stpedgwdgfhgdd 3mo ago

Exactly, one out of three or four prompts requires tuning, nudging or just stopping it. However, it takes seniority to see where it goes astray. I suspect that lots of folks don't even notice that CC is off. It works, it passes the tests, so it is good.

turtlebits 3mo ago

Humans can't handle large tasks either, which is why you break them into manageable chunks.

Just ask claude to write a plan and review/edit it yourself. Add success criteria/tests for better results.

BonoboIO 3mo ago

You definitely have to create some sort of PLAN.md and PROGRESS.md via a command, plus an implement command that delegates work. That is the only way that I can get bigger things done, no matter how “good” their task feature is.

You run out of context so quickly, and if you don’t have some kind of persistent guidance, things go south.

ottah 3mo ago

It's not sufficient, especially if I am not learning about the problem by being part of the implementation process. The models are still very weak reasoners; writing code faster doesn't accelerate my understanding of the code the model wrote. Even with clear specs I am constantly fighting with it duplicating methods, writing ineffective tests, or implementing unnecessarily complex solutions. AI just isn't a better engineer than me, and that makes it a weak development partner.

vonneumannstan 3mo ago

> AI just isn't a better engineer than me, and that makes it a weak development partner.

This would also be true of Junior Engineers. Do you find them impossible to work with as well?

koakuma-chan 3mo ago

I tried doing that and it didn't work. It still adds "fallbacks" that just hide errors or the fact that there is no actual implementation, along with comments like "In a real app, we would do X, just return null for now".

nprz 3mo ago

There is research[0] currently being done on how to divide tasks among LLMs and combine their answers. This approach allows LLMs to reach outcomes (solving a problem that requires 1 million steps) which would be impossible otherwise.

[0]https://arxiv.org/abs/2511.09030

woah 3mo ago

All they did was prompt an LLM over and over again to execute one iteration of a Tower of Hanoi algorithm. Literally just using it as a glorified scripting language:

```

Rules:

- Only one disk can be moved at a time.

- Only the top disk from any stack can be moved.

- A larger disk may not be placed on top of a smaller disk.

For all moves, follow the standard Tower of Hanoi procedure: If the previous move did not move disk 1, move disk 1 clockwise one peg (0 -> 1 -> 2 -> 0).

If the previous move did move disk 1, make the only legal move that does not involve moving disk1.

Use these clear steps to find the next move given the previous move and current state.

Previous move: {previous_move} Current State: {current_state} Based on the previous move and current state, find the single next move that follows the procedure and the resulting next state.

```

This is buried down in the appendix while the main paper is full of agentic swarms this and millions of agents that and plenty of fancy math symbols and graphs. Maybe there is more to it, but the fact that they decided to publish with such a trivial task, which could be much more easily accomplished by having an LLM write a simple Python script, is concerning.
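For illustration, the procedure in the quoted prompt fits in a short script. This is a sketch of that rule only (move disk 1 clockwise on every other move, otherwise make the one legal move that leaves disk 1 alone), not code from the paper; the function name `solve_hanoi` and the state representation are my own.

```python
def solve_hanoi(n):
    """Solve Tower of Hanoi with the alternating rule from the quoted prompt.

    Pegs are numbered 0, 1, 2; each peg is a list whose last element is the
    top disk. Returns (moves, final_pegs), where each move is (disk, src, dst).
    """
    pegs = [list(range(n, 0, -1)), [], []]
    moves = []
    moved_disk1_last = False
    # Stop once every disk sits on a single peg other than the starting one.
    while len(pegs[1]) != n and len(pegs[2]) != n:
        if not moved_disk1_last:
            # Move disk 1 clockwise one peg (0 -> 1 -> 2 -> 0).
            src = next(i for i, p in enumerate(pegs) if p and p[-1] == 1)
            dst = (src + 1) % 3
        else:
            # The only legal move between the two pegs not topped by disk 1:
            # the smaller top disk goes onto the larger one (or an empty peg).
            a, b = [i for i, p in enumerate(pegs) if not p or p[-1] != 1]
            if not pegs[a]:
                src, dst = b, a
            elif not pegs[b] or pegs[a][-1] < pegs[b][-1]:
                src, dst = a, b
            else:
                src, dst = b, a
        disk = pegs[src].pop()
        pegs[dst].append(disk)
        moves.append((disk, src, dst))
        moved_disk1_last = disk == 1
    return moves, pegs
```

Following this rule, the tower ends on peg 1 when `n` is odd and on peg 2 when `n` is even, after `2**n - 1` moves.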

Spoom 3mo ago

Good lord, I can only imagine the wasted electricity.

ottah 3mo ago

No offense to the academic profession, but they're not a good source of advice for best practices in commercial software development. They don't have the experience or the knowledge sufficient to understand my workplace and tasks. Their skill set and job is orthogonal to the corporate world.

nprz 3mo ago

Yes, the problem solved in the paper (Tower of Hanoi) is far more easily defined than 99% of actual problems you would find in commercial software development. Still proof of "theoretically possible" and seems like an interesting area of research.

findjashua 3mo ago

you need a reviewer agent for every step of the process - review the plan generated by the planner, the update made by the task worker subagent, and a final reviewer once all tasks are done.

this does eat up tokens _very_ quickly though :(

Sol- 3mo ago

With stuff like this, might be that all the infra build-out is insufficient. Inference demand will go up like crazy.

RGamma 3mo ago

Unlocking the next order of magnitude of software inefficiency!

Though I do hope the generated code will end up being better than what we have right now. It mustn't get much worse. Can't afford all that RAM.

Sol- 3mo ago

Dunno, it's probably less energy efficient than a human brain, but being able to turn electricity into intelligence is pretty amazing. RAM and power generation are engineering problems to be solved for civilization to benefit from this.

kylehotchkiss 3mo ago

It'd be nice if CC could figure out all the required permissions upfront and then let you queue the job to run overnight

LtWorf 2mo ago

Except it cannot really do anything unattended.

intellegix 2mo ago

It actually can with the right wrapper. I built an open source loop driver that runs Claude Code CLI autonomously with --dangerously-skip-permissions. It handles session continuity (--resume), budget enforcement, stagnation detection (two-strike system if turns stay low), and auto model fallback (Opus -> Sonnet on consecutive timeouts).

The key is streaming NDJSON output to track cost per iteration and detect completion markers. The human stays in control by editing CLAUDE.md between runs to steer the project.

https://github.com/intellegix/intellegix-code-agent-toolkit
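A rough sketch of the budget and two-strike stagnation checks described above, working on NDJSON text that has already been captured per iteration. The event schema here (a `type` field with `"turn"` and `"done"` values, and a `cost_usd` field) is entirely hypothetical; it is not Claude Code's actual stream format, and this is not the linked toolkit's implementation.

```python
import json


def summarize_iteration(ndjson_text):
    """Return (cost, turns, done) for one iteration's NDJSON output.

    Hypothetical schema: each line is a JSON object with an optional
    "cost_usd" number and a "type" of "turn" or "done".
    """
    cost, turns, done = 0.0, 0, False
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the stream
        event = json.loads(line)
        cost += float(event.get("cost_usd", 0.0))
        if event.get("type") == "turn":
            turns += 1
        elif event.get("type") == "done":
            done = True
    return cost, turns, done


def drive(iterations, budget_usd, min_turns=2, max_strikes=2):
    """Walk through iterations; stop on completion, budget, or stagnation."""
    spent, strikes = 0.0, 0
    for index, text in enumerate(iterations, start=1):
        cost, turns, done = summarize_iteration(text)
        spent += cost
        if done:
            return "completed", index, spent
        if spent >= budget_usd:
            return "budget_exhausted", index, spent
        # A low-turn iteration is a strike; a productive one resets the count.
        strikes = strikes + 1 if turns < min_turns else 0
        if strikes >= max_strikes:
            return "stagnated", index, spent
    return "out_of_iterations", len(iterations), spent
```

In a real driver the `iterations` would come from streaming a subprocess's stdout rather than from a prepared list; the list keeps the sketch self-contained.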

Der_Einzige 3mo ago

Anyone paying attention has known that demand for all types of compute that can run LLMs (i.e. GPUs, TPUs, hell even CPUs) was about to blow up, and will remain extremely large for years to come.

It's just HN that's full of "I hate AI" or wrong contrarian types who refuse to acknowledge this. They will fail to reap what they didn't sow and will starve in this brave new world.

sciencejerk 3mo ago

Agreed, agent scaling and orchestration indicate that demand for compute is going to blow up, if it hasn't already. The rationale for building all those datacenters they can't build fast enough is finally making sense.

mrkeen 3mo ago

Oh yeah I mean if you're a webdev and you haven't built several data centres already you're basically asking to be homeless.

emp17344 3mo ago

This reads like a weird cult-ish revenge fantasy.

RGamma 3mo ago

And what about you? Show your "I used AI today" badge, right now!

nkmnz 3mo ago

I’m looking for something like this, with Opus in the driver's seat, but the subagents should be using different LLMs, such as Gemini or Codex. Anyone know of such a tool? just-every/code almost does this, but the lead/orchestrator is always Codex, which feels too slow compared to Opus or Gemini.

nikcub3mo ago

I use opus for coding and codex for reviews. I trigger the reviews in each work task with a review skill that calls out to codex[0]

I don't need anything more complicated than that and it works fine - I also run greptile[1] on PRs

[0] https://github.com/nc9/skills/tree/main/review

[1] https://www.greptile.com/

eaf7e2813mo ago

These two basically do what you want, let Claude be the manager and Codex/Gemini be the worker. Many say that Coder-Codex-Gemini is easier to understand than CCG-Workflow, which has too many commands to start with.

https://github.com/FredericMN/Coder-Codex-Gemini https://github.com/fengshao1227/ccg-workflow

This one also seems promising, but I haven't tried it yet.

https://github.com/bfly123/claude_code_bridge

All of them are made by Chinese devs. I know some people are hesitant when they see Chinese products, so I'll address that first. But I have tried all of them, and they have all been great.

khaliqgant3mo ago

You can accomplish this with https://github.com/AgentWorkforce/relay and make the Lead/Orchestrator any harness you want. At the core agent-relay is agent to agent communication but it unlocks quite a few multi agent orchestration paradigms. I wrote about some learnings here as well https://x.com/khaliqgant/status/2019124627860050109?s=46

fosterfriends3mo ago

I think this is where future cursor features will be great - to coordinate across many different model providers depending on the sub-jobs to be done

nkmnz3mo ago

What I want is something else: I want them to work in parallel on the same problem, and the orchestrator to then evaluate and consolidate their responses. I’m currently doing this manually, but it’s tedious.
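The fan-out and consolidate pattern described here could be sketched roughly as follows in Python. The workers are hypothetical stand-ins (plain callables) for calls to different models, and picking the single best-scoring answer is just one possible consolidation strategy; a real orchestrator might merge answers instead.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(problem, workers):
    """Run every worker on the same problem in parallel; return {name: answer}."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = {name: pool.submit(fn, problem) for name, fn in workers.items()}
        return {name: fut.result() for name, fut in futures.items()}


def consolidate(answers, score):
    """Return (worker_name, answer) for the answer with the highest score."""
    best = max(answers, key=lambda name: score(answers[name]))
    return best, answers[best]


if __name__ == "__main__":
    # Stub workers; in practice each would wrap a different model or CLI.
    stubs = {"model_a": lambda p: p.upper(), "model_b": lambda p: p + "!!"}
    print(consolidate(fan_out("fix bug", stubs), score=len))
```

Threads are enough here because the workers would spend their time waiting on network or subprocess I/O rather than on CPU.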

knes3mo ago

At Augment, we've been working on this: multi-agent orchestration, spec-driven, different models for different tasks, etc.

https://www.augmentcode.com/product/intent

You can use the code AUGGIE to skip the queue. Bring your own agent (powered by codex, CC, etc.) is coming to it next week.

sathish3163mo ago

You can run an ensemble of LLMs (Opus, Gemini, Codex) in Claude Code Router via OpenRouter, or in any agent CLI that supports subagents and isn't tied to a single LLM, like Opencode. I have an example of this in Pied-Piper, a subagent orchestrator that runs in Claude Code or ClaudeCodeRouter and uses a distinct model and role for each subagent:

1. GPT-5.2 Codex Max for planning

2. Opus 4.5 for implementation

3. Gemini for reviews

It’s easy to swap models or change responsibilities. Doc and steps here: https://github.com/sathish316/pied-piper/blob/main/docs/play...

d4rkp4ttern3mo ago

This sounds very promising. Using multiple CC instances (or a mix of CLI agents) across tmux panes has always been a workflow of mine, where agents can use the tmux-cli [1] skill/tool to delegate to or collaborate with others, or review/debug/validate each other's work.

This new orchestration feature makes it much more useful since they share a common task list and the main agent coordinates across them.

[1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...

vardalab3mo ago

Yeah, I've been using your tools for a while. They've been nice.

giancarlostoro3mo ago

I was working on my own alternative to Beads... then I realized I could do exactly this with something similar to Beads. I'm planning on open sourcing it soon because I like what I have so far. I also made it so I can sync my tasks directly to my GitHub projects as well. I think it's more useful to have agent tasks eventually synced back up to real ticketing systems for historical reasons. Besides, it's better to have alternatives that are agent agnostic.

asdev3mo ago

I personally have no use for this type of workflow. I like parallel claude code instances in worktrees but nothing beyond that

hpdigidrifter3mo ago

I'm not a fan of dealing with worktrees. Maybe for larger, longer-lived tasks, but the time spent on merges from different agents is definitely a big headwind for parallel work.

This seems handled by this new agent which is cool.

I gave up on worktrees and hacked together a solution with fine-grained lockfiles for editing, running builds, etc. that worked surprisingly well for what it was
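For anyone curious what fine-grained lockfiles might look like, here is a minimal Python sketch. It is not the commenter's implementation; the `file_lock` helper, the `LockHeld` exception, and the lock directory layout are made up for illustration. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists, so only one agent can hold a given lock at a time.

```python
import os
from contextlib import contextmanager


class LockHeld(Exception):
    """Raised when another agent already holds the lock for a resource."""


@contextmanager
def file_lock(lock_dir, resource):
    """Hold an exclusive lock on `resource` by atomically creating a lockfile."""
    os.makedirs(lock_dir, exist_ok=True)
    # Flatten path separators so each resource maps to one file in lock_dir.
    name = resource.replace("/", "__").replace("\\", "__") + ".lock"
    path = os.path.join(lock_dir, name)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeld(resource) from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield path
    finally:
        os.close(fd)
        os.remove(path)
```

An agent would wrap each edit or build step in `with file_lock(".locks", "src/app.py"):` and back off or retry on `LockHeld`. This sketch does not handle stale locks left behind by a crashed process.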

jFriedensreich3mo ago

While I appreciate Anthropic making a proof of concept, like they did with the Claude Code CLI, on which they can then do RL to optimise the patterns that work, I expect this to be as unusable as the CLI itself. It's a big difference whether a model provider internalises something like thinking mode, which mainly depends on context and text, or tries to grab a part of the agent loop, which has to run on the side of the systems we build and use.

We cannot allow model providers to own the browsers, CLIs, memory, IDEs, extensions and other tooling. It's not just a matter of power; they also just suck at it, as I experience every time I have to use Claude Code instead of amp.

I truly hope we get the pattern of innovation that looks like:

- some dude vibecodes a really cool idea

- model providers build into their reference implementations

- model providers optimize models to work optimally

- startup and/or open source projects step in and build something that is actually usable and opens a new market segment

We saw this play out beautifully with amp, kilo, roo, cline, continue

Another aspect is that we do not want interfaces just made for agents to work in teams, we want software made for humans and agents, that are true platforms for these agent teams to collaborate in.

drbscl3mo ago

I just built a quick plugin to automatically add agents & skills then fire off a team with them, depending on your task: https://github.com/drbscl/dream-team

traviscline3mo ago

Been using these types of flows across agent harnesses for a while. Check out https://github.com/tmc/it2

dangus3mo ago

A cynical read of this is that it’s all a ploy to maximize usage.

Why do agents need to speak to each other if they’re just doing the work correctly the first time?

Is it an admission that a single agent is not useful and reliable enough?

WXLCKNO3mo ago

I run a loop where I have 4 agents review in parallel after each implementation phase. It just increases the odds of finding issues.

I've switched this over to a team of 4 now that talk to each other to discuss issues they find and it's amazing. They confirm between themselves and if they wrongly identified something the others correct them.
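A much simpler stand-in for this kind of cross-confirmation is a quorum filter over the reviewers' findings, sketched below in Python. This is not how agent teams discuss issues among themselves; it is a swapped-in technique, and the report format (a dict mapping reviewer name to a list of finding strings) is an assumption.

```python
from collections import Counter


def confirmed_findings(reports, quorum=2):
    """Keep findings reported by at least `quorum` distinct reviewers."""
    counts = Counter()
    for findings in reports.values():
        counts.update(set(findings))  # count each reviewer once per finding
    return sorted(f for f, n in counts.items() if n >= quorum)
```

Findings that only one reviewer raised are dropped, which trades some recall for fewer false positives.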

dangus3mo ago

So, the answer is yes, a single agent makes too many mistakes and you have to run four of them (4x usage cost) to improve the quality.

I understand that it works better, but I am rightfully pointing out that it's less efficient.

An analogy would be putting a V8 engine into a pickup truck to make it go as fast as a Mazda Miata.

khaliqgant3mo ago

Been waiting for this to drop and excited to test it out. We've been building something in this space - https://github.com/AgentWorkforce/relay, a real-time messaging layer that lets AI coding agents talk to each other across any CLI.

Assign roles to different models and have them coordinate: Claude as the lead, Codex on backend, Gemini on frontend, etc.

I wrote about my experiences with multi-agent orchestration here: https://x.com/khaliqgant/status/2019124627860050109?s=46

imiric3mo ago

I find it amusing that the innovation in this space for the past year+ has been mostly centered around engineering: MCP, "agents", "skills", etc. Now "agent" orchestration is the new hotness.

Meanwhile, the same issues that have plagued these tools since their inception are largely ignored: hallucination, inaccuracy, context collapse, etc. These won't be solved by engineering, but by new research and foundational improvements.

On one hand, solid engineering was sorely needed, and can extract a lot of value from the current tech. But on the other, all these announcements and improvements feel like companies grasping at straws to keep the hype cycle going by any means necessary. Charts must go up and to the right, or investors get antsy.

It's all adding to the mountain of signs that suggest that this isn't the path to artificial intelligence. It's interesting tech, with possibly many valuable applications, but the "AI" narrative is frankly tiring. I wish I could fast forward on this speculative phase, go past the inevitable crash, and arrive at a timeframe where we've figured out what this tech is actually good for, and where we hopefully use it more for good than evil.

rektlessness3mo ago

Are people using Claude max 20x plan for personal pet projects? Are these expensed? Have you liquidated all other hobbies to fund this? Asking for a friend.

ndesaulniers3mo ago

Subagents are out, put it all on agent teams!

greenfish63mo ago

something i really like from trying it out over the last 10 minutes is that the main agent will continue talking to you while other agents are working, so you don't have to queue a message

taikahessu3mo ago

Clean up the team

Retr0id3mo ago

Claude Town

greenfish63mo ago

Excited to try this out. I've seen a lot of working systems on my own computer that share files to talk between different Claude Code agents and I think this could work similarly to that.

(i thought gas town was satire? people in comments here seem to be saying that gas town also had multi-agent file sharing for work tracking)

morleytj3mo ago

Gas Town decimated by Claude bomb from orbit

avereveard3mo ago

"finish Claude tokens quota in 3 minutes, largely over delegation and result messages instead of code writing"
