Better HN

Jgrubb · 11d ago · 0 points

The tokens are still being burnt; they're just being burnt in a parallel dimension from the user's main context window.


4 comments · 2 top-level

ajmurmann · 11d ago

It's true that the initial tool response still has the same number of tokens, but it doesn't keep getting dragged along in the longer-lived top-level context.

knollimar · 11d ago

Don't you resend the context after every turn, so splitting it off avoids the n^2 token usage? (Granted, it's cached, so there's some optimal amount here.)

ajmurmann · 10d ago

Yes, exactly. You resend it on every turn (assuming no cache hits). This is why using the shorter-lived subagent to take in that context and return only the useful result to the longer-lived context saves tokens.
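
To put rough numbers on the resend argument, here is a minimal sketch. All the figures (turn count, token sizes) are made-up assumptions for illustration, it assumes no cache hits at all, and it ignores the subagent's own system prompt and output tokens. `total_input_tokens` is a hypothetical helper, not part of any real API.

```python
def total_input_tokens(turns: int, base: int, tool_output: int, per_turn: int) -> int:
    """Sum the input tokens sent over `turns` requests when the whole
    context is resent on every turn (no cache hits assumed)."""
    total = 0
    context = base + tool_output
    for _ in range(turns):
        total += context      # the full context is resent on this turn
        context += per_turn   # the context grows by the new exchange
    return total


# Illustrative, assumed numbers only.
TURNS, BASE, TOOL_OUTPUT, SUMMARY, PER_TURN = 20, 2_000, 10_000, 500, 300

# Case 1: the raw tool output stays in the long-lived main context.
inline = total_input_tokens(TURNS, BASE, TOOL_OUTPUT, PER_TURN)

# Case 2: a subagent reads the tool output once, and only a short
# summary is carried along in the main context.
subagent = TOOL_OUTPUT + total_input_tokens(TURNS, BASE, SUMMARY, PER_TURN)

print(inline, subagent, inline - subagent)  # 297000 117000 180000
```

The tool output is paid for once in both cases; the difference comes entirely from not resending it on each of the later turns. With caching the gap shrinks, which is the "optimal amount" caveat raised above.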

ViewTrick1002 · 11d ago

The real benefit is being able to use a cheaper, but good enough, model with a specific system prompt dedicated to that task.
