I'd like to quickly summarize what is different about our approach and why it matters.
Our work was inspired by brilliant research done at MIT CSAIL on "Recursive Language Models" (RLMs). One of the controversies has been whether these models are just a formalization of what agents like Claude Code already do vs. whether they bring new capabilities to the table.
By outperforming Claude on the major long-context benchmark, we provide a strong signal that something fundamentally new is happening. (In other words, it's not "just Claude Code" because it demonstrably outperforms Claude Code in the long-context regime.)
Where our contribution, LCM, differs from RLMs is how we handle recursion. RLMs use "symbolic recursion" -- i.e., they have an LLM write a script to recursively call itself in order to manipulate the context, which is stored in a REPL. This provides maximum flexibility... but it often goes wrong, since the LLM may write imperfect scripts.
LCM attempts to decompose the recursion from RLMs into deterministic primitives so that the control flow can be managed by an engine rather than left to the whims of the LLM. In practice, this means we replace bespoke scripts with two mechanisms: (1) A DAG-based context management system that works like paged virtual memory, except for managing conversations and files; and (2) Operator-level recursion, like "Map" for LLMs, which lets one tool call process thousands of tasks.
An analogy we draw in the paper is the evolution from GO-TO statements (of Dijkstra's "Considered Harmful" fame) to structured programming. RLMs are maximally expressive, but all of that power comes with the risk of things going awry. We have built a more mechanistic system, which can provide stronger guarantees when deployed in production with today's models.
Happy to answer any questions! Thanks for taking a look at the paper!
I've echoed the sentiment here on HN (and elsewhere) that these kinds of mechanisms seem to be a pathway to extending context longer and longer and longer and I wish I could toy around with this technology right now (can I?). I'm so excited!!
Your work is the shoulders-built-on-shoulders upon which other giants shall keep on building. Thank you so much.
Yes, we think there is a ton of low-hanging fruit from taking lessons from OS/PL theory and applying them to LLM tooling.
This is our first contribution in that direction. There will be more!
Have you considered making an openclaw plugin/PR for it? I understand you have your own coding CLI tool, but I don’t think this looks so hard to implement that it can’t be implemented elsewhere.
Either way, thanks for sharing this.
We have heard from a ton of OpenClaw users that the biggest barrier to them getting everything they want out of their agents is that memory is not a solved problem.
LCM could be a great solution to that. Stay tuned -- will ship it ASAP.
> deterministic primitives
Are agent-map and LLM-map the only two options you've given the model for recursive invocations? No higher-level, er, reduction operators to augment the map primitives?
It's quite possible that's wrong. If so, I would write llm_reduce like this: it would spawn a sub-task for every pair of elements in the list, which would call an LLM with a prompt telling it how to combine the two elements into one. The output type of the reduce operation would need to be the same as the input type, just like in normal map/reduce. This allows for a tree of operations to be performed, where the reduction is run log(n) times, resulting in a single value.
That value should probably be loaded into the LCM database by default, rather than putting it directly into the model's context, to protect the invariant that the model should be able to string together arbitrarily long sequences of maps and reduces without filling up its own context.
I don't think this would be hard to write. It would reuse the same database and parallelism machinery that llm_map and agentic_map use.
On an unrelated note - I did notice you were also a lawyer - So umm how what is next for you re this? What should we gear up for :)
We don't have any other materials yet, but let's see if this lands for you. I can run you through a couple simpler versions of the system, why they don't work, and how that informs our ultimate design.
The most basic part of the system is "two layers". Layer 1 is the "ground truth" of the conversation - the whole text the user sees. Layer 2 is what the model sees, i.e., the active context window.
In a perfect world, those would be the same thing. But, as you know, context lengths aren't long enough for that, so we can't fit everything from Layer 1 into Layer 2.
So instead we keep a "pointer" to the appropriate part of Layer 1 in Layer 2. That pointer takes the form of a summary. But it's not a summary designed to contain all information. It's more like a "label" that makes sure the model knows where to look.
The naive version of the system would allow the main model to expand Layer 2 summaries by importing all of the underlying data from Layer 1. But this doesn't work well, because then you just end up re-filling the Layer 2 context window.
So instead you let the main model clone itself, the clone expands the summary in its context (and can do this for multiple summaries, transforming each into the original uncompressed text), and then the clone returns whatever information the main thread requires.
Where this system would not fully match the capabilities of RLMs is that, by writing a script that calls itself e.g. thousands of times, an RLM has the ability to make many more recursive tool calls than can fit in a context window. So we fix that using operator-level recursion, i.e., we give the LLM a tool, map, that executes arbitrary recursion, without the LLM having to write a custom script to accomplish that.
Hope this helps!
- Clint
Much of this feels like a technical report of what they did, and makes me feel like we've reached the ICO whitepaper phase. I have very similar features in my custom coding agent, they seem pretty common sense to have. Are you really throwing away the compacted history? Saving it doesn't seem like a feature, the opposite seems like a gap. Same for making it available via toos/search, pretty standard stuff. Then too, ADK framework I use handles parallel agents/tools.
> Because expansion can recover arbitrarily large volumes of earlier conversation, this tool is restricted to sub-agents spawned via the Task tool; the main agent cannot call it directly. This restriction prevents uncontrolled context growth in the primary interaction loop.
What if the lcm_expand is called for a summary that has 1000s of messages that immediately floods the sub-agent's own context window?
Does lcm_expand only unroll one "layer" of the DAG and unrolls more if needed by another subagent?
The reason that the volume is potentially arbitrarily large is that one sub-agent can call lcm_expand multiple times - either vertically or horizontally. But that's a process that occurs gradually as the tool is used repeatedly.
This has not been a problem in our testing, but if it were a problem it would be easy to prevent sub-agents from invoking lcm_expand once their context buffer has reached a specified threshold.
Summary A = summarise(message 1 to P)
Summary B = summarise(Summary A, message P+1 to Q)
Summary C = summarise(Summary B, message Q+1 to R)
What does calling lcm_expand(Summary C) do? Does it unroll all messages from message 1 to message R or does it unroll to Summary B and message Q+1 to R?> volume is potentially arbitrarily large is that one sub-agent can call lcm_expand multiple times - either vertically or horizontally
I'm assuming from this that it's the latter? In that case, that addresses my concern about not blowing up the context window immediately.
1. In our use of coding agents, we find that there are often things referenced earlier in the conversation (API keys, endpoint addresses, feedback to the agent, etc.) that it's useful to have persist.
2. This is a general-purpose LLM memory system, which we've just used here to build a coding agent. But it is also designed for personal assistants, legal LLMs, etc.
That terminology can be confusing, because in other cases (and sometimes in our own architecture, like when executing thousands of operations via MAP) a sub-agent may be a smaller model given less complex individual tasks.
But the core mechanism we use for simulating unlimited context is to allow the main model to spin up instances of itself (sub-agents) with the previously summarized portion of the context expanded into its full, uncompressed state.
Expanding summaries into full text in sub-agents rather than the main thread is a critical part of our architecture, because it prevents the main context window from filling up.
It has two differences:
1. It does not store chat history, reasoning traces etc, only workflow artifacts (requirements, codebase analysis, implementation plan, etc). I frankly do not believe those things are relevant.
2. It is significantly simpler and more lightweight, using only markdown files