Description: This is a survey that tries to gauge our current progress in designing task-oriented LLM systems, i.e., how informed we are when we make decisions about prompting, augmentation, etc. It lists several such design parameters, describes them, and explores varying them through a thought experiment (!). We then select three parameters (prompting, augmentation, and uncertainty estimation), try to define them, and organize selected research on these topics. Our definitions and organization differ slightly from what you'd expect, as we try to avoid overlap between parameters.
Later, we discuss our findings, define "linear and non-linear contexts", and use that distinction to show how all (?) prompting techniques can be viewed as multi-agent systems. We then discuss the implications, one of which concerns synthetic data generation, which the HN community might be interested in. In all, the paper shares seven conjectures to help guide future research efforts.
I will list these conjectures in a comment for those short on time.
Thank you for reading!
1. Autonomous multi-agent collaboration allows less capable tool-augmented language models to surpass more capable tool-augmented language models as the number of collaborating agents increases, provided the less capable tool-augmented LMs meet a threshold level of capability.
2. Multiple LLM-based agents working together should be more capable than current research suggests, and their relative lack of success warrants investigation.
3. Even if we never discover an architecture better than current LLMs or better training algorithms, and even if scaling up LLMs and their training data does not lead to any new emergent abilities, we may still be able to achieve useful autonomous AI agents through: (a) larger context sizes and better context utilization; (b) ensuring that extensive collaboration between LLM agents and extensive tool-use are "in-distribution", i.e., well represented in the training data; (c) sampling/decoding strategies that work well for large context lengths.
4. Results from prompting techniques involving non-sequential context can predict similar results from multi-agent systems designed to replicate the same behavior.
5. Any result that utilizes multi-agent systems can predict similar results using prompting techniques (such as self-collaboration) designed to replicate the multi-agent interaction pattern within a sequential context.
6. Synthetically generated "self-collaboration" traces or transcripts from successful attempts at solving tasks, using prompting techniques involving non-sequential context or multi-agent collaboration, are high-quality training data for LLMs, especially for downstream use in multi-LLM agent systems and with prompting techniques involving non-linear context.
7. Taking existing problems and associated real-world deliverables (intermediate and final) and interpolating interaction artifacts between collaborators as a transcript or trace can create high-quality synthetic training data, specifically for downstream use in multi-LLM agent systems.
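To make conjectures 5 and 6 a bit more concrete, here is a rough sketch of one way a multi-agent exchange could be flattened into a single sequential transcript and kept as a training example only when the attempt succeeded. Everything in it is an illustrative assumption rather than something taken from the paper: the `Message` type, the agent names, the `[sender -> recipient]` line format, and the prompt/completion record layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """One message passed between agents (hypothetical structure)."""
    sender: str
    recipient: str
    content: str


def agent_view(messages: list[Message], agent: str) -> list[Message]:
    """Messages a single agent sent or received, i.e. its own separate context."""
    return [m for m in messages if agent in (m.sender, m.recipient)]


def linearize(messages: list[Message]) -> str:
    """Flatten the whole exchange into one sequential transcript, in order."""
    return "\n".join(
        f"[{m.sender} -> {m.recipient}] {m.content}" for m in messages
    )


def to_training_example(
    task: str, messages: list[Message], succeeded: bool
) -> Optional[dict]:
    """Keep only successful attempts, as conjecture 6 suggests."""
    if not succeeded:
        return None
    return {"prompt": task, "completion": linearize(messages)}


if __name__ == "__main__":
    # Hypothetical three-agent exchange, for illustration only.
    log = [
        Message("planner", "coder", "Write a function that reverses a string."),
        Message("coder", "reviewer", "def rev(s): return s[::-1]"),
        Message("reviewer", "planner", "Looks correct."),
    ]
    print(agent_view(log, "coder"))
    print(to_training_example("Reverse a string.", log, succeeded=True))
```

The per-agent view and the flattened transcript hold the same messages. The sketch only shows that an exchange can be replayed in either form; it does not test any of the conjectures.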
We try to define terms such as "extensive tool-use and collaboration" and "linear and non-linear context" in the paper. The current version is our first draft, and we're hoping to gather feedback for a final version that is more comprehensive, includes taxonomy charts, discusses more design parameters, and gives special attention to task decomposition and planning.