Agents Are Not Enough

(arxiv.org)

207 points · awaxman11 · 1y ago · 156 comments

89 comments · 22 top-level

tonetegeatinst1y ago· 16 in thread

Somewhat related, but here's my take on super intelligence or AGI. I have worked with CNNs, GNNs and other old-school AI methods, and I don't have the resources to build a real SOTA LLM, but I do use and tinker with LLMs occasionally.

If AGI or SI (super intelligence) is possible, and that is an if, I don't think LLMs are going to be the silver bullet. Just as the real world has people dedicated to a single task in their field, like lawyers, construction workers, doctors and brain surgeons, I see the current best path forward as being a "mixture of experts". We know LLMs are pretty good for what I've seen some refer to as NLP problems, where the model input is a tokenized string. However I would argue an LLM will never build a trained model like Stockfish or DeepSeek. Certain model types seem to be suited to certain types of problems or inputs. True AGI or SI would stop trying to be a grand master of everything but rather know what best method/model should be applied to a given problem.

We still do not know if it is possible to combine the knowledge of different types of neural networks, like LLMs, convolutional neural networks, and other deep learning models, and while it's certainly worth exploring, it is foolish to pin all hope on a single approach. I think the first step would be to create a new type of model that, given a problem of any type, knows the best method to solve it. It wouldn't rely on itself but rather on the mixture of agents or experts. And they don't even have to be LLMs. They could be anything.

Where this would really take off is if the AI could identify a problem that it can't solve and invent a new approach, or several, because then we don't have to be the ones who develop every expert.

wkat42421y ago

Totally agree. An LLM won't be an AGI.

It could be part of an AGI, specifically the human interface part. That's what an LLM is good at. The rest (knowledge oracle, reasoning etc) are just things that kinda work as a side-effect. Other types of AI models are going to be better at that.

It's just that since the masses found that they can talk to an AI like a human they think that it's got human capabilities too. But it's more like fake it till you make it :) An LLM is a professional bullshitter.

Terr_1y ago

> It's just that since the masses found that they can talk to an AI like a human

In a way it's worse: Even the "talking to" part is an illusion, and unfortunately a lot of technical people have trouble remembering it too.

In truth, the LLM is an idiot-savant which dreams up "fitting" additions to a given document. Some humans have prepared a document which is in the form of a theater-play or a turn-based chat transcript, with a pre-written character that is often described as a helpful robot. Then the humans launch some code that "acts out" any text that looks like it came from that fictional character, and inserts whatever the real-human-user types as dialogue for the document's human-character.

There's zero reason to believe that the LLM is "recognizing itself" in the story, or that it is choosing to insert itself into one of the characters. It's not having a conversation. It's not interacting with the world. It's just coded to Make Document Bigger Somehow.

> they think that it's got human capabilities too

Yeah, we easily confuse the character with the author. If I write an obviously-dumb algorithm which slaps together a story, it's still a dumb algorithm no matter how smart the robot in the story is.

lugu1y ago

I am not sure what you mean by LLM when you say they are professional bullshitters. While that was certainly true for transformer-based models just doing inference, recent models have progressed significantly.

daxfohl1y ago

There's a _lot_ of smoke and mirrors. Paste a sudoku into ChatGPT and ask it to solve it. Amazing, it does it perfectly! Of course that's because it ran a sudoku-solving program that it pulled off GitHub.

Now ask it to solve step by step by pure reasoning. You'll get a really intelligent-sounding response that seems correct, but on closer inspection makes absolutely no sense. Every step has ridiculous errors like "we start with options {1, 7} but eliminate 2, leaving only option 3", and then at the end it just throws all that out, says "and therefore ...", and gives you the original answer.

That tells me there's essentially zero reasoning ability in these things, and anything that looks like reasoning has been largely hand-baked into it. All they do on their own is complete sentences with statistically-likely words. So yeah, as much as people talk about it, I don't see us as being remotely close to AGI at this point. Just don't tell the investors.

conception1y ago

On the other side of the coin, I think people also underestimate how much of human thinking and intelligence is just completing statistically likely words. Most actions, and certainly reactions, that people perform every day involve very little reasoning, instead just following the most used neuron.

pton_xd1y ago

> However I would argue an LLM will never build a trained model like Stockfish or DeepSeek.

It doesn't have to, the LLM just needs access to a computer. Then it can write the code for Stockfish and execute it. Or just download it, the same way you or I would.

> True AGI or SI would stop trying to be a grand master of everything but rather know what best method/model should be applied to a given problem.

Yep, but I don't see how that relates to LLMs not reaching AGI. They can already write basic Python scripts to answer questions, they just need (vastly) more advanced scripting capabilities.

lukeplato1y ago

I don't see why a mixture of experts couldn't be distilled into a single model and unified latent space

energy1231y ago

You could, but in many cases you wouldn't want to. You will get superior results with a fixed compute budget by relying on external tool use (where "tool" is defined liberally, and can include smaller narrow neural nets like GraphCast & AlphaGo) rather than stuffing all tools into a monolithic model.

zaroth1y ago

Exactly what DeepSeek3 is doing.

phaedrus1y ago

But the G in AGI stands for General. I think the hope is that there is some as-yet-undiscovered algorithm for general intelligence. While I agree that deferring to a subsystem that is an expert in that type of problem is the best way to handle problems, I would hope that the central coordinator could not just delegate but also design new subsystems as needed. Otherwise what happens when you run out of types of expert problem solvers to use (and still haven't solved the problem well)?

One might argue that a mixture of experts is just the best that can be done, and that it's unlikely the AGI would be able to design new experts itself. However, where do the limited existing expert problem solvers come from? Well, we invented them. Human intelligences. So to argue that an AGI could NOT come up with its own novel expert problem solvers implies there is something ineffable about human general intelligence that can't be replicated by machine intelligence (which I don't agree with).

vrighter1y ago

Once I was high and thought of hallucinations as "noise in the output". From that perspective, and given that LLMs are probabilistic machines, halving the noise would probably take 4x the computation. Which seems to track with what I observe. Models are getting MUCH larger, but performance is practically at a standstill.
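The "half the noise costs 4x" intuition matches the square-root law for averaging independent samples, where the noise of a mean falls as 1/sqrt(n). The sketch below illustrates only that statistical law. Treating hallucinations as this kind of noise is the commenter's analogy, not an established result, and the function names here are made up for illustration.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Theoretical noise (standard deviation) of the mean of n independent samples."""
    return sigma / math.sqrt(n)


def measured_noise(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Empirical spread of the mean of n unit-variance Gaussian samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    # Quadrupling n (25 -> 100) should roughly halve both columns.
    for n in (25, 100):
        print(n, round(standard_error(1.0, n), 3), round(measured_noise(n), 3))
```

Whether model quality follows any such curve as compute grows is a separate empirical question that this toy does not address.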

Upvoter331y ago

"If AGI ... is possible"

I don't get this line of thinking. AGI already exists - it's in our heads!

So then the question is: is what's in our heads magic, or can we build it? If you think it's magic, fine - no point arguing. But if not, we will build it one day.

jerojero1y ago

The brain is such an intractable web of connections that it has been really difficult to properly make sense of it.

We can't really say much, in real terms, about the differences between the intelligence of a dog and the intelligence of a human. It seems as though humans might have more connections and different types of cells, but then again, there are species out there that have types of neurons we don't have, and denser regions in some areas of the brain than we do.

And on top of that, dive into a single neuron and you will find a world of complexity. Whether a neuron fires or not given a stimulus is an extremely complicated and often stochastic process; that's actually one of the reasons why we use non-linearities in the neural networks we create. But how much nuance are we really capturing?

The way we do mathematics has well-studied neurological patterns; we come out of the box with understandings of the world. And many animals do too: similar neurological patterns are found in different species.

It's incredible to think of the precision and the complexity of the tasks a fly undertakes during its life, and we have actually mapped the entire brain (if we can call it that; I would) of a fly. Every neuron and every connection the fly has. There are experiments with neural networks where we've tried to imitate these (the brain of a fly has fewer parameters [number of nodes and edges] than modern LLMs) with very interesting results. But can we say we understand them? Not really.

And finally, I want to bring up something that's not usually considered when it comes to these things: there are a lot of processes at the molecular level in our cells that actually make use of quantum mechanics, and there's a whole field of biology dedicated to studying them. So yeah, maybe we can build it, but first we need to understand what's going on and why, I believe.

seadan831y ago

Expert beginner problem. If you can count a grain of sand, and measure the distance of one centimeter, then surely you can measure the exact length of a coastline and count the exact number of grains of sand! (The length and the number of grains go to infinity as you get more detailed.)

It is less magic, just insanely complicated. We therefore very well might not build it one day. Your claim that we will solve it one day is not obvious and needs solid evidence. Some cryptographic problems require millions of years of compute to solve, so why can't it be the case that AGI requires petayears of compute? A billion-fold increase in compute still won't do it, hence, maybe not ever. 4 billion years and a trillion-fold increase in compute might not be enough. (Assuming we have that long. Dawkins was most concerned about humanity surviving the next 500 years.)

trescenzi1y ago

GI is in our heads. The A is artificial which means built by humans. They are asking the same question you are.

nuancebydefault1y ago

Indeed! That's what I have been thinking for a while but I never had the occasion and or breath to write it down, and you explained it concisely. Finally some 'confirmation' 'bias'...

simonw1y ago· 14 in thread

This paper does at least lead with its version of what "agents" means (I get very frustrated when people talk about agents without clarifying which of the many potential definitions they are using):

> An agent, in the context of AI, is an autonomous entity or program that takes preferences, instructions, or other forms of inputs from a user to accomplish specific tasks on their behalf. Agents can range from simple systems, such as thermostats that adjust ambient temperature based on sensor readings, to complex systems, such as autonomous vehicles navigating through traffic.

This appears to be the broadest possible definition, encompassing thermostats all the way through to Waymos.

adpirz1y ago

You posted on X a while back asking for a crowdsourced definition of what an "agent" was and I regularly cite that thread as an example of the fact that this word is so blurry right now.

simonw1y ago

I really need to write that up in one place - closest I've got is this section from my 2024 review https://simonwillison.net/2024/Dec/31/llms-in-2024/#-agents-...

mindcrime1y ago

It's been blurry for a long time, FWIW. I have books on "Agents" dating back to the late 90's or early 2000's in which the "Intro" chapter usually has a section that tries to define what an "agent" is, and laments that there is no universally accepted definition.

To illustrate: here's a paper from 1996 that tries to lay out a taxonomy of the different kinds of agents and provide some definitions:

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...

And another from the same time-frame, which makes a similar effort:

https://www.researchgate.net/profile/Stan-Franklin/publicati...

williamcotton1y ago

So basically just the concept of feedback in a cybernetic system.

https://en.wikipedia.org/wiki/Cybernetics

bob10291y ago

> The field is named after an example of circular causal feedback—that of steering a ship (the ancient Greek κυβερνήτης (kybernḗtēs)...

Now that name makes a lot more sense to me.

8338550bff961y ago

Then is "agents" just non-spooky coded language for "cyborgs"

cratermoon1y ago

We're at the phase of the hype cycle where "agent" means whatever the marketing materials want it to mean.

behnamoh1y ago

People have been talking about agents for at least 2 years. Remember when AgentGPT came out? How's that going so far? Agents are just LLMs with structured output, which often happens to be a JSON object naming a function to call and its arguments.
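For readers who have not seen the pattern, here is a minimal sketch of "structured output as a function call". Everything in it is hypothetical: the `get_weather` tool, the `name`/`arguments` JSON shape, and the hard-coded string standing in for model output. No LLM or vendor API is involved.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call an external service."""
    return f"(pretend forecast for {city})"


# Registry of functions the program is willing to run on the model's behalf.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call and run the matching Python function."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]  # raises KeyError if the tool is not registered
    return func(**call["arguments"])


# Stand-in for what a model might emit; assumed format, not a real API response.
fake_model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(dispatch(fake_model_output))
```

In this framing the "agent" part is just the loop that feeds the tool's result back to the model and asks what to do next.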

mindcrime1y ago

> People have been talking about agents for at least 2 years.

WAY longer than that. What's come to the forefront specifically in the last year or two is a very specific subset of the overall agent landscape. What I like to call "LLM Agents". But "Agents" at large date back to at least the 1980's if not before. For some of the history of all of this, see this page and some of the listed citations:

https://en.wikipedia.org/wiki/Software_agent

> Agents are just LLMs with structured output

That's only true for the "LLM Agent" version. There are Agents that have nothing to do with LLM's at all.

curious_cat_1631y ago

Yes, and the definition works reasonably well for the core arguments they are making in Section 5.

I suspect they'll follow up with a full paper with more details (and artifacts) of their proposed approach.

htrp1y ago

agents are the 2020s version of data science in the 2010s

Kerbonut1y ago

Do you mean that agents are being hyped in the same way data science was in the 2010s, or that they’ll have a similar impact over time? Would love to hear more of your thoughts.

baxtr1y ago

To me "Agents" sound like computer programs that interact through APIs?

bsenftner1y ago

Oh come on! You and I know very well an AI Agent is anything marketing says they are, and that is the absolute final truth.

bob10291y ago· 9 in thread

I think the goldilocks path is to make the user the agent and use the LLM simply as their UI/UX for working with the system. Human (domain expert) in the loop gives you a reasonable chance of recovering from hallucinations before they spiral entirely out of control.

"LLM as UI" seems to be something hanging pretty low on the tree of opportunity. Why spend months struggling with complex admin dashboard layouts and web frameworks when you could wire the underlying CRUD methods directly into LLM prompt callbacks? You could hypothetically make the LLM the exclusive interface for managing your next SaaS product. There are ways to make this just as robust and secure as an old school form punching application.

barrkel1y ago

It's quite tedious to have to write (or even say) full sentences to express intent. Imagine driving a car with a voice interface, including accelerator, brake, indicators and so on. Controls are less verbose and dashboards are more information rich than linear text.

It's difficult to be precise. Often it's easier to gauge things by looking at them while giving motor feedback (e.g. turning a dial, pushing a slider) than to say "a little more X" or "a bit less Y".

Language is poorly suited to expressing things in continuous domains, especially when you don't have relevant numbers that you can pick out of your head - size, weight, color etc. Quality-price ratio is a particularly tough one - a hard numeric quantity traded off against something subjective.

Most people can't specify up front what they want. They don't know what they want until they know what's possible, what other people have done, started to realize what getting what they want will entail, and then changed what they want. It's why we have iterative development instead of waterfall.

LLMs are a good start and a tool we can integrate into systems. They're a long, long way short of what we need.

GiorgioG1y ago

re: LLM as UI: Given that I don't trust LLMs to be deterministic, I wouldn't trust them to make the correct API call every time I tell it to do X.

kgeist1y ago

I think most users have a fixed set of workflows which usually don't change from day to day, so why not just use LLMs as a macro builder with a natural language interface (and which doesn't require you to know the product's UI well beforehand):

- you ask LLM to build a workflow for your problem

- the LLM builds the workflow (macro) using predefined commands

- you review the workflow (which can be an intuitive list of commands, understandable by a non-specialist) to weed out hallucinations and misunderstandings

- you save the workflow and can use it without any LLM agents, just by clicking a button - pretty deterministic and reliable

Advantages:

- reliable, deterministic

- you don't need to learn a product's UI, you just formulate your problem using natural language
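The steps above can be sketched roughly as follows, assuming a product that exposes a small fixed set of commands. The command names, the `Step` type and the row data are all invented for illustration. The LLM's only job would be to produce the list of steps once; after review, the saved workflow replays without any model in the loop.

```python
from dataclasses import dataclass

# Hypothetical predefined commands the product exposes.
COMMANDS = {
    "filter_rows": lambda rows, column, value: [r for r in rows if r.get(column) == value],
    "sort_rows": lambda rows, column: sorted(rows, key=lambda r: r[column]),
    "take": lambda rows, count: rows[:count],
}


@dataclass
class Step:
    command: str
    args: dict


def validate(workflow):
    """Reject any step that is not one of the predefined commands."""
    for step in workflow:
        if step.command not in COMMANDS:
            raise ValueError(f"unknown command: {step.command}")


def describe(workflow):
    """Human-readable listing for the review step."""
    return [f"{i}. {s.command} {s.args}" for i, s in enumerate(workflow, 1)]


def run(workflow, rows):
    """Replay a saved workflow; no LLM is involved at this point."""
    validate(workflow)
    for step in workflow:
        rows = COMMANDS[step.command](rows, **step.args)
    return rows
```

The replay is deterministic only because every command is ordinary code; any hallucination has to be caught at the review step, not at run time.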

hitchstory1y ago

I don't either, but this can be mitigated by adding guard rails (strictly validating input), double-checking actions with the user, and using it for tasks where a mistake isn't world-ending.

Even then mistakes can slip through, but it could still be more reliable than a visual UI.

There are lots of horrible web UIs I would LOVE to replace with a conversational LLM agent. No. 1 is Jira, and so are No. 2 and No. 3.
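A minimal sketch of the two guard rails mentioned here, strict validation plus a confirmation step, might look like this. The ticket actions, the schema and the `confirm` callback are hypothetical and not taken from any real product's API.

```python
from typing import Callable

# Hypothetical schema: action name -> required argument names and their types.
SCHEMA = {
    "close_ticket": {"ticket_id": int},
    "assign_ticket": {"ticket_id": int, "assignee": str},
}
# Actions that should never run without the user saying yes.
DESTRUCTIVE = {"close_ticket"}


def check(action: str, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is acceptable."""
    if action not in SCHEMA:
        return [f"unknown action {action!r}"]
    expected = SCHEMA[action]
    problems = [f"missing {k}" for k in expected if k not in args]
    problems += [f"unexpected {k}" for k in args if k not in expected]
    problems += [
        f"{k} should be {t.__name__}"
        for k, t in expected.items()
        if k in args and not isinstance(args[k], t)
    ]
    return problems


def guarded_call(action: str, args: dict, confirm: Callable[[str], bool]) -> str:
    """Validate an LLM-proposed call, then ask the user before destructive actions."""
    problems = check(action, args)
    if problems:
        return "rejected: " + "; ".join(problems)
    if action in DESTRUCTIVE and not confirm(f"Really {action} {args}?"):
        return "cancelled by user"
    return f"executed {action}"
```

This catches malformed calls, but not a well-formed call to the wrong ticket, which is why the comment also limits the approach to tasks where a mistake is survivable.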

deadbabe1y ago

They are deterministic at 0 temperature
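To make the claim concrete, here is a toy sampler over a fixed list of scores, written as an assumption about how temperature is usually described rather than as any particular library's implementation. At temperature 0 it reduces to picking the top-scoring token, so the random generator never matters.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores; temperature 0 means greedy argmax."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness used.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Softmax numerators, shifted by the peak for numerical stability.
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 3.0, 2.0]
    print([sample_token(logits, 0, random.Random(s)) for s in range(5)])
    rng = random.Random(0)
    print([sample_token(logits, 1.0, rng) for _ in range(10)])
```

The sketch only covers the sampling step. Real inference stacks may still produce different outputs from run to run for other reasons, such as floating-point or batching effects, which this toy does not model.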

pwillia71y ago

I had the same epiphany about LLM as UI while trying to build a front end for an image enhancer workflow I built with Stable Diffusion. I had just about fully built out a Chrome extension when I realized I should just build a 'tool' that llama can interact with and use Open WebUI as the front end.

quick demo: https://youtu.be/2zvbvoRCmrE

diggan1y ago

> I think the goldilocks path is to make the user the agent and use the LLM simply as their UI/UX for working with the system

That's a funny definition to me, because doing so would mean the LLM is the agent, if you use the classic definition for "user-agent" (as in what browsers are). You're basically inverting that meaning :)

klabb31y ago

> "LLM as UI" seems to be something hanging pretty low on the tree of opportunity.

Yes if you want to annoy your users and deliberately put roadblocks to make progress on a task. Exhibit A: customer support. They put the LLM in between to waste your time. It’s not even a secret.

> Why spend months struggling with complex admin dashboard layouts

You can throw something together, and even auto generate forms based on an API spec. People don’t do this too often because the UX is insufficient even for many internal/domain expert support applications. But you could and it would be deterministic, unlike an LLM. If the API surface is simple, you can make it manually with html & css quickly.

Overuse of web frameworks has completely different causes than ”I need a functional thing” and thus it cannot be solved with a different layer of tech like LLMs, NFTs or big data.

wkat42421y ago

> Yes if you want to annoy your users and deliberately put roadblocks to make progress on a task. Exhibit A: customer support. They put the LLM in between to waste your time. It’s not even a secret.

No this is because they use the LLM not only as human interface but also as a reasoning engine for troubleshooting. And give it way less capability than a human agent to boot. So all it can really do is serve FAQs and route to real support.

In this case the fault is not with the LLM but with the people that put it there.

georgestrakhov1y ago· 6 in thread

IMHO, the word agent is quickly becoming meaningless. The amount of agency that sits with the program vs. the user is something that changes gradually.

So we should think about these things in terms of how much agency are we willing to give away in each case and for what gain[1].

Then the ecosystem question that the paper is trying to solve will actually solve itself, because it is already the case today that in many processes agency has been outsourced almost fully and in others - not at all. I posit that this will continue, just expect a big change of ratios and types of actions.

[1] https://essays.georgestrakhov.com/artificial-agency-ladder/

HarHarVeryFunny1y ago

An agent, or something that has agency, is just something that takes some action, which could be anything from a thermostat regulating the temperature all the way up to an autonomous entity such as an animal going about its business.

Hugging Face have their own definitions of a few different types of agent/agentic system here:

https://huggingface.co/docs/smolagents/en/conceptual_guides/...

As related to LLMs, it seems most people are using "agent" to refer to systems that use LLMs to achieve some goal - maybe a fairly narrow business objective/function that can be accomplished by using one or more LLMs as a tool to accomplish various parts of the task.

khafra1y ago

> An agent, or something that has agency, is just something that takes some action, which could be anything from a thermostat regulating the temperature all the way up to an autonomous entity such as an animal going about its business.

I have seen "agency" used in a much more specific way than this: An agent is something that has goals expressed as states of a world, and has an internal model of the world, and takes action to fulfill its goals.

Under this definition, a thermostat is not an agent. A robot vacuum cleaner that follows a list of simple heuristics is also not an agent, but a robot vacuum cleaner with a Simultaneous Localization and Mapping algorithm which tries to clean the whole floor with some level of efficiency in its path is an agent.

I think this is a useful definition. It admits a continuum of agency, just like the huggingface link; but it also allows us to distinguish between a kid on a sled, and a rock rolling downhill.

https://www.alignmentforum.org/tag/agent-foundations has some justification and further elaboration.
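One way to picture the distinction drawn in this comment is to put a reactive rule next to a planner that searches an internal world model for a route to a goal state. The grid world, the `predict` model and the function names below are illustrative assumptions, not anything from the linked material.

```python
from collections import deque


def thermostat(temp: float, setpoint: float) -> str:
    """Reactive rule: no world model and no lookahead."""
    return "heat" if temp < setpoint else "off"


# Hypothetical world model for a tiny square grid: the effect of each move.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def predict(state, action, walls, size):
    """Internal model: where the agent expects to be after taking an action."""
    x, y = state
    dx, dy = MOVES[action]
    nxt = (x + dx, y + dy)
    inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
    return nxt if inside and nxt not in walls else state


def plan(start, goal, walls, size):
    """Breadth-first search through the model for actions that reach the goal state."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for action in MOVES:
            nxt = predict(state, action, walls, size)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # the goal is unreachable according to the model
```

Under the definition above, only `plan` has the three ingredients: a goal expressed as a world state, a model (`predict`), and actions chosen to reach the goal.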

sgt1011y ago

Hi - have a look at this book if you are interested [1] (Mike Wooldridge, Multi-Agent Systems)

[1] https://amzn.eu/d/6a1KgnL

Here are Mike's credentials: https://www.cs.ox.ac.uk/people/michael.wooldridge/

w10-11y ago

> IMHO, the word agent is quickly becoming meaningless. The amount of agency that sits with the program vs. the user is something that changes gradually

Yes, the term is becoming ambiguous, but that's because it's abstracting out the part of AI that is most important and activating: the ability to work both independently and per intention/need.

Per the paper: "Key characteristics of agents include autonomy, programmability, reactivity, and proactiveness.[...] high degree of autonomy, making decisions and taking actions independently of human intervention."

Yes, "the ecosystem will evolve," but to understand and anticipate the evolution, one needs a notion of fitness, which is based on agency.

> So we should think about these things in terms of how much agency are we willing to give away in each case

It's unclear there can be any "we" deciding. For resource-limited development, the ecosystem will evolve regardless of our preferences or ethics according to economic advantage and capture of value. (Manufacturing went to China against the wishes of most everyone involved.)

More generally, the value of AI is not just replacing work. It's giving more agency to one person, avoiding the cost and messiness of delegation and coordination. It's gaining the same advantages seen where a smaller team can be much more effective than a larger one.

Right now people are conflating these autonomy/delegation features with the extension features of AI agents (permitting them to interact with databases or web browsers). The extension vendors will continue to claim agency because it's much more alluring, but the distinction will likely become clear in a year or so.

paulryanrogers1y ago

> Manufacturing went to China against the wishes of most everyone involved

Certainly those in China and the executive suites of Western countries wished it, and made it happen. Arguably the western markets wanted it too when they saw the prices dropping and offerings growing.

AI isn't happening in a vacuum. Shareholders and customers are buying it.

rcarmo1y ago

I think people keep conflating agency with agents, and that they are actually two entirely different things in real life. Right now agents have no agency - they do not independently come up with new approaches, they’re mostly task-oriented.

TaurenHunter1y ago· 5 in thread

"More Agents is all you need" https://arxiv.org/abs/2402.05120

I could not find an "Agents considered harmful" related to AI, but there is this one: "AgentHarm: A benchmark for measuring harmfulness of LLM agents" https://arxiv.org/pdf/2410.09024

This "Agents considered harmful" is not AI-related: https://www.scribd.com/document/361564026/Math-works-09

ksplicer1y ago

When reading Anthropic's blog on agents, I basically took away that their advice is that you shouldn't use them to solve most problems.

https://www.anthropic.com/research/building-effective-agents

"For many applications, however, optimizing single LLM calls with retrieval and in-context examples is usually enough."

retinaros1y ago

True, this was also my conclusion in October. Most of the complexity we are building is to fight against the limitations of LLMs. If we could somehow embed all our tools in a single call and have the LLM successfully figure out which tools to call, then that would be it and we wouldn’t need any of those frameworks or libraries. But it turns out the reality of agents and tool use is pretty stark, and you wouldn’t know that from looking at the AI influencers spamming X, LinkedIn and YouTube.

However, the state of agents has changed slightly: where we had 25% accuracy in multi-turn conversations, we're now at 50%.

kridsdale11y ago

Morpheus taught me they are quite harmful.

sgt1011y ago

Hi - have a look at this book if you are interested [1] (Mike Wooldridge, Multi-Agent Systems)

[1] https://amzn.eu/d/6a1KgnL

Here are Mike's credentials: https://www.cs.ox.ac.uk/people/michael.wooldridge/

dist-epoch1y ago

Real agents have never been tried

jokethrowaway1y ago· 4 in thread

I don't get the hype about Agents.

It's just calling a LLM n-times with slightly different prompts

Sure, you get the ability to correct previous mistakes, it's basically a custom chain of thought - but errors compound and the results coming from agents have a pretty low success rate.

Bruteforcing your way out of problems can work sometimes (as evinced by the latest o3 benchmarks) but it's expensive and rarely viable for production use.

grahamj1y ago

> It's just calling a LLM n-times with slightly different prompts

It can be, but ideally each agent’s model, prompts and tools are tailored to a particular knowledge domain. That way tasks can be broken down into subtasks which are classified and passed to the agents best suited to them.
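As a rough sketch of the routing idea, the snippet below classifies each subtask and hands it to a domain-specific handler. The keyword classifier and the three "agents" are placeholders. In a real system each handler would presumably wrap its own model, prompt and tools, and the classifier might itself be a model.

```python
def classify(subtask: str) -> str:
    """Toy keyword classifier standing in for whatever does the real routing."""
    text = subtask.lower()
    if any(word in text for word in ("sql", "query", "table")):
        return "database"
    if any(word in text for word in ("invoice", "refund", "price")):
        return "billing"
    return "general"


# Hypothetical specialists; each just tags the task so the routing is visible.
AGENTS = {
    "database": lambda task: f"[database agent] {task}",
    "billing": lambda task: f"[billing agent] {task}",
    "general": lambda task: f"[general agent] {task}",
}


def route(subtasks):
    """Send every subtask to the agent registered for its domain."""
    return [AGENTS[classify(task)](task) for task in subtasks]


if __name__ == "__main__":
    for line in route([
        "Write a SQL query for monthly signups",
        "Explain the refund policy",
        "Summarize the meeting",
    ]):
        print(line)
```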

Agree RE it being bruteforce and expensive but it does look like it can improve some aspects of LLM use.

retinaros1y ago

That is just like having a for loop per domain.

mindcrime1y ago

> It's just calling a LLM n-times with slightly different prompts

That's one way of building something you could call an "agent". It's far from the only way. It's certainly possible to build agents where the LLM plays a very small role, or even one that uses no LLM at all.

retinaros1y ago

That's a workflow.

ocean_moist1y ago· 3 in thread

Maybe I just don’t understand the article but I really have 0 clue how they go about making their conclusions and really don’t understand what they are saying.

I think the 5 issues they provide under “Cognitive Architectures” are severely underspecified to the point where they really don’t _mean_ anything. Because the issues are so underspecified I don’t know how their proposed solution solves their proposed problems. If I understand it correctly, they just want agents (Assistants/Agents) with user profiles (Sims) on an app store? I’m pretty sure this already exists on the ChatGPT store. (sims==memories/user profiles, agents==tools/plugins, assistants==chat interface)

This whole thing is so broad and full of academic (pejorative) platitudes that it’s practically meaningless to me. And of course, although it's completely unrelated, they throw in a reference to symbolic systems. Academic theater.

spiderfarmer1y ago

This is publishing for the sake of publishing.

sambo5461y ago

The general negativity toward agents makes it read like the problem section of a research proposal ("X isn't good enough, we're going to develop solution Y").

antisthenes1y ago

It's a 4-page paper trying to give a summary of 40+ years of research on AI.

Of course it's going to be vague and presumptuous. It's more of a high-level executive summary for tech-adjacent folks than an actual research paper.

danielmarkbruce1y ago· 3 in thread

Why post this paper? It says nothing, it's a waste of people's time to read.

duxup1y ago

Even just the definition of an Agent (maybe imperfect) made it worthwhile for me.

sgt1011y ago

Hi - have a look at this book if you are interested [1] (Mike Wooldridge, Multi-Agent Systems)

[1] https://amzn.eu/d/6a1KgnL

Here are Mike's credentials: https://www.cs.ox.ac.uk/people/michael.wooldridge/

danielmarkbruce1y ago

I'm not sure it's even good though... the input doesn't need to come from a user. I have an "agent" which listens for an event in financial markets and then goes and does some stuff.

In practice the current usage of "agent" is just: a program which does a task and uses an LLM somewhere to help make a decision as to what to do and maybe uses an LLM to help do it.

zombiwoof1y ago· 3 in thread

Agent is a funding and marketing term imho

Soon it will be AI Microservices

bad_haircut721y ago

Who wants to invest in my startup? It's a Microagent service architectures orchestration platform. All you do is define the inputs, write the agents' algorithms, apply agency by inputting a decision tree (ifs and conditionals) and then a function to format output! And the best part? You do all of it in YAML!

/sarcasm, hopefully obviously

mindcrime1y ago

I was thinking "shut up and take my money" until you brought YAML into it. Hard pass. ;p

ramesh311y ago

>Agent is a funding and marketing term imho

So was "mobile" 15 years ago. Companies are deploying hundreds of billions in capital for this. It's not going anywhere, and you'd be best off upskilling now instead of dismissing things.

ripped_britches1y ago· 2 in thread

I can imagine really powerful agents this year or next in theory. Agents meaning (not a thermostat) a system that can go complete some async tasks on your behalf. But in practice I don’t have any idea how we will solve for prompt injection attacks. Hopefully someone cracks it.

Jerrrry1y ago

  >solve for prompt injection attacks

It is essentially the same Code as Data problem as always.
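For anyone who hasn't run into the code-as-data problem before, the classic instance is SQL injection. The sketch below uses Python's standard `sqlite3` module to show untrusted data being executed as code, and then the usual fix (a parameterized query). The `build_prompt` function is a hypothetical illustration of the analogous situation with prompts, where instructions and untrusted text end up in one string; as far as I know there is no fully equivalent separation mechanism for prompts today.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

untrusted = "nobody' OR '1'='1"

# Unsafe: the data is spliced into the code, so it can change the query's meaning.
unsafe_sql = f"SELECT secret FROM users WHERE name = '{untrusted}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the data travels in a separate channel and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (untrusted,)
).fetchall()


def build_prompt(instructions: str, document: str) -> str:
    """Hypothetical prompt builder: instructions and untrusted text share one string."""
    return f"{instructions}\n\n---\n{document}"
```

With the unsafe query every row comes back; with the parameterized one, none do. A prompt built by `build_prompt` has no such second channel, which is why the problem is hard.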

cratermoon1y ago

"AI will soon be able to..."

nowittyusername1y ago· 1 in thread

With time, they will get a lot better. IMO, the biggest thing agents currently lack is a good implementation of function calling. LLMs should be used as reasoning engines, and everything else should be offloaded to tool use. This will drastically reduce hallucinations and errors in math and other areas.
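A rough sketch of what offloading math to a tool could look like: the model only emits a structured tool call, and ordinary code does the arithmetic. This is an assumption-laden illustration, not any vendor's function-calling API: `stub_model`, `TOOLS`, `calculator` and `answer` are hypothetical names, and the stub returns a fixed JSON call instead of querying a real model.

```python
import ast
import json
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def calculator(expression: str):
    """Evaluate plain arithmetic by walking the AST; anything else is rejected."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)


TOOLS = {"calculator": calculator}


def stub_model(question: str) -> str:
    """Stand-in for an LLM (hypothetical): returns a tool call as JSON text."""
    return json.dumps(
        {"tool": "calculator", "arguments": {"expression": "1234 * 5678"}}
    )


def answer(question: str, model=stub_model):
    """Ask the model what to call, then run the tool deterministically."""
    call = json.loads(model(question))
    tool = TOOLS[call["tool"]]
    return tool(**call["arguments"])
```

The model can still pick the wrong expression, but once the call is made the arithmetic itself is exact.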

lionkor1y ago

Do they reason, though?

joshka1y ago· 1 in thread

https://www.arxiv.org/abs/2412.16241 is the non-PDF version of this. @dang, can you please replace the link?

dang1y ago

Ok! changed from https://www.arxiv.org/pdf/2412.16241.

beezle1y ago

For those who don't want to download the PDF directly and prefer to start with the abstract: https://arxiv.org/abs/2412.16241

pwillia71y ago

How would the SIMS that contain the user prefs and whatnot not have the same issues described in the paper as the agents themselves?

cratermoon1y ago

Here's a link to the arXiv page for the paper, in case you want to look over the abstract and citation metadata before downloading the PDF.

https://arxiv.org/abs/2412.16241

coro_11y ago

The paper covers the technical details and logistics of the AI Agents to come. But how are humans going to react to AI Agents replacing human emotion and connection en masse? Tech culture is biased toward looking only at the agents, but this could become an issue.

rcarmo1y ago

This reads a lot like agents wearing different kinds of trenchcoats (or underwear). Can’t really see any evidence this separation works.

syntex1y ago

Why does this have so many upvotes? Is this the current state of research nowadays?

asciii1y ago

Diabolical - I love it. Impressed that the final score came up as an alert!

authorfly1y ago

Does anyone else get the sense that the definition has been bastardized by the conflation of the two concurrent previous uses of "agent"?

i.e. in AI, biology and informatics, "Agent" typically meant something that had a form / self / embodiment; that could sense the environment and react to those perceptions; and that possibly could learn, adapt, or change to various degrees of complexity, which would optionally entail being an "intelligent system".

Meanwhile in common parlance, Agent meant: Someone who acts or behaves on behalf of another adaptively to accomplish something with some degree of freedom.

And this might explain why some people say agent/agentic necessarily refers to "tool use" or "being able to overcome problems on the happy path" or "something capable of performing actions on an infinite loop while reacting" (the latter two, in my opinion, conflate the meaning with that of "Intelligent system" or "Intelligent behavior"). Meanwhile, biologists might still refer to a single, seemingly inert cell, or a group of bacteria in a colony, as an Agent (a more behaviouralist/chemical "look-deep-down" perspective).

I think a lot of the disappointment is that biologists/OG AI enthusiasts are looking for something truly adaptive and sensing, able to behave, "live" indefinitely, acquire or set goals, and, if intelligent, work with other agents to accomplish things (e.g. a "society"). Meanwhile, people who just want an "AI HR Agent" just want something that can communicate, interview, discern good applicants, and book the interviews plus provide summary notes. These two things are very different. But both could use tools etc. (the key difference from ChatGPT, which is what makes this new concept more useful than ChatGPT, alongside various forms of short-term memory rather than "fresh-every-time" conversations).

j451y ago

Math that can't be too warm and too accurate to work may have challenges being too accurate and reliable with repeating processes.

DebtDeflation1y ago

This whole idea of prompting an LLM, piping the output in as the input (prompt) of another LLM and asking it to do something with it (like critique/edit it), then piping that LLM's output back to the first LLM along with instructions to keep repeating the process until some stop criterion is met, seems to me to just be a money-making scheme to drive up token consumption.
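For reference, the loop being described can be sketched in a few lines, with a counter that shows how the number of model calls grows with each round. This is only an illustration under stated assumptions: `refine_loop`, `stub_writer` and `stub_critic` are hypothetical names, and the stubs stand in for real LLM calls.

```python
from typing import Callable, Optional, Tuple


def refine_loop(
    task: str,
    writer: Callable[[str, Optional[str]], str],
    critic: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Writer/critic loop. Returns (final_draft, number_of_model_calls)."""
    draft = writer(task, None)
    calls = 1
    for _ in range(max_rounds):
        feedback = critic(draft)
        calls += 1
        if feedback is None:  # stop criterion: the critic has no complaints
            break
        draft = writer(task, feedback)
        calls += 1
    return draft, calls


def stub_writer(task: str, feedback: Optional[str]) -> str:
    """Stand-in for the first LLM (hypothetical)."""
    return task if feedback is None else f"{task} (revised: {feedback})"


def stub_critic(draft: str) -> Optional[str]:
    """Stand-in for the second LLM (hypothetical). None means 'good enough'."""
    return None if "revised" in draft else "add detail"
```

With these stubs a single revision already costs four calls, and a critic that is never satisfied costs `1 + 2 * max_rounds`, which is the token-consumption point being made above.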

