Agents, besides tool use, also have memory, can plan work toward a goal, and can validate whether they are on the right track through an iterative Reflect-Act process.
https://en.wikipedia.org/wiki/Exploration%E2%80%93exploitati...
I do think it’s a step up when done correctly. Thinking of tools like Cursor. Most of my concern comes from the number of folks I have seen trying to create a system that solves everything. I know in my org people were working on agents without even a problem they were solving for. They are effectively trying to recreate ChatGPT, which to me is a fool’s errand.
What do agents provide? Asynchronous work output, decoupled from human time.
That’s super valuable in a lot of use cases! Especially because it’s a prerequisite for parallelizing “AI” use (1 human : many AI).
But the key insight from TFA (which I 100% agree with) is that the tyranny of sub-100% reliability compounded across multiple independent steps is brutal.
Practical agent folks should be engineering for risk and reliability, instead of the happy path.
And there are patterns and approaches to do that (bounded inputs, pre-classification into workable / not-workable, human in the loop), but many teams aren’t looking at the right problem (risk/reliability) and therefore aren’t architecting to those methods.
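Here's a minimal sketch of what architecting to those patterns might look like. Everything in it (the classifier, the agent step, the 0.9 confidence cutoff, the stub logic) is a hypothetical stand-in, not any particular framework's API:

```python
from dataclasses import dataclass

@dataclass
class Result:
    text: str
    passed_checks: bool

# Hypothetical stand-ins: in a real system these would be model calls / a review queue.
def classify_request(request: str) -> tuple[str, float]:
    # Pre-classification into workable / not-workable, with a confidence score.
    return ("workable", 0.97) if "refund" in request else ("not-workable", 0.55)

def run_agent(request: str) -> Result:
    # Bounded, single-purpose agent step.
    return Result(text=f"drafted reply for: {request}", passed_checks=True)

def escalate_to_human(request: str) -> str:
    # Human in the loop: anything off the happy path gets queued for review.
    return f"queued for human review: {request}"

CONFIDENCE_THRESHOLD = 0.9  # assumed cutoff; tune per use case

def handle(request: str) -> str:
    label, confidence = classify_request(request)
    if label != "workable" or confidence < CONFIDENCE_THRESHOLD:
        return escalate_to_human(request)          # don't even attempt unworkable input
    result = run_agent(request)
    if not result.passed_checks:
        return escalate_to_human(request)          # validate before anything ships
    return result.text

print(handle("refund for order 1234"))   # handled by the agent
print(handle("something ambiguous"))     # routed to a human
```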
And there’s fundamentally no way to compose 2 sequential 99% reliable steps into a 99% reliable system with a risk-naive approach.
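A quick back-of-the-envelope illustration of that compounding, assuming independent steps at 99% each:

```python
# Per-step reliability compounds multiplicatively across sequential steps.
for steps in (1, 2, 5, 10, 20):
    print(f"{steps:>2} steps -> {0.99 ** steps:.2%} end-to-end")
# Output:
#  1 steps -> 99.00% end-to-end
#  2 steps -> 98.01% end-to-end
#  5 steps -> 95.10% end-to-end
# 10 steps -> 90.44% end-to-end
# 20 steps -> 81.79% end-to-end
```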
I updated a Svelte component at work, and while I could test it in the browser and see it worked fine, the existing unit test suddenly started failing. I spent about an hour trying to figure out why the results logged in the test didn't match the results in the browser.
I got frustrated, gave in, and asked Claude Code, an AI agent. The tool-call loop is something like: it reads my code, then looks up the documentation, then proposes a change to the test which I approve, then it re-runs the test, feeds the output back into the AI, re-checks the documentation, and then proposes another change.
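Roughly, the loop has the shape sketched below. Every function here is a hypothetical placeholder (this is not Claude Code's actual API), just the read / consult docs / propose / approve / run tests / feed output back cycle I described:

```python
import subprocess
from dataclasses import dataclass

@dataclass
class TestOutput:
    passed: bool
    log: str

# Hypothetical placeholders for the model-backed and interactive steps.
def look_up_docs(context: str) -> str:
    return ""  # stub: would query the library documentation

def propose_test_change(context: str, docs: str) -> str:
    return ""  # stub: would ask the model for a patch to the test

def human_approves(patch: str) -> bool:
    return True  # stub: interactive approval in the real tool

def apply_patch(patch: str) -> None:
    pass  # stub: would edit the test file

def run_tests(cmd: list[str]) -> TestOutput:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return TestOutput(passed=proc.returncode == 0, log=proc.stdout + proc.stderr)

def agent_loop(component_path: str, test_cmd: list[str], max_rounds: int = 5) -> bool:
    with open(component_path) as f:
        context = f.read()                           # it reads my code
    for _ in range(max_rounds):
        docs = look_up_docs(context)                 # looks up the documentation
        patch = propose_test_change(context, docs)   # proposes a change to the test
        if not human_approves(patch):                # which I approve (or reject)
            continue
        apply_patch(patch)
        output = run_tests(test_cmd)                 # re-runs the test
        if output.passed:                            # trust the runner's exit code,
            return True                              # not the model's own claim
        context += "\n" + output.log                 # failing output fed back into the AI
    return False
```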
It's all quite impressive, or it would be if at one point it didn't randomly say "we fixed it! The first element is now active" -- except it wasn't: Claude thought the first element was element [1], when of course the first element in an array is [0]. The test hadn't even actually passed.
An hour and a few thousand Claude tokens my company paid for, with nothing to show for it lol.
Even in this example the coding agent is short-lived. I am curious about continuously running agents that are never done.