Not at all what one-shot means in the field. Zero-shot, one-shot, and many-shot refer to how many examples are provided at inference time to perform a task.
Zero shot: "convert these files from csv to json"
One shot: convert from csv to json, like "id,name,age\n1,john,20" to {id:"1",name:"john",age:"20"}
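The distinction above can be sketched as two prompt strings; the task wording and example strings here are made up for illustration:

```python
# Zero-shot: the task is described with no worked example.
zero_shot = "Convert these files from CSV to JSON."

# One-shot: the same task, plus exactly one input/output example
# the model can pattern-match against.
one_shot = (
    "Convert from CSV to JSON. Example:\n"
    'Input:  "id,name,age\\n1,john,20"\n'
    'Output: {"id": "1", "name": "john", "age": "20"}'
)
```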
This is probably a case where some educational training could have saved the engineer(s) involved a lot of frustration.
In the end, it's fine if the agent self-corrects across many shots too.
This is the largest issue: using LLMs as a black box means that, for most goals, we can't rely on them to always "converge to a solution", because they might get stuck in a loop trying to figure out whether they're stuck in a loop.
So then we're back to writing a hardcoded, deterministic cap on how many iterations count as being "stuck". I'm curious how the authors solve this.
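A minimal sketch of what such a hardcoded cap looks like; `try_step` and `is_solved` are hypothetical stand-ins for the agent's actual step function and success check:

```python
MAX_ITERATIONS = 5  # deterministic cap: this many failed tries counts as "stuck"

def run_with_cap(try_step, is_solved):
    state = None
    for i in range(MAX_ITERATIONS):
        state = try_step(state)           # one agent iteration
        if is_solved(state):
            return state, i + 1           # solved within the cap
    # The loop never converged; give up deterministically instead of spinning.
    raise TimeoutError(f"gave up after {MAX_ITERATIONS} iterations")
```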
This is what I’ve done working with smaller models: if the output fails validation once, I route it to a stronger model just for that tool call.
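Roughly this escalation pattern, sketched with hypothetical `small_model`, `large_model`, and `validate` callables (nothing here is a real API):

```python
def route_call(prompt, small_model, large_model, validate):
    """Try the cheap model first; escalate one tier on a single validation failure."""
    out = small_model(prompt)
    if validate(out):
        return out, "small"
    # One escalation for this tool call only; no further retries here.
    out = large_model(prompt)
    return out, "large"
```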
The problem the GP was referring to is that even the large model might fail to notice it's struggling to solve a task, and keep trying more-or-less the same approaches until the loop is exhausted.
EDIT: not about creating an agent that can do anything, but about creating an agent that more reliably represents and respects its reality, making it easier for us to reason about it and work with it seriously.
Because here I'm getting "YouTuber thumbnail vibes" from the idea of solving non-deterministic programming by selecting the one halting outcome out of a multiverse of possibilities.
This issue will likely always require a monitor “outside” of the agent.
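One way such an outside monitor could work is to watch the agent's actions and flag repetition, since the agent itself may not notice it's looping. A sketch under that assumption; the class and threshold are illustrative, not any particular framework's API:

```python
from collections import Counter

class StuckDetector:
    """Sits outside the agent loop and flags repeated actions as a stuck signal."""

    def __init__(self, max_repeats=3):
        self.counts = Counter()
        self.max_repeats = max_repeats

    def observe(self, action: str) -> bool:
        """Record one action; return True once it has repeated too many times."""
        self.counts[action] += 1
        return self.counts[action] >= self.max_repeats
```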
Curious what folks are seeing in terms of consistency of the agents they are building or working with – it's definitely challenging.
Sounds tautological, but you want to get as far as possible with the one-shot before iterating, because the one-shot results have the most integrity.