Not at all what one-shot means in the field. Zero-shot, one-shot, and many-shot refer to how many examples are provided at inference time to perform a task.
Zero shot: "convert these files from csv to json"
One shot: convert from csv to json, like "id,name,age\n1,john,20" to {id:"1",name:"john",age:"20"}
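The distinction above can be sketched as two prompt strings; the task wording and example strings here are made up for illustration:

```python
# Zero-shot: the task is described with no worked example.
zero_shot = "Convert these files from CSV to JSON."

# One-shot: the same task, plus exactly one input/output example
# the model can pattern-match against.
one_shot = (
    "Convert from CSV to JSON. Example:\n"
    'Input:  "id,name,age\\n1,john,20"\n'
    'Output: {"id": "1", "name": "john", "age": "20"}'
)
```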
This is probably a case where some educational training could have saved the engineer(s) involved a lot of frustration.
In the end, it's fine if the agent self-corrects across many shots too.
This is the largest issue: using LLMs as a black box means that, for most goals, we can't rely on them to always "converge to a solution", because they might get stuck in a loop trying to figure out whether they're stuck in a loop.
So then we're back to writing a hardcoded, deterministic cap on how many iterations count as being "stuck". I'm curious how the authors solve this.
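A minimal sketch of what such a hardcoded cap looks like; `try_step` and `is_solved` are hypothetical stand-ins for the agent's actual step function and success check:

```python
MAX_ITERATIONS = 5  # deterministic cap: this many failed tries counts as "stuck"

def run_with_cap(try_step, is_solved):
    state = None
    for i in range(MAX_ITERATIONS):
        state = try_step(state)           # one agent iteration
        if is_solved(state):
            return state, i + 1           # solved within the cap
    # The loop never converged; give up deterministically instead of spinning.
    raise TimeoutError(f"gave up after {MAX_ITERATIONS} iterations")
```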
This is what I’ve done working with smaller models: if the output fails validation once, I route it to a stronger model just for that tool call.
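Roughly this escalation pattern, sketched with hypothetical `small_model`, `large_model`, and `validate` callables (nothing here is a real API):

```python
def route_call(prompt, small_model, large_model, validate):
    """Try the cheap model first; escalate one tier on a single validation failure."""
    out = small_model(prompt)
    if validate(out):
        return out, "small"
    # One escalation for this tool call only; no further retries here.
    out = large_model(prompt)
    return out, "large"
```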
The problem the GP was referring to is that even the large model might fail to notice it's struggling to solve a task, and keep trying more-or-less the same approaches until the loop is exhausted.
EDIT: not about creating an agent that can do anything, but about creating an agent that more reliably represents and respects its reality, making it easier for us to reason about it and work with it seriously.
Because here I'm getting "YouTuber thumbnail vibes" from the idea of solving non-deterministic programming by selecting the one halting outcome out of a multiverse of possibilities.
This issue will likely always require a monitor “outside” of the agent.
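One way such an outside monitor could work is to watch the agent's actions and flag repetition, since the agent itself may not notice it's looping. A sketch under that assumption; the class and threshold are illustrative, not any particular framework's API:

```python
from collections import Counter

class StuckDetector:
    """Sits outside the agent loop and flags repeated actions as a stuck signal."""

    def __init__(self, max_repeats=3):
        self.counts = Counter()
        self.max_repeats = max_repeats

    def observe(self, action: str) -> bool:
        """Record one action; return True once it has repeated too many times."""
        self.counts[action] += 1
        return self.counts[action] >= self.max_repeats
```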
Curious what folks are seeing in terms of consistency of the agents they are building or working with – it's definitely challenging.
Sounds tautological, but you want to get as far as possible with the one-shot before iterating, because the one-shot results have the most integrity.