It would do things like Google something; the result wasn't relevant, so it tried again, got an error from one of the pages, and then seemingly started doing something completely incoherent related to the error message.
Right now we have:
Step 1: Human reasoning, tool use, input.
Step 2: LLM output.
Step 3: Human reasoning, tool use, input.
Step 4: LLM output.
&etc.
The observation that the input and output are both just text makes it possible to build "agents". But the "agent" movement's push to totally close the whole loop right away is premature.
It's fine to lay the groundwork though, and the frameworks for it, like AutoGPT, can be used to just do a couple extra steps rather than close the whole loop.
Plugins and browsing can be seen as merging some of steps 2 and 3. But then you still need the &etc iteration with the human closely in the loop.
Chain-of-thought prompting techniques are similarly an attempt to merge a little of the human's process of vetting the output, by trying to get better output in individual iterations. Sometimes I have the LLM output multiple options along with its reasoning and pick the best one; this is really just compressing multiple runs of the LLM and having it pick one, rather than me retrying when I get a bad output.
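To make the "compressing multiple runs" point concrete, here's a minimal best-of-n sketch in Python. The `call_llm` function is a hypothetical stand-in for whatever client you use (it's stubbed here so the sketch is self-contained), and the judging step is the part that replaces the human retry:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would hit an LLM API.
    # Stubbed to return a deterministic toy "answer".
    return f"answer-{len(prompt) % 3}"

def best_of_n(prompt: str, n: int = 3) -> str:
    # Sample n candidate outputs (a real client would use
    # temperature > 0 here for diversity)...
    candidates = [call_llm(f"{prompt} (variant {i})") for i in range(n)]
    # ...then ask the model itself to judge, instead of a human
    # retrying on a bad output.
    judge_prompt = "Pick the best answer:\n" + "\n".join(
        f"{i}: {c}" for i, c in enumerate(candidates)
    )
    choice = call_llm(judge_prompt)
    # A real system would parse the judge's stated index robustly;
    # here we scan for a digit and fall back to the first candidate.
    for i, c in enumerate(candidates):
        if str(i) in choice:
            return c
    return candidates[0]
```

The key design point is that the n samples and the judging call all happen in one pass, so the human only sees the winner.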
Anyway, I think this is the right way to look at it: these are good tools for compressing iterations of human-in-the-loop work. For some things maybe we'll eventually remove the human, but we shouldn't expect that right now. The Twitter demonstrations of "it did the whole thing" are a trick; good for influencers, but not realistic right now.
For example, when asked to search for the top executives at company X, it rightly uses Google Search with the query “top executives at company X,” which returns a list of web pages such as the company’s About page. It then parses the About page, but because of messed-up page formatting, it returns nonsense data like the LinkedIn profile URL and a marketing case-study link, even though the executive profiles are right there.
The built-in Google Search function is also limited. For comparison, SerpAPI masterfully scrapes Google Search using a proxy network and very intelligent parsing. In experiments using SerpAPI in combination with Microsoft’s guidance module, I got much farther than AutoGPT.
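What makes SerpAPI more useful to an agent is that it returns structured JSON rather than raw HTML. Here's a rough sketch, assuming the standard `google-search-results` client; the exact response shape below is an illustration of the `organic_results` format, not a guarantee:

```python
def top_result_links(response: dict, n: int = 3) -> list:
    """Pull the first n organic-result links from a SerpAPI-style response."""
    results = response.get("organic_results", [])
    return [r["link"] for r in results[:n] if "link" in r]

# Fetching would look roughly like this (requires an API key):
#   from serpapi import GoogleSearch
#   response = GoogleSearch({
#       "q": "top executives at company X",
#       "api_key": "YOUR_KEY",
#   }).get_dict()

# A canned response, for illustration:
sample = {
    "organic_results": [
        {"title": "About Us", "link": "https://example.com/about"},
        {"title": "Leadership", "link": "https://example.com/team"},
    ]
}
links = top_result_links(sample)
```

Feeding the agent a clean list of links like this, instead of a scraped results page, is where much of the improvement came from.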
Fortunately, a lot of people are contributing to AutoGPT now and it is improving quickly. They are revamping the core right now and I expect it will work far better when they are done. With time, better tools will be made available to GPT-4 and progress should then be faster.
Thanks for your kind words. We are working on SerpApi integration for Auto-GPT: https://github.com/serpapi/public-roadmap/issues/905
Would love to see how you implemented this with guidance. Did you use GPT-4?
I tried doing more complex tasks using GPT-4 and was initially optimistic about plugins, but they have all been very disappointing.
For instance, a dream for me would be something like: "Find some rental property opportunities within a 1-hour commute of New York City that have a high rent-to-sale-price ratio and low taxes."
Broken down into steps it would be:
1. Find towns within 50 miles or so of Manhattan. Take the top 100 or so by population.
2. Find commute times for each one leaving at 9am Monday and coming back at 6pm. Narrow the list down to towns within 1 hour. Unfortunately I didn't see any map plugins, but maybe something like Wolfram Alpha can suffice, or just Google the commute time for each town.
3. Use Zillow to pull typical rents and sale prices for each town. Build a simple model (maybe Wolfram) to estimate rent and apply it to homes for sale, including taxes. Calculate median expected rent / median sale price.
4. Remove towns you don't have enough data on (not enough rentals or homes for sale) and return the top towns, with a few examples of how much you can get.
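Steps 3 and 4 above are straightforward to sketch in plain Python once the data is in hand; the hard part an agent keeps failing at is the gathering. The town data here is made up for illustration:

```python
from statistics import median

towns = {
    # town: (monthly rents observed, sale prices observed)
    "Maplewood": ([2800, 3100, 2950], [550_000, 600_000, 580_000]),
    "Thin-Data-Town": ([2400], [700_000]),  # too few listings
}

MIN_LISTINGS = 3  # step 4: drop towns without enough data

def rent_to_price(rents, prices):
    # step 3: median expected annual rent / median sale price
    return (median(rents) * 12) / median(prices)

ranked = sorted(
    (
        (town, rent_to_price(rents, prices))
        for town, (rents, prices) in towns.items()
        if len(rents) >= MIN_LISTINGS and len(prices) >= MIN_LISTINGS
    ),
    key=lambda t: t[1],
    reverse=True,
)
```

Nothing here is hard; the planning problem is in orchestrating steps 1 and 2 (finding towns, getting commute times) reliably enough to feed this.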
If I were building something like AutoGPT, I would start with an example like this and use it almost like an integration test. Theoretically all the pieces are there, but it just falls apart very quickly. I've heard these models can't yet do "planning"; I'm not sure what that means technically, but I think this kind of problem requires planning, so it might be a model limitation.
My hypothesis was that the variety of trash returned by things like SerpAPI needs to be massaged into something consistent, and potentially run through a result-retrieval and fine-tuning stage, to be useful to a high-level agent like AutoGPT. But I didn't make it far enough to have anything working to show.
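The "massage into something consistent" step can be as simple as coercing heterogeneous result dicts into one schema before the agent ever sees them. The input shapes below are hypothetical examples of what different scrapers might return; the point is the single normalized record:

```python
def normalize(raw: dict) -> dict:
    """Coerce heterogeneous search-result dicts into one schema."""
    return {
        "title": raw.get("title") or raw.get("name") or "",
        "url": raw.get("link") or raw.get("url") or "",
        "snippet": raw.get("snippet") or raw.get("description") or "",
    }

messy = [
    {"title": "About", "link": "https://example.com/about"},
    {"name": "Team page", "url": "https://example.com/team",
     "description": "Our leadership"},
]
clean = [normalize(r) for r in messy]
```

With a fixed schema, the agent's prompt can describe the result format once instead of coping with whatever each source happens to emit.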
here's the video: https://www.loom.com/share/5e83475be2464778950f7df7e209ac2d
basically, this effort ended up failing because, well, problem solving itself is inherently complex.
https://en.wikipedia.org/wiki/Fifth-generation_programming_l...
seems like the exact same thing happened with ChatGPT / AutoGPT / GPT4, and this will keep happening.
A bit early to call it, by far.
Serious development around these capabilities has only just gotten off the ground.
It also doesn't seem like LLMs are done improving.
I was thinking that perhaps we have been working with abstractions that are too low-level. Instead of providing a set of tools such as API calls or text splitters, wouldn't it be more reliable to give agents templates or workflows of successful tasks, such as trimming videos or booking restaurants?
These templates would consist of a set of function calls, or a graph of connected components in low-code tools like LangFlow. I believe auto agents already use a similar concept, caching successful tasks for future reuse. The idea is to populate these caches with the most common use cases, and use retrieval if they grow too large, so that we don't hit cache misses most of the time and fall back to lower-level abstractions (tools) as the baseline. Templates, like prompts, should be portable (e.g. JSON) to avoid everyone having to reinvent the wheel. While this solution may not be as impressive as a fully autonomous agent and may not work for the generalized case, it should produce a more predictable outcome, I think.
It's really turtles all the way down.