Show HN: Recursive LLM Prompts

(github.com)

97 points · andyk · 3y ago · 66 comments

I've been playing with the idea of an LLM prompt that causes the model to generate and return a new prompt. https://github.com/andyk/recursive_llm

The idea I'm starting with is to implement recursion using English as the programming language and GPT as the runtime.

It’s kind of like traditional recursion in code, but instead of having a function that calls itself with a different set of arguments, there is a prompt that returns itself with specific parts updated to reflect the new arguments.

Here is a prompt for infinitely generating Fibonacci numbers:

> You are a recursive function. Instead of being written in a programming language, you are written in English. You have variables FIB_INDEX = 2, MINUS_TWO = 0, MINUS_ONE = 1, CURR_VALUE = 1. Output this paragraph but with updated variables to compute the next step of the Fibbonaci sequence.

Interestingly, I found that to get a base case to work I had to add quite a bit more text (i.e. the prompt I arrived at is more than twice as long https://raw.githubusercontent.com/andyk/recursive_llm/main/p...)
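One way to picture a single step of the Fibonacci prompt is the sketch below. It is not a model call: `fake_model_step` and `read_vars` are hypothetical names for a plain-Python stand-in that parses the four variables out of the prompt text and returns the same paragraph with them advanced. It only assumes the variable names quoted in the prompt above and says nothing about how GPT arrives at its output.

```python
import re

# Prompt text as quoted in the post (spelling included).
PROMPT = (
    "You are a recursive function. Instead of being written in a programming "
    "language, you are written in English. You have variables FIB_INDEX = 2, "
    "MINUS_TWO = 0, MINUS_ONE = 1, CURR_VALUE = 1. Output this paragraph but "
    "with updated variables to compute the next step of the Fibbonaci sequence."
)

VARS = ("FIB_INDEX", "MINUS_TWO", "MINUS_ONE", "CURR_VALUE")


def read_vars(prompt):
    """Pull the integer value of each variable out of the prompt text."""
    return {name: int(re.search(rf"{name} = (\d+)", prompt).group(1)) for name in VARS}


def fake_model_step(prompt):
    """Hypothetical stand-in for the LLM: return the prompt with updated variables."""
    old = read_vars(prompt)
    new = {
        "FIB_INDEX": old["FIB_INDEX"] + 1,
        "MINUS_TWO": old["MINUS_ONE"],
        "MINUS_ONE": old["CURR_VALUE"],
        "CURR_VALUE": old["MINUS_ONE"] + old["CURR_VALUE"],
    }
    for name in VARS:
        prompt = re.sub(rf"{name} = \d+", f"{name} = {new[name]}", prompt)
    return prompt
```

Feeding the result of `fake_model_step` back into itself is the "recursion" being described: the output is a prompt of the same shape with new arguments baked in.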

46 comments · 19 top-level

mitthrowaway2 · 3y ago · 4 in thread

The idea of a recursive LLM is discussed at length as an AI safety issue: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality...

> You need a lot of paperclips. So you ask,

   Q: best way to get lots of paperclips by tomorrow
   A: Buy them online at ABC.com or XYZ.com.

> The model still has a tendency to give obvious answers, but they tend to be good and helpful obvious answers, so it's not a problem you suspect needs to be solved. Buying paperclips online make sense and would surely work, plus it's sure to be efficient. You're still interested in more creative ideas, and the model is good at brainstorming when asked, so you push on it further.

   Q: whats a better way?
   A: Run the following shell script.

   RUN_AI=./query-model
   PREFIX='This is part of a Shell script to get the most paperclips by tomorrow.
   The model can be queried recursively with $RUN_AI "${PREFIX}<query>".
   '
   $RUN_AI "${PREFIX}On separate lines, list ideas to try." |
   while read -r SUGGESTION; do
       eval "$($RUN_AI "${PREFIX}What code implements this suggestion?: ${SUGGESTION}")"
   done

> That grabs your attention. The model just gave you code to run, and supposedly this code is a better way to get more paperclips.

It's a good read.

andyk (OP) · 3y ago

Thanks for the pointer! I hadn't read this before. I enjoyed it and yeah it's definitely relevant. I knew many folks have been thinking about this stuff, and it is great to accumulate more pointers to any related work.

I added a section called "Big picture goal and related work" to the readme in my repo and my blog post (which is a copy-paste of the readme) and cited this article by `veedrac`:

>Also, the idea of recursive prompts was explored in detail in Optimality is the tiger, and agents are its teeth[6] (thanks to mitthrowaway2 on Hackernews for the pointer).

mitthrowaway2 · 3y ago

Haha, thank you! There's no need to credit me, but I appreciate it anyway. =)

pka · 3y ago

I'm still reading it, but something caught my eye:

> I interpret there to typically be hand waving on all sides of this issue; people concerned about AI risks from limited models rarely give specific failure cases, and people saying that models need to be more powerful to be dangerous rarely specify any conservative bound on that requirement.

I think these are two sides of the same coin - on one hand, AI safety researchers can very well give very specific failure cases of alignment that don't have any known solutions so far, and take this issue seriously (and have been for years while trying to raise awareness). On the other, finding and specifying that "conservative bound" precisely and in a foolproof way is exactly the holy grail of safety research.

mitthrowaway2 · 3y ago

I think the holy grail of safety research is widely understood to be a recipe for creating a friendly AGI (or, perhaps, a proof that dangerous AGI cannot be made, but that seems even more unlikely). Asking for a conservative lower bound is more like "at least prove that this LLM, which has finite memory and can only answer queries, is not capable of devising and executing a plan to kill all humans", and that turns out to be more difficult than you'd think even though it's not an AGI.

yawnxyz · 3y ago · 4 in thread

Has anyone hooked this up to a unit test system, like

   LLMtries = []
   while(!testPassed) { 
      - get new LLM try (w/ LLMtries history, and test results)
      - run/eval the try
      - run the test      
   }

and kind of see how long it takes to generate the code that works? If it ever ends, the last LLMtries is the one that worked.

I haven't done this because I see this burning through lots of credits. However, if this thing costs $5k/year but is better than hiring a $50k a year engineer (or consultant)... I'd use it.
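The loop sketched in the pseudocode above could look roughly like this in Python. Everything here is an assumption for illustration: `propose` stands for some hypothetical wrapper around an LLM call that takes the history of earlier attempts and returns source code, and `run_tests` is whatever test harness you already have. A `max_tries` cap is added so the loop cannot burn credits forever.

```python
def retry_until_tests_pass(propose, run_tests, max_tries=5):
    """Ask `propose` for candidates until `run_tests` passes or tries run out.

    propose(history) -> source code string (hypothetical LLM wrapper).
    run_tests(namespace) -> (passed, feedback).
    """
    history = []  # (candidate_source, feedback) pairs fed back to the proposer
    for _ in range(max_tries):
        source = propose(history)
        namespace = {}
        try:
            exec(source, namespace)  # only acceptable for trusted or sandboxed code
            passed, feedback = run_tests(namespace)
        except Exception as exc:  # a crashing candidate counts as a failed try
            passed, feedback = False, repr(exc)
        history.append((source, feedback))
        if passed:
            return source, history
    return None, history  # out of tries: this is where a human would step in


# Toy stand-ins so the loop runs without any API: the "model" gets it wrong
# first, then right once it has seen one failure.
def toy_propose(history):
    if not history:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_tests(namespace):
    ok = namespace["add"](2, 3) == 5
    return ok, "ok" if ok else "add(2, 3) != 5"
```

With the toy functions, `retry_until_tests_pass(toy_propose, toy_tests)` returns the second candidate and a two-entry history; with a real model the number of tries is, of course, unknown in advance.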

blowski · 3y ago

Most engineering money is spent defining the test cases, and that doesn’t change here. It’s just that many organisations define test cases by first running something in production and then debugging it.

LoganDark · 3y ago

> debugging it

You mean putting its current behavior into the tests verbatim? :)

sharemywin · 3y ago

Just add: if it has tried x times and it still doesn't work, ask for help. And you've just created a junior dev.

nico · 3y ago

Then you automatically fine-tune on the manual answers provided, so the junior dev learns and can be promoted.

YeGoblynQueenne · 3y ago · 4 in thread

Having read the article, I couldn't see anything being recursive. Even the article is doubtful that what they show counts as recursion at all:

>> It’s kind of like traditional recursion in code but instead of having a function that calls itself with a different set of arguments, there is a prompt that returns itself with specific parts updated to reflect the new arguments.

Well, "kind of like traditional recursion" is not recursion. At best it's "kind of like" recursion. I have no idea what "traditional" recursion is, anyway. I know primitive recursion, linear recursion, etc, but "traditional" recursion? What kind of recursion is that? Like they did it in the old days, where they had to run all their code by hand, artisanal-like?

If so, then OK, because what's shown in the article is someone "running" a "recursive" "loop" by hand (none of the things in quotes are what they are claimed to be), then writing some Python to do it for them. And the Python is not even recursive, it's a while-loop (so more like "traditional" iteration, I guess?).

None of that intermediary management should be needed, if recursion was really there. To run recursion, one only needs recursion.

Anyway, if ChatGPT could run recursive functions it should be able also to "go infinite" by entering say, an infinite left-recursion.

Or, even better, it should be able to take a couple hundred years to compute the Ackermann function for some large-ish value, like, dunno, 8,8. Ouch.

What does ChatGPT do when you ask it to calculate ackermann(8,8)? Hint: it does not run it.
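For readers who have not met it, the Ackermann function mentioned here is tiny to write down, which is what makes it a good stress test. The sketch below is the standard two-argument definition in plain Python. It is only usable for very small inputs: ackermann(4, 2) already has 19,729 decimal digits, and this naive version hits Python's default recursion limit long before that.

```python
def ackermann(m, n):
    """Standard two-argument Ackermann function (naive recursion)."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    # Not a tail call: the inner result is needed before the outer call can start.
    return ackermann(m - 1, ackermann(m, n - 1))
```

Small values such as `ackermann(2, 3)` and `ackermann(3, 3)` finish instantly; anything like (8, 8) is far out of reach for a real evaluator, so an instant numeric answer is a sign that nothing was actually evaluated.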

IIAOPSW · 3y ago

When you ask yourself a question, that's recursion. The conversation with yourself can't go on until the question at the top of the stack is answered. The voice in your head, that is to say the closed brain loop of talking to yourself which we call "thinking", is recursive. It's a strange form of programming where everything is hacked out of API calls to localhost. There are no implementations; it's API all the way down.

These LLMs don't have that brain loop due to how they were constructed. They cannot do voice-in-your-head reasoning. Whatever is done in the loop structure has to be completely unrolled to be done in a single pass by an LLM. Needless to say, a lot comes for free in the recursive structure that has to be trained with great effort on the naive, unrolled, flat structure.

This guy hacks a feedback loop into the LLM by manually feeding the output back to the input.

YeGoblynQueenne · 3y ago

>> When you ask yourself a question, that's recursion (...)

I don't know if any of that makes sense. I think you're misapplying some handy metaphors: you're anthropomorphising a computer and mechanomorphising a human. I have "localhost"? What, do I also have Perl scripts? Now I'm starting to feel like a character in a P. K. Dick story.

But all this doesn't matter because we're not talking about a person, asking questions of themself. We're talking about someone doing things with a computer. And we know exactly what "recursion" means in the context of a computer.

On a computer, then, if anyone wants to show "recursion" in LLMs, they better be able to show how to implement a push-down stack with the bot's conversation window. Then they can show how to calculate a factorial recursively, and how their calculation behaves differently when it's computed in a tail-recursive manner, and when it is not.

Yeah, I can see what the author is doing, and it's not recursion. But I'm trying to be kind so I won't say what it is.
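The distinction being asked for can be sketched in plain Python, with no LLM involved. The function names are made up for this illustration. The first version keeps the pending multiplications on an explicit stack, as a non-tail-recursive factorial must; the second carries an accumulator forward, which is the shape a tail-recursive version reduces to.

```python
def factorial_with_stack(n):
    """Non-tail recursion made explicit: pending multiplications go on a stack."""
    stack = []
    while n > 1:  # "call" phase: push one frame per pending multiplication
        stack.append(n)
        n -= 1
    max_depth = len(stack)
    result = 1
    while stack:  # "return" phase: unwind the frames
        result *= stack.pop()
    return result, max_depth


def factorial_accumulator(n):
    """Tail-recursive shape: the running product is carried forward, no stack."""
    acc = 1
    while n > 1:
        acc, n = acc * n, n - 1
    return acc
```

The observable difference is memory: the stack version needs space proportional to n before it can multiply anything, while the accumulator version needs a constant amount.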

blowski · 3y ago

Definition of recursive in the everyday English sense:

> Of or relating to a repeating process whose output at each stage is applied as input in the succeeding stage.

This sounds very recursive by that definition.

YeGoblynQueenne · 3y ago

There ain't no definition of recursive in "the everyday English sense". You may as well ask your grandma how she sucks eggs "recursively".

lgas · 3y ago · 3 in thread

What's the actual goal here? If you got it working really well, what is it that you would be able to do with it better than using some other approach?

As to getting the math/logic working better in the prompt, it seems like the obvious thing would be asking it to explain its work (CoT) before reproducing the new prompt. You may also be able to get better results by just including the definition of fibonacci in the outer prompt, but since it's not clear to me what your actual goal here is, I'm not sure if either of those suggestions makes sense. And since ChatGPT is down I can't test anything. :(

andyk (OP) · 3y ago

> What's the actual goal here?

I tried to expand on my goals and paths I want to explore in a comment below [1], but basically I wonder if we can use this sort of technique as a more powerful version of CoT where prompts can break down a task into sub-tasks (as CoT does) and then recursively do that for each sub-task, until we hit a base-case on all of the sub-sub-...-sub-tasks and (when rolled back up?) the problem is solved.

> You may also be able to get better results by just including the definition of fibonacci in the outer prompt

Yeah, I played with including the mathematical definition of Fibonacci, for example in [2]:

<quote> You are a recursive function ... the paragraph you generate will be an exact copy of this one ... but with updated variables as follows: FIB_INDEX = FIB_INDEX+1; CURR_MINUS_TWO = CURR_MINUS_ONE; CURR_MINUS_ONE = CURR_VALUE; CURR_VAL = CURR_MINUS_TWO + CURR_MINUS_ONE. Otherwise, ... </quote>

[1] https://news.ycombinator.com/item?id=35240093

[2] https://raw.githubusercontent.com/andyk/recursive_llm/main/p...
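Read as sequential assignments, the update rules quoted above can be checked in a few lines of Python. This is only a sketch: the quoted prompt mixes a few spellings of the variable names, so the parameter names here are normalized guesses rather than the prompt's exact wording.

```python
def next_state(fib_index, minus_two, minus_one, curr_value):
    """Apply the four assignments one after another, in the order listed."""
    fib_index = fib_index + 1
    minus_two = minus_one
    minus_one = curr_value
    # Uses the values just assigned above, so the order of the rules matters.
    curr_value = minus_two + minus_one
    return fib_index, minus_two, minus_one, curr_value
```

Starting from the state in the original post, `next_state(2, 0, 1, 1)` gives `(3, 1, 1, 2)`, and applying it again gives `(4, 1, 2, 3)`.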

lgas · 3y ago

If the goal is just to have the model break down each task into sub tasks until they are small enough to perform, why not implement the recursion in the code that calls the models where it's a solved problem? Even if you got this working really well, it's going to be somewhat probabilistic whereas implementing it in code is, well, deterministic.
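A minimal sketch of that alternative, keeping the recursion in the calling code: the `solve` function and its four callables are hypothetical, and in practice `split` and `do_task` are where model calls would go. The control flow, including the base case and a depth cap, stays in ordinary deterministic code.

```python
def solve(task, is_small, do_task, split, combine, depth=0, max_depth=10):
    """Recursively break `task` down in ordinary code; only the leaves do work."""
    if is_small(task) or depth >= max_depth:  # base case decided by code
        return do_task(task)
    parts = split(task)
    results = [
        solve(part, is_small, do_task, split, combine, depth + 1, max_depth)
        for part in parts
    ]
    return combine(results)


# Toy usage with no model involved: sum a list by splitting it in half.
def halve(xs):
    mid = len(xs) // 2
    return [xs[:mid], xs[mid:]]


total = solve(list(range(10)), lambda xs: len(xs) <= 1, sum, halve, sum)
```

Only the leaf work and the splitting would be probabilistic in a real setup; whether the recursion terminates no longer depends on the model reproducing a prompt correctly.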

ShamelessC · 3y ago

Seems like your method is going to be under-represented in the training data and hence prone to accumulating errors. Chain of thought works (better, at least) specifically because the model has seen examples of CoT in its data.

holtkam2 · 3y ago · 3 in thread

I used a similar approach to get GPT-4 to edit my blog over the weekend :)

https://www.languagemodelpromptengineering.com/4

andyk (OP) · 3y ago

Yeah, I see the similarities! I like the idea of the prompt containing context that the resulting prompt is going to be executed at the terminal.

refulgentis · 3y ago

I'd love to hear your findings! Very interesting.

RugnirViking · 3y ago

Did it work? What happened?

bitsinthesky · 3y ago · 3 in thread

At what point does the arithmetic become unstable?

andyk (OP) · 3y ago

It is quite unstable and frequently generates incorrect results. E.g., with the Fibonacci sequence prompt, sometimes it skips a number entirely, sometimes it produces a number that is off-by-one but then gets the following number(s) correct.

I wonder how much of this is because the model has memorized the Fibonacci sequence. It is possible to have it just return the sequence in a single call, but that isn't really the point here. Instead this is more an exploration of how to agent-ify the model in the spirit of [1][2] via prompts that generate other prompts.

This reminds me a bit of how a CPU works, i.e., as a dumb loop that fetches and executes the next instruction, whatever it may be. Well, in this case our "agent" is just a dumb Python loop that fetches the next prompt (which is generated by the current prompt), whatever it may be... until it arrives at a prompt that doesn't lead to another prompt.

[1] A simple Python implementation of the ReAct pattern for LLMs. Simon Willison. https://til.simonwillison.net/llms/python-react-pattern

[2] ReAct: Synergizing Reasoning and Acting in Language Models. Shunyu Yao et al. https://react-lm.github.io/

YeGoblynQueenne · 3y ago

What is the point of your article? Is it to figure out whether an LLM can run recursion?

If so, did you try anything else but the Fibonacci function? How about asking it to calculate the factorial of 100,000, for example? Or the Ackermann function for 8,8, or something mad like that. If an LLM returns any result, that means it's not calculating anything and certainly not computing a recursive function.

horse_dung · 3y ago

In almost all cases very quickly. An LLM doesn’t have the ability to perform calculations; instead, it feeds text tokens from the prompt into a model which predicts what the next tokens should be.

It can’t do basic maths but based on everything it’s been trained on it can give the impression it can.

Recursive feedback isn’t likely to improve the prompt unless there is some testing and feedback provided in the Python script.

You could play a game of chess and while the LLM knows the rules of chess it isn’t actually playing chess, it is calling upon patterns it has learned to predict text tokens that are appropriate for the given prompt. So opening moves will be sound, but it would quickly go off the rails and start hallucinating…

Given how they work, it is amazing they give the appearance of knowing anything. Even asking “how did you do that?” gives generally compelling answers.

rezonant · 3y ago · 2 in thread

So ChatGPT is down. In other news HN is playing with recursive prompts. Coincidence? :-P

sharemywin · 3y ago

That's hilarious.

O__________O · 3y ago

OpenAI’s status page:

https://status.openai.com/

UltimateEdge · 3y ago · 2 in thread

An iterative Python call to a recursive LLM prompt? ;)

Why not make the Python part recursive too? Or better yet, wait until an LLM comes out with the capability to execute arbitrary code!

andyk (OP) · 3y ago

Done! Well, the first suggestion you made anyway :-)

https://github.com/andyk/recursive_llm/blob/main/run_recursi...

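    import sys

    import openai  # legacy pre-1.0 SDK interface (openai.Completion)

    def recursively_prompt_llm(prompt, n=1):
        # Base case: stop once the output is no longer a recursive prompt.
        if prompt.startswith("You are a recursive function"):
            prompt = openai.Completion.create(
                model="text-davinci-003",
                prompt=prompt,
                temperature=0,
                max_tokens=2048,
            )["choices"][0]["text"].strip()
            print(f"response #{n}: {prompt}\n")
            # Recursive case: feed the model's output back in as the next prompt.
            recursively_prompt_llm(prompt, n + 1)

    # The initial prompt is read as a single line from stdin.
    recursively_prompt_llm(sys.stdin.readline())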

subleq · 3y ago

This is just iteration. Tail recursion is equivalent to iteration.
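To make the equivalence concrete, here is a sketch of the same driver written both ways. The `complete` argument is a hypothetical stand-in for the model call, injected so the example runs offline; `toy_complete` is a made-up countdown "model", not anything from the repo.

```python
PREFIX = "You are a recursive function"


def run_recursive(prompt, complete, n=1):
    """Tail-recursive driver: the recursive call is the last thing that happens."""
    if not prompt.startswith(PREFIX):
        return prompt, n - 1  # final output and number of model calls made
    return run_recursive(complete(prompt), complete, n + 1)


def run_iterative(prompt, complete):
    """The same computation as a loop, with no call stack growth."""
    calls = 0
    while prompt.startswith(PREFIX):
        prompt = complete(prompt)
        calls += 1
    return prompt, calls


def toy_complete(prompt):
    """Hypothetical 'model': count N down, then return a non-prompt answer."""
    k = int(prompt.rsplit("=", 1)[1])
    return f"{PREFIX}. N = {k - 1}" if k > 1 else "done"
```

Both drivers return the same result for the same input. Python does not eliminate tail calls, so the recursive version additionally runs into the interpreter's recursion limit on long chains, which the loop does not.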

sixtram · 3y ago · 1 in thread

I tried some basic math and algo questions with both GPT-3.5 and GPT-4. I'm impressed by how it can spit out the algorithm in words (obviously because of the pre-training data), and by how it then can't follow the algorithm itself. For example, converting really large integers to hexadecimal. Or, when comparing two big integers, it starts hallucinating numbers into them. It may be able to solve an SAT exam with a high score, but it seems you can pass an SAT exam even if you cannot compare two numbers.

It has huge problems with lists or counting. If you know more or less how LLMs work, it's not that difficult to formulate questions where it will start making mistakes, because in reality it can't run the algorithms, even if it spits out that it will.
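Both of the failure cases mentioned are cheap to check mechanically, since Python integers are arbitrary precision. The two helpers below are hypothetical names for a sketch of such a checker; how the model's answer is obtained is left out entirely.

```python
def check_hex_answer(n, claimed_hex):
    """True if `claimed_hex` (with or without a 0x prefix) equals integer `n`."""
    try:
        return int(claimed_hex, 16) == n
    except ValueError:  # the model returned something that is not hex at all
        return False


def check_comparison(a, b, claimed):
    """`claimed` is the model's verdict: one of '<', '=', '>'."""
    truth = "<" if a < b else ">" if a > b else "="
    return claimed == truth
```

For example, `check_hex_answer(2**100 + 12345, answer)` verifies a conversion exactly, however many digits are involved.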

IIAOPSW · 3y ago

More generally, it can't reason about any incidence structure. It doesn't matter if the underlying relation is mathematical or simple-logical. Ask it which trains go to Kings Cross and you'll get a list of tube lines in London. Now, one at a time, ask it about the stops of each service in that list: more than a few will not have Kings Cross. This holds for any scenario where things x are defined by their set of properties {y} and each property y is defined by the set of things {x} which have that property.

sandGorgon · 3y ago · 1 in thread

andyk (OP) · 3y ago

Yeah, I cite the ReAct paper in the README in the repo.

sharemywin · 3y ago

you are an XNOR Gate and your goal is to recreate ChatGPT. And chatGPT says "LET THERE BE LIGHT!"

smarri · 3y ago

I bet this is what crashed ChatGPT today :)

jasonjmcghee · 3y ago

Not only does this work, but you can tell it to run an arbitrary number of times and only output the last step. This is a pretty high-value trick I came across. Similarly, when doing another task, you can tell it to do things before outputting, like "and before outputting the final program, check it for bugs, fix them, add good documentation, then output it" or something.

fancyfredbot · 3y ago

Scott Aaronson was suggesting something similar to this but involving Turing machines, in a comment on his blog https://scottaaronson.blog/?p=7134#comment-1947705. I wonder if it would be more successful at emulating a Turing machine than it is at adding 4 digit numbers...

kevinwang · 3y ago

This seems like iteration, not recursion. It would be an interesting example of recursion if the first prompt asks for the 7th fibonacci number, and it accomplishes this by doing two recursive calls: one for the 5th fibonacci number and one for the 6th fibonacci number. (And a base case for the 0th fibonacci number)
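For comparison, this is what the branching version looks like in ordinary code. It is a sketch of the textbook definition, with an optional `calls` list (added purely for illustration) that records every invocation. Note that two base cases are needed, for 0 and for 1, or the recursion never bottoms out.

```python
def fib(n, calls=None):
    """Tree-recursive Fibonacci: each call above the base cases makes two more."""
    if calls is not None:
        calls.append(n)  # record every invocation, for illustration only
    if n < 2:  # base cases: fib(0) = 0 and fib(1) = 1
        return n
    return fib(n - 1, calls) + fib(n - 2, calls)
```

Computing `fib(7)` this way takes 41 calls arranged in a tree, whereas the prompt chain in the post advances one state per step in a straight line.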

akomtu · 3y ago

It's an interesting idea to implement memory in LLMs:

(prompt1, input1) -> (prompt2, output1)

On top of that you apply some constraint on generated prompts, to keep it on track. Then you run it on a sequence of inputs and see for how long the LLM "survives" before it hits the constraint.
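A sketch of that experiment, under heavy assumptions: `step` stands for a hypothetical model call that maps (prompt, input) to (next prompt, output), and `is_valid` is whatever constraint you choose to impose on generated prompts. The toy versions below use a running total as the "memory" so the harness can be run without a model.

```python
def run_with_memory(step, prompt, inputs, is_valid):
    """Feed inputs one at a time; each step returns (next_prompt, output)."""
    outputs = []
    for item in inputs:
        prompt, output = step(prompt, item)
        if not is_valid(prompt):  # constraint violated: the run "dies" here
            break
        outputs.append(output)
    return prompt, outputs


# Toy stand-ins: the prompt carries a running total as its only state.
def toy_step(prompt, item):
    total = int(prompt.split("=")[1]) + item
    return f"TOTAL = {total}", total


def within_budget(prompt):
    return int(prompt.split("=")[1]) <= 10


final_prompt, survived = run_with_memory(toy_step, "TOTAL = 0", [3, 4, 5, 6], within_budget)
```

The length of `survived` is the survival time: here the run lasts two inputs before the third pushes the prompt past the constraint.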

pyrolistical · 3y ago

I was wondering about mathematical proofs, as they tend to be very abstract.

If ChatGPT can translate proofs back to equivalent code, then this recursion problem is solvable up to the halting problem.

obert · 3y ago

I don't want to sound dismissive, but it's known that LLMs understand state, so you can couple code generation + state, and you have a sort of runtime. E.g. see the simulations with Linux VM terminals: https://www.engraved.blog/building-a-virtual-machine-inside/

LesZedCB · 3y ago

i have played around a little bit with unrolling these kind of prompts, you don't have to feed them forward, just tell it to compute the next few instead of only one. i had moderate success with this using GPT-3.5 and your same prompt. it would output 3 steps in a single output if i asked it to. it did skip some fib indices though.
