Scaffolded LLMs as natural language computers

(beren.io)

118 points · veerd · 3y ago · 23 comments

23 comments · 13 top-level

lsy · 3y ago · 7 in thread

I think the issue is that the sentence "Many tasks cannot be specified easily and precisely in computer code but can be described in a sentence or two of natural language" is not, in fact, true. Natural language is a pretty fantastically bad interface for specifying unambiguous, repeatable, and reliable tasks, which is why most technical advancement has involved the introduction of expressive notations that clarify and constrain problems in a way natural language can't. And the bulk of this article predicts (I think correctly!) that if LLMs are to be used for more automation, they have to progress towards some form of "semantic codes", "task primitives", "abstractions", i.e. notation. "Prompt engineering" is a form of reaching for notation, although I would say at this point it is more like "prompt guessing".

However this prompts the question of why we are striving to create massive natural language models (with all the disadvantages of natural language) that we will then heavily constrain to perform tasks that can be performed by traditional computers with order-of-magnitude greater efficiency and reliability? Most of these "chaining" libraries are already engaged in asking the LLM to pretty-please output a standardized blob format that can be read by a Python harness, where we use the LLM to identify that something is e.g. an equation, and then pack it off to Wolfram or something. It seems like if you want to do this more than a few times, it's better to write a couple lines of code to do it much more cheaply.
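
A minimal sketch of the harness pattern described above, assuming the model is asked to return a JSON blob with `kind` and `payload` fields. `fake_llm`, `solve_equation`, and the field names are hypothetical stand-ins, not any chaining library's API; a real setup would call a model and an external solver such as Wolfram.

```python
import json

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call; returns a canned JSON blob.
    return '{"kind": "equation", "payload": "2 + 3 * 4"}'

def solve_equation(expr: str) -> str:
    # Placeholder for an external solver; handles only + and * on integers.
    total = 0
    for term in expr.split("+"):
        product = 1
        for factor in term.split("*"):
            product *= int(factor.strip())
        total += product
    return str(total)

# Maps the "kind" the model reports to ordinary code that does the work.
HANDLERS = {"equation": solve_equation}

def run(user_text: str) -> str:
    raw = fake_llm(f"Classify and extract as JSON: {user_text}")
    try:
        blob = json.loads(raw)
        handler = HANDLERS[blob["kind"]]
        return handler(blob["payload"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Malformed or unexpected model output: fail closed.
        return "unhandled"

print(run("what is 2 + 3 * 4?"))
```

The model's only job here is classification and extraction; everything after `json.loads` is deterministic, which is the part lsy argues could often be written directly.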

vidarh · 3y ago

The big value is certainly in the less constrained stuff. Today I wanted to experiment with a hobby project, so I wrote a page of what I want to achieve. I then presented it to GPT4 as a "spec" and told it to act as a software architect and give me suggestions and ask for clarification. It gave me mostly good suggestions, asked a few good questions, and gave some feedback I disagreed with. I updated the spec to incorporate the good bits, clarified and firmed up the wording around the choices where it had made suggestions I disagreed with, and asked it to give another round of feedback. I turned around the draft 3 times, and it added value every time.

I could have done that with a person, but then I'd need someone who was available then and there that I wasn't taking away from other stuff.

To tie that to your points: This process was exactly an exercise in nailing down details that were missing because the short prose version was leaving huge gaps.

I expect to need a back-and-forth of filling in detail at each step toward an implementation for that reason.

And I think this kind of tooling needs to be built with that in mind: Write roughly what you need unless you already know how you want to express it in code. Ask for clarifications or a proposed plan. Iterate. Maybe with tooling giving examples of where it would head if prompted to fill in more detail.
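
A rough sketch of that write, ask, iterate loop, under the assumption that the model can be wrapped as a function returning open questions. `refine_spec`, `stub_review`, and `stub_answer` are hypothetical names for illustration; the stubs replace the model and the human.

```python
from typing import Callable, List

def refine_spec(spec: str,
                review: Callable[[str], List[str]],
                answer: Callable[[str], str],
                max_rounds: int = 3) -> str:
    """Ask a reviewer for open questions, fold the answers into the spec, repeat."""
    for _ in range(max_rounds):
        questions = review(spec)
        if not questions:
            break  # Reviewer has nothing left to clarify.
        for question in questions:
            spec += f"\n- {question} {answer(question)}"
    return spec

def stub_review(spec: str) -> List[str]:
    # Hypothetical reviewer: asks about storage until the spec mentions it.
    return [] if "storage" in spec.lower() else ["Which storage backend?"]

def stub_answer(question: str) -> str:
    # Hypothetical human reply.
    return "SQLite."

final_spec = refine_spec("Build a notes app.", stub_review, stub_answer)
print(final_spec)
```

The `max_rounds` cap reflects the three rounds mentioned above; the loop also stops early once the reviewer has no further questions.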

I sure as hell won't trust it to just blindly do a task for me from a brief problem statement at this point.

Just as I wouldn't for most non-trivial tasks with most humans...

babyshake · 3y ago

Yes, the problem is that "easily and precisely" should just be "easily". Natural language tends to be easy but not precise, while computer code is precise but not easy. Obviously something like idiomatic Python is relatively easy. But a very advanced SQL query, although it can be expressed in relatively concise syntax, is not at all easy to produce on your own unless you are an expert, compared to expressing what you want in natural language.

manmal · 3y ago

I think developing often used functions as plugins/task primitives as native code (i.e. tools that the LLM can use) would be a logical next step - as you expressed, I think.

> It seems like if you want to do this more than a few times, it's better to write a couple lines of code to do it much more cheaply

I think the problem is that natively written code will always suck at the planning part. An LLM can use the tools available to it (e.g. a web browser or Wolfram) in infinite ways, and hopefully in a way that will advance the task at hand. Maybe the natively written task primitives can become really big (e.g. a fully automatic web scraper) - great, now the LLM can gather info for its plans even faster.

qup · 3y ago

> Natural language is a pretty fantastically bad interface for specifying unambiguous, repeatable, and reliable tasks

I agree. I think people are fooled by the implicit knowledge that most humans have. They make decisions about the details, where in programming you have to tend to each detail. That's roughly equivalent effort in natural language.

For some problems, probably harder in natural language.

I still think it's a great advancement. It opens programming to the masses, and the LLM can probably teach you the things you need to know to advance your competency.

furyofantares · 3y ago

It's pretty often that people tell a programmer what they want in not-that-much natural language and then the programmer makes it happen. You do have to verify the programmer's work and often provide some corrections that may have been the result of unrecognized ambiguity, but it still ends up being vastly less natural language sent to the programmer than the amount of precise language the programmer ultimately specifies.

catchnear4321 · 3y ago

After a few times, you could ask it to reduce its purpose to a couple of lines of code. Test the code, verify it. Maybe even deploy it.

That’s more expensive than writing a couple of lines of code, more than deploying it.

Sufficiently standardized, it will be significantly less expensive than paying someone to write a couple of lines of code, test it, deploy it, etc.

Is this better than just writing a couple of lines of code? That’s a different question. At scale, this can absolutely be cheaper. Eventually.

galaxyLogic · 3y ago

AI must learn to understand code, not just copy it

jackblemming · 3y ago · 1 in thread

> My prediction here is that exponential improvements continue at least for the next few years and likely beyond.

GPT-3 to GPT-4 was an exponential improvement? Progress is not usually exponential. Your phone now isn’t 10x better than it was a few years ago. Progress in AI is a huge jump, then refinement of that jump, then stagnation until the next big jump is discovered. Look at CNNs dominating image classification competitions out of nowhere: they got refined, then they kind of stagnated and didn’t gain much accuracy on whatever benchmark was used. In fact I think humans are still way better at many vision tasks, and it’s been over a decade of research now since CNNs jumped on the scene as the hot thing. I don’t know why people refuse to understand or see this, but it’s tiring constantly hearing people pretend everything is exponential and AGI is two years away when AI hasn’t even beaten humans at some pretty trivial vision benchmarks.
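
One way to picture the "jump, refinement, stagnation" pattern is a logistic (S-shaped) curve, which is nearly indistinguishable from an exponential early on and then flattens. The sketch below is only an illustration of that general shape; the logistic model and the parameters `k`, `t0`, and `ceiling` are assumptions of this example, not something measured from AI benchmarks.

```python
import math

def exponential(t: float, k: float = 1.0, t0: float = 10.0) -> float:
    # Unbounded growth, scaled to match the logistic curve at early times.
    return math.exp(k * (t - t0))

def logistic(t: float, k: float = 1.0, t0: float = 10.0, ceiling: float = 1.0) -> float:
    # Same early growth rate, but saturating at `ceiling`.
    return ceiling / (1.0 + math.exp(-k * (t - t0)))

# Well before the midpoint t0 the two curves track each other closely...
early_ratio = logistic(0.0) / exponential(0.0)
# ...well after it, the logistic has flattened while the exponential keeps going.
late_ratio = logistic(20.0) / exponential(20.0)

print(f"early ratio: {early_ratio:.5f}")
print(f"late ratio:  {late_ratio:.2e}")
```

Early data alone cannot tell the two curves apart, which is why both readings of recent progress can look plausible from the inside.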

richardfeynman · 3y ago

I agree that there are still differences between humans and AI, and vision is one of them. Humans also remember across conversations, ChatGPT doesn't. We have longer context windows. We think in our downtime, unprompted. We take input from 5 senses, today's ChatGPT only takes input from text.

But--and this is a big but--the set of things that people can do better than computers has shrunk significantly over the past five months. Today, thanks to GPT4, AI can get a B in a Quantum Computing class, generate engaging stories, know that the color yellow is closer to orange than blue (despite never having seen color!), answer emotionally laden questions with the sort of facility that IMO is better than most humans, write code, rhyme, and much more. All of this stuff was unthinkable before. I personally thought it would be centuries until this stuff was possible. I was very wrong.

Several developments in Deep Learning, like the Transformers paper (https://arxiv.org/abs/1706.03762), set off this growth, as did big data and increased computing power. The insight of particular humans, like Ilya Sutskever, played a role as well. But taken together, I actually don't understand how one can argue that we aren't at the beginning of a massive exponential.

Of course there are things humans can still do better than AI, but the number of things is shrinking rapidly, while the number of things computers can do better than humans is growing rapidly.

I argue that we are indeed at the beginning of an exponential, and we'll see both new classes of products and faster development time.

killthebuddha · 3y ago · 1 in thread

People have already started writing the languages. Here's an example that I think is really neat: https://github.com/jbrukh/gpt-jargon.

michael-go · 3y ago

also https://lmql.ai is probably another early attempt at a language

tmaly · 3y ago · 1 in thread

I am interested to see how the concept of long-term memory is developed with LLMs. It seems very slow to use fine-tuning for this process.

On the topic of summarizing and determinism, I wonder if an intermediate bytecode-like language or structured subset of the English language could improve the outcome across models.

LesZedCB · 3y ago

I want to see huge context where fine-tuning could happen as something like a digest when the end of the context window is reached, so the amortized cost is low.
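
A small sketch of the amortization idea, with one substitution stated plainly: instead of a fine-tuning job, the consolidation step here is a `digest` callback that collapses the full window into a single entry. `DigestingContext` and the callback are hypothetical names for illustration.

```python
from typing import Callable, List

class DigestingContext:
    """Context window that is consolidated only when it overflows."""

    def __init__(self, max_items: int, digest: Callable[[List[str]], str]):
        self.max_items = max_items
        self.digest = digest
        self.items: List[str] = []
        self.digest_calls = 0

    def append(self, message: str) -> None:
        self.items.append(message)
        if len(self.items) > self.max_items:
            # Window overflowed: collapse everything into one digest entry.
            self.items = [self.digest(self.items)]
            self.digest_calls += 1

# Stub digest that just records how many entries it consolidated.
ctx = DigestingContext(max_items=4, digest=lambda msgs: f"digest({len(msgs)})")
for i in range(10):
    ctx.append(f"m{i}")

print(ctx.items, ctx.digest_calls)
```

The expensive step runs once per filled window rather than once per message, so its cost is spread over roughly `max_items` appends.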

manmal · 3y ago

As a software developer, this article has been able to give me a glimmer of hope that my skills won’t be fully obsolete once LLMs mature. The high level instructions (author calls them „programs“ even) will require highly structured thinking, translating business goals into actionable slices.

I’ve played with AutoGPT today, and, while the results were underwhelming (once it crashed, and once it got stuck in an infinite loop because it wrongly requested a website critical for the task) - it was an experience very similar to my first attempts at learning C. I tried to tell the system what I want it to do, and it mostly really followed my instructions. If (when?) all the components have improved in reliability and speed, this will become an insanely powerful way of working. A la „Make a website from this PDF with nextjs and deploy it to netlify“. Not very unlike „Read this file from disk and parse CSV rows from it“ as we are doing now, as devs working with high level programming languages.

zan2434 · 3y ago

This was an inspiring read! Reminds me of Simon Willison's analogy of LLMs to "calculators for words" but this author takes the idea even further. I agree the analogy points to foundation model companies like OpenAI and Anthropic having the most revenue but not the highest margins. Who will the Apple / Microsoft / Google of this new wave be? Who can take this raw technology and actually make it usable by all? "An LLM in every home"

jonplackett · 3y ago

This is a long but worthwhile read.

Opened my mind to what is to come and how it might happen

mcemilg · 3y ago

The most impressive aspect of ChatGPT for me is its ability to understand natural language. It's remarkable how it can comprehend corrupted text and discern what you're trying to convey. I believe that large language models will be utilized as natural language processors in the near future. However, unfortunately, alternatives like LLaMA, Alpaca, or Open Assistant are not yet on par with GPT-4. I don't think they're sufficient to be used as Natural Language Processing Units. We can't build a computer that relies on an API powered by a closed company.

daralthus · 3y ago

> Error correction itself is not new to hardware – huge amounts of research has been expended in creating error correcting codes to repair bit-flips. We will likely need similar ‘semantic’ error correcting codes for LLM outputs to be able to stitch together extended sequences of NLOPs in a highly coherent and consistent way.

Can we dive into the idea of "semantic error correcting codes" in this thread please?
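
As a starting point for that discussion, the simplest hardware analogue is a repetition code: send the same bit several times and take the majority. A loose counterpart for LLM outputs is to sample several answers and keep the one most of them agree on. The sketch below assumes answers can be compared after plain string normalization, which is a strong simplification of "semantic" agreement; `normalize`, `majority_vote`, and the `sample` callback are hypothetical names.

```python
from collections import Counter
from typing import Callable, Optional

def normalize(answer: str) -> str:
    # Crude stand-in for semantic equivalence: case, whitespace, trailing period.
    return " ".join(answer.lower().split()).rstrip(".")

def majority_vote(sample: Callable[[], str], n: int = 5) -> Optional[str]:
    """Draw n samples and return the strict-majority answer, or None if there is none."""
    votes = Counter(normalize(sample()) for _ in range(n))
    answer, count = votes.most_common(1)[0]
    return answer if count > n // 2 else None

# Canned outputs standing in for repeated model calls.
outputs = iter(["Paris", "paris.", "Lyon", " PARIS ", "Marseille"])
result = majority_vote(lambda: next(outputs), n=5)
print(result)
```

Returning `None` when no answer wins a strict majority mirrors a detected but uncorrectable error: the caller can retry or escalate rather than pass along a guess.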

sharemywin · 3y ago

I remember reading this book a long time ago and thought it had some pretty interesting concepts around agents:

https://aima.cs.berkeley.edu/

this was another one:

https://www.amazon.com/Artificial-Intelligence-3rd-Winston/d...

galaxyLogic · 3y ago

This is what I've been wondering: what happens when you replace the human in the human-chatbot loop with another chatbot? I'm sure somebody must have tried it?

Those 2 chatbots could be of different models. Would they then teach each other something new?
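
Mechanically, the loop is easy to set up: each bot's reply becomes the other's next input. A minimal sketch, where `Bot`, `converse`, and the two toy bots are hypothetical stand-ins for calls to two different models:

```python
from typing import Callable, List, Tuple

Bot = Callable[[str], str]

def converse(bot_a: Bot, bot_b: Bot, opening: str, turns: int) -> List[Tuple[str, str]]:
    """Alternate between two bots, feeding each reply to the other."""
    transcript: List[Tuple[str, str]] = []
    message = opening
    speakers = [("A", bot_a), ("B", bot_b)]
    for i in range(turns):
        name, bot = speakers[i % 2]
        message = bot(message)
        transcript.append((name, message))
    return transcript

# Toy bots standing in for two different models.
def exclaim(text: str) -> str:
    return text + "!"

def shout(text: str) -> str:
    return text.upper()

log = converse(exclaim, shout, "hi", turns=3)
print(log)
```

Whether the two models teach each other anything new is the open question; the sketch only shows the plumbing, with a fixed turn count so the exchange always terminates.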

highduc · 3y ago

What would be good approaches to also implement a personality layer that can be more complex? Something like an effect box over the LLM that contains the info.

karmasimida · 3y ago

It has memory and it has a built-in interpreter, and it is even its own runtime

Pretty incredible
