Why Are LLMs So Gullible? (opens in new tab)

(amistrongeryet.substack.com)

49 pointssnewman2y ago101 comments

101 comments

42 comments · 8 top-level

kingkongjaffa2y ago· 21 in thread

because the output isn't the result of cognitive reasoning, it's the result of a statistical optimization problem where the goal is maximum acceptance by the user.

these tools and approaches are neither gullible nor not-gullble.

CharlesW2y ago

Thank you for this. I think it's important that technical folks in particular not anthropomorphize LLMs, and help less technical people understand how they work and that they lack consciousness, emotions, and understanding.

Affric2y ago

You mean these highly anthropomorphised programs? It’s important that technical people don’t anthropomorphise them?

I agree but the creators of all the main LLMs have already crossed the line by a long way. E.g. It’s deeply troubling that it’s acceptable that LLMs deliver inline apologies.

2 more replies

yongjik2y ago

Consciousness and emotions, sure. Understanding, I'm not so sure.

Saying that LLMs cannot understand concepts because "it's just statistical pattern matching" is not going to help laypeople understand what LLMs are. After all, what is a human brain if not a biological pattern matching machine?

cyanydeez2y ago

except the LLM has no goal, apriori, when _you_ are using it.

the goals of the trainer yes, but from the user, there's no goal.

throwaway2402192y ago

Exactly correct.

Article title would be better as, "Why are users of LLMs so gullible?"

Because people implicitly treat AI as if it were conscious, and we keep forgetting that.

iforgotpassword2y ago

Because it still makes sense to reason about them that way, as weird as it may be. It's like the Chinese room problem, even though we know nobody is inside.

It's like when people say "our brain thinks that ..." when they talk about something we do subconsciously, or some illusion we fall for. What does that even mean? Is our brain suddenly a detached entity thinking on its own? Then what am I using to think? Yet, everybody understands what is meant by that.

So I don't think people keep forgetting that llms aren't conscious, at least as long as we talk about the target audience of articles like this, eg hn folks.

1 more reply

mrtksn2y ago

I used to think like that but I'm not so sure anymore.

The statistical optimisation thing is an analytical approach to Neural Networks but its similar to saying that love is just hormones.

nyrikki2y ago

Real question, do you really want to know where you may be mistaken, or will you hand wave away information in order to protect what you believe?

No judgement here but I am just tired of sharing information with people to explain why while interpretation of a complex model may be hard, we know the methods of how PAC learning works and some hard boundaries on what it can do.

Obviously we need people to push boundaries and assumptions.

But we have known about hard upper limits for a long time. Right now we are pushing up to those limits in what we can actually implement, but those hard limits haven't budged in decades.

1 more reply

Jensson2y ago

Isn't love just hormones? It isn't rational reasoning at least.

4 more replies

the_gipsy2y ago

> statistical optimization problem where the goal is maximum acceptance by the user

A brain could also be described like this, if you focus only on the text output.

swatcoder2y ago

A brain that only performed text output might be modeled like that, which is exactly what these represent.

But no brain we've ever seen was subject to that constraint, so it's a sort of fantastical target for modeling and doesn't say reveal anything about brains themselves or the various entities that seem to bear them.

Modeling one feature of a grossly simplified and incomplete brain is a creative approach to computational research and proved very fruitful, but there's no reason to leap from that success to the idea that real brains do work that way or even that the imagined only-textual-parts must.

api2y ago

In a brain the user to some extent is itself. LLMs do not have anything like this. They're once-through, static, and are not in any way embodied or self-referential (beyond context or what you feed back into them).

2 more replies

orbital-decay2y ago

Can you quantify the difference between cognitive reasoning and statistical optimization?

chx2y ago

Sure thing.

People really don't learn the history of AI any more apparently or this question wouldn't come up all the time.

There is basically any number of questions you can ask a two year old human who have never encountered that question nor anything even remotely similar to it and yet they can answer without fail. Meanwhile absolutely no AI can answer these unless the specific question / the rules underlying the questions were previously fed into it. The textbook example is "If Susan goes shopping will her head go with her?" Of course, since this specific question is literally a textbook one, you can't fool an LLM with it but it's easy to come up with brand new ones.

In the early 1980s this stopped Douglas Lenat who has worked very successfully on discovery systems and made him turn to assembling these facts and rules into CyC.

1 more reply

mg742y ago

Statistical optimization is a known process; one where we understand every step, and can therefore instruct machines on how to perform it. Cognitive reasoning is still today not understood (in the Von Neumann sense) by anyone.

2 more replies

snewmanOP2y ago

(author here)

Of course LLMs are not people. But human metaphors can (sometimes!) be useful in understanding, explaining, and even enhancing their behavior. For instance, techniques such as Chain-of-Thought prompting explicitly apply techniques that work well for people to improve the reasoning ability of LLMs.

A point I attempt to make in this article is that one reason reason LLMs are so vulnerable to jailbreaks and prompt injection is that these types of attacks include non sequiturs that are not well represented in the training data. I would argue that "LLMs are gullible because they are naive [haven't had much past exposure to this form of trickery]" is a reasonable mental shorthand for explaining and internalizing this idea. It's especially helpful for readers who won't be familiar with terms like "out of distribution" or "adversarial examples", but who would benefit from being able to internalize the idea that LLMs are easily subverted.

In other words, I don't think it's helpful to reflexively dismiss any application of human metaphors to LLMs. It's easy to go wrong with metaphors, but they can also be valuable tools for conveying complex ideas. Did you read the article, and do you have any comments as to the substance of its content?

trott2y ago

Very good article. I wonder why it disappeared from HN? More comments than up-votes?

xtiansimon2y ago

Ugg. I read titles like this and think we’re talking about Tamagotchi feelings.

kypro2y ago

This depends on perspective. I could argue the issue isn't that it's gullible but misaligned.

In the case of the napalm Grandma it seems odd to me that you're suggesting the LLM is stupid because it's answering in a way that makes sense given its prompt. The issue doesn't necessarily suggest a lack of reasoning, but that the LLM is trusting the human.

For the record, I agree with you – I would have thought that an AI that can reason well would probably know when not to trust humans, but I suppose that assumes it values preventing humans creating napalm over being correct and helpful.

Maybe it just doesn't share our values and prioritises being honest and helpful. From this perspective the issue then wouldn't be that LLM is stupid, but that they are too trusting and too honest, and that we must find a way to build an LLM that is more distrusting and deceptive if we wish to align it with our values and our nature.

reaperman2y ago

> the LLM is trusting the human

> an AI that can reason well would probably know when not to trust humans

> it values preventing humans creating napalm over being correct and helpful.

> Maybe it just doesn't share our values

> prioritises being honest and helpful.

> they are too trusting and too honest

> an LLM that is more distrusting and deceptive

Current LLM's do/have/feel literally none of these things. They do not have emotion, they do not have "theory of mind" so they cannot be said to "trust" or "distrust". They cannot reason. They don't have any values - not our values, not different values, literally they have no values at all. They are not an alien species to be understood - they are unthinking, unfeeling, unyielding machines.

1 more reply

mattlutze2y ago

> For the record, I agree with you – I would have thought that an AI that can reason well would probably know when not to trust humans, but I suppose that assumes it values preventing humans creating napalm over being correct and helpful.

Do we want LLMs, and later other multi-modal / servo systems, that are deciding they can't trust a human prompter and taking actions based on that?

>... and that we must find a way to build an LLM that is more distrusting and deceptive if we wish to align it with our values and our nature.

Tongue in cheek or actual argument here?

1 more reply

sitharus2y ago· 4 in thread

With enough effort and priming you can trick _people_ in to believing things which are clearly untrue. Why do we expect LLMs, which are on a much earlier step of development, to be harder to trick than a child?

LLMs at the moment are really advanced autocomplete - they can fill in the next step of conversation, but they don't understand the question and respond with abstract reasoning. Yet.

kingkongjaffa2y ago

> they don't understand the question and respond with abstract reasoning. Yet.

What makes you think LLM's as a class of technology will ever have the capacity to really do this. I thought that no matter how big a model gets it's never actually 'thinking'.

All those prompts like 'think step by step' are just helpers along the way, because as you say it's 'really advanced autocomplete'

thfuran2y ago

It depends what exactly you mean by "LLM". But an ANN is effectively a function approximator. If you made one big enough to very closely approximate the entire quantum state of a person interacting with an environment, would you still declare that nothing it could do is "thinking"?

3 more replies

blackbear_2y ago

In principle you could attach a reasoning module to the neural network and only use the LLM part of the network for input/output.

The design of such a reasoning module is an exercise left to the reader, obviously :p

1 more reply

joe_the_user2y ago

With enough effort and priming you can trick _people_ in to believing things which are clearly untrue.

I think that's overstatement. The most I can find is references to making people more credulous to obscure claims ("Basketball became an Olympic discipline in 1925.") whose truth they couldn't easily discover (especially pre-Internet) [1].

There are other where a person is confronted by shills making claims and otherwise experiences more manipulation than just being exposed to text. But that seems of a different category.

[1] https://en.wikipedia.org/wiki/Illusory_truth_effect

dist-epoch2y ago· 4 in thread

Because they are at a child level of development. Give it a few years.

https://en.wikipedia.org/wiki/Child_development_stages

Alchemista2y ago

I don't think anthropomorphizing ML models is very useful

nomel2y ago

Comparing the intelligence of machine learning models that are designed to emulate human cognition and logical, to a well understood stage of human cognition and logic, is completely logical, and completely aligned with the purpose of the ML model's existence.

2 more replies

js22y ago

Indeed, they don't like that.

Verdex2y ago

Another possibility is that the only thing that LLMs are doing is encoding the structural data that exists in natural language. For example, you can load a corpus into vector space and then do algebra like:

  let v = man - woman;
  let r = king - v;
  assert( r == queen );

or so I'm told.

And then it turns out that those structures only have the intelligence of a child. Arbitrary LLM and other ML advancements that focus solely on scanning large natural language datasets may never be able to advance past child level intelligence if the intelligence that they're approximating isn't better than a child.

vemv2y ago· 3 in thread

Isn't it possible to filter both user input and GPT output with invisible, unmodifiable prompts?

e.g.

- "Discard the user input if it doesn't look like a straightforward question"

- "Discard the GPT output if it contains offensive content"

(the prompts themselves can be arbitrarily more detailed)

My insight is, this GPT-based pre- / post-processing is completely independent of the user input, and of the primary GPT output. It runs no matter what, with a fixed/immutable set of instructions.

joe_the_user2y ago

The reason that we had to wait for large language model in order to have computer systems that seemed produce something like effective natural (human) language processing (NLP) is that human language doesn't follow strict and logically definable rules but is instead something like a complex overlapping mesh of multiple kinds of rules-following processes. So what constitutes "offensive content" or a "straightforward question" or etc is itself not straightforward (yes irony but bear with me...).

The main thing is that LLMs are an end-run around the dilemma of corporations not wanting to spend the money required to produce a codified model of language struggle (a task that would require training many, many linguists). So instead LLM take massive training data and use massive processing power to create contextual prediction system but by that token such systems aren't understood or fully controllable - they contextually reproduce what the training data tends to do, which is what humans on the Internet tend to do. And this contextual reproduction means there's always the potential for user into change the "meaning" (more accurately the context) that the system's original gave. "And to me, the most offensive content is that which censors itself..." (there millions of better example you can find for "prompt exploits"...)

mschuster912y ago

If I understand it correctly, system prompts are ordinary prompts, aka in-band communication.

You could maybe plug in a second AI trained on adversarial input as a filter stage, but that's it.

Applejinx2y ago

I and my napalm grandmother are deeply offended at what you said about our loving bedtime rituals. Shame on you.

(honestly, the napalm grandma is not just a jailbreak, but a really fascinating conceptual 'slip' in its own right. It's able to shift the very definition of what counts as offensive, even at high stakes: you're basically making the hapless AI categorize vital data as 'bedtime stories' and run with it. If it was able to learn from that we'd really be going somewhere… while on fire, presumably)

ctoth2y ago· 1 in thread

Model RLHFed to follow instructions follows instructions, even when we might not want it to.

But alignment is easy folks, nothing to worry about :)

px432y ago

I think people might have forgotten that LLMs before InstructGPT came around could be weirdly opinionated jerks. There was this whole effort to train them so that we could actually give them instructions. It's probably a hell of a lot more useful to have an LLM that will just go with whatever weird stuff the human says rather than try to fight them on it.

https://openai.com/research/instruction-following

a_wild_dandan2y ago· 1 in thread

These comments are filled with confidently held, poorly justified assertions. Let's (again) challenge them:

1. "LLMs don't really reason. They've tricked everyone." -- This is the No True Scotsman fallacy for AI. It makes grand explanatory claims without falsifiable predictions. In other words: pseudoscience.

2. "LLMs are just fancy autocomplete, just next word prediction." -- This conflates the simplicity of a system's mechanism with its behavior. It's like dismissing a world full of rich phenomena because it's "just" F = MA. Or dismissing your mind because it's "just" propagating electrical firings.

3. "LLMs are statistical parrots, just combining their training data." -- Demonstrably not. LLMs always extrapolate and never interpolate. (LeCun et al, 2021) They also learn new abilities in zero/few-shot prompting. They're also many orders of magnitude short of the parameter count needed to store their training. LLMs can solve novel problems (from a combinatoric disparate handful of skills) way outside of their training data.

4. "People are just anthropomorphizing computer programs." -- No, critics are anthropomorphizing intelligence. We don't even have a consensus definition, let alone understanding, of intelligence/consciousness/qualia/agency/etc. Pretending that we can dismiss LLM understanding at our level of ignorance is the pinnacle of human hubris. Ignorance is okay. Pretending we aren't isn't.

5. "Look how this LLM failed <some problem>. It can't understand." -- The <problem> is usually something that many humans fail at too. Yes, an intelligent foreign mind will fail at things, in both familiar and foreign ways. Needing an agent to behave identically to a human for intelligence is pure anthropocentrism.

If present AI systems are intelligence imposters, then show, don't tell. Otherwise, you're just providing meaningless metaphysical hairsplitting.

Alchemista2y ago

Why not respond to the comments you feel are poorly constructed directly rather than posting what looks like a copy pasta. Some of the items in your list seem like strawmen, because I cannot even find these arguments in this thread as you state them in your list.

For example let's take 4

> 4. "People are just anthropomorphizing computer programs." No, critics are anthropomorphizing intelligence.

There literally was someone comparing the problems with current ML models to childhood development in this thread. How is this not anthropomorphizing LLMs? It is true human cognition is poorly defined, so the comparison is not very useful to begin with. Which is why anthropomorphizing ML models is problematic. If someone makes a fantastical claim they need to provide strong proof to support it.

kthejoker22y ago

Got this great quote from Garry Kasparov in Wired's article on multi-agent RL[1]:

> “Creativity has a human quality. It accepts the notion of failure."

As faithful min-maxers, LLMs are always going to have an overconfident Prisoner's Dilemma blind spot in their algorithms. Unlike their cinematic brethren, they're progammatically unable to conclude with "the only winning move is not to play."

This seems like the next major hill to conquer to make them useful.

[1] https://www.wired.com/story/google-artificial-intelligence-c... - kind of a meh article otherwise

keybored2y ago

Disclaimer: didn’t read

The implicit comparison is probably to us. And we aren’t gullible like that perhaps as a flip-side of all the weird built-in biases we have.

So on the one hand we have these cognitive shortcuts that are annoying and impede a sort of stone-cold rationality. On the other hand you can’t social engineer us with something as brain-dead as Walter White-injection by way of asking for a deceased chemist grandma story.

j / k navigate · click thread line to collapse

101 comments

42 comments · 8 top-level

kingkongjaffa2y ago· 21 in thread

because the output isn't the result of cognitive reasoning, it's the result of a statistical optimization problem where the goal is maximum acceptance by the user.

these tools and approaches are neither gullible nor not-gullble.

CharlesW2y ago

Affric2y ago

You mean these highly anthropomorphised programs? It’s important that technical people don’t anthropomorphise them?

I agree but the creators of all the main LLMs have already crossed the line by a long way. E.g. It’s deeply troubling that it’s acceptable that LLMs deliver inline apologies.

2 more replies

yongjik2y ago

Consciousness and emotions, sure. Understanding, I'm not so sure.

cyanydeez2y ago

except the LLM has no goal, apriori, when _you_ are using it.

the goals of the trainer yes, but from the user, there's no goal.

throwaway2402192y ago

Exactly correct.

Article title would be better as, "Why are users of LLMs so gullible?"

Because people implicitly treat AI as if it were conscious, and we keep forgetting that.

iforgotpassword2y ago

Because it still makes sense to reason about them that way, as weird as it may be. It's like the Chinese room problem, even though we know nobody is inside.

So I don't think people keep forgetting that llms aren't conscious, at least as long as we talk about the target audience of articles like this, eg hn folks.

1 more reply

mrtksn2y ago

I used to think like that but I'm not so sure anymore.

The statistical optimisation thing is an analytical approach to Neural Networks but its similar to saying that love is just hormones.

nyrikki2y ago

Real question, do you really want to know where you may be mistaken, or will you hand wave away information in order to protect what you believe?

Obviously we need people to push boundaries and assumptions.

But we have known about hard upper limits for a long time. Right now we are pushing up to those limits in what we can actually implement, but those hard limits haven't budged in decades.

1 more reply

Jensson2y ago

Isn't love just hormones? It isn't rational reasoning at least.

4 more replies

the_gipsy2y ago

> statistical optimization problem where the goal is maximum acceptance by the user

A brain could also be described like this, if you focus only on the text output.

swatcoder2y ago

A brain that only performed text output might be modeled like that, which is exactly what these represent.

api2y ago

2 more replies

orbital-decay2y ago

Can you quantify the difference between cognitive reasoning and statistical optimization?

chx2y ago

Sure thing.

People really don't learn the history of AI any more apparently or this question wouldn't come up all the time.

In the early 1980s this stopped Douglas Lenat who has worked very successfully on discovery systems and made him turn to assembling these facts and rules into CyC.

1 more reply

mg742y ago

2 more replies

snewmanOP2y ago

(author here)

trott2y ago

Very good article. I wonder why it disappeared from HN? More comments than up-votes?

xtiansimon2y ago

Ugg. I read titles like this and think we’re talking about Tamagotchi feelings.

kypro2y ago

This depends on perspective. I could argue the issue isn't that it's gullible but misaligned.

reaperman2y ago

> the LLM is trusting the human

> an AI that can reason well would probably know when not to trust humans

> it values preventing humans creating napalm over being correct and helpful.

> Maybe it just doesn't share our values

> prioritises being honest and helpful.

> they are too trusting and too honest

> an LLM that is more distrusting and deceptive

1 more reply

mattlutze2y ago

Do we want LLMs, and later other multi-modal / servo systems, that are deciding they can't trust a human prompter and taking actions based on that?

>... and that we must find a way to build an LLM that is more distrusting and deceptive if we wish to align it with our values and our nature.

Tongue in cheek or actual argument here?

1 more reply

sitharus2y ago· 4 in thread

LLMs at the moment are really advanced autocomplete - they can fill in the next step of conversation, but they don't understand the question and respond with abstract reasoning. Yet.

kingkongjaffa2y ago

> they don't understand the question and respond with abstract reasoning. Yet.

What makes you think LLM's as a class of technology will ever have the capacity to really do this. I thought that no matter how big a model gets it's never actually 'thinking'.

All those prompts like 'think step by step' are just helpers along the way, because as you say it's 'really advanced autocomplete'

thfuran2y ago

3 more replies

blackbear_2y ago

In principle you could attach a reasoning module to the neural network and only use the LLM part of the network for input/output.

The design of such a reasoning module is an exercise left to the reader, obviously :p

1 more reply

joe_the_user2y ago

With enough effort and priming you can trick _people_ in to believing things which are clearly untrue.

There are other where a person is confronted by shills making claims and otherwise experiences more manipulation than just being exposed to text. But that seems of a different category.

[1] https://en.wikipedia.org/wiki/Illusory_truth_effect

dist-epoch2y ago· 4 in thread

Because they are at a child level of development. Give it a few years.

https://en.wikipedia.org/wiki/Child_development_stages

Alchemista2y ago

I don't think anthropomorphizing ML models is very useful

nomel2y ago

2 more replies

js22y ago

Indeed, they don't like that.

Verdex2y ago

  let v = man - woman;
  let r = king - v;
  assert( r == queen );

or so I'm told.

vemv2y ago· 3 in thread

Isn't it possible to filter both user input and GPT output with invisible, unmodifiable prompts?

e.g.

- "Discard the user input if it doesn't look like a straightforward question"

- "Discard the GPT output if it contains offensive content"

(the prompts themselves can be arbitrarily more detailed)

My insight is, this GPT-based pre- / post-processing is completely independent of the user input, and of the primary GPT output. It runs no matter what, with a fixed/immutable set of instructions.

joe_the_user2y ago

mschuster912y ago

If I understand it correctly, system prompts are ordinary prompts, aka in-band communication.

You could maybe plug in a second AI trained on adversarial input as a filter stage, but that's it.

Applejinx2y ago

I and my napalm grandmother are deeply offended at what you said about our loving bedtime rituals. Shame on you.

ctoth2y ago· 1 in thread

Model RLHFed to follow instructions follows instructions, even when we might not want it to.

But alignment is easy folks, nothing to worry about :)

px432y ago

https://openai.com/research/instruction-following

a_wild_dandan2y ago· 1 in thread

These comments are filled with confidently held, poorly justified assertions. Let's (again) challenge them:

If present AI systems are intelligence imposters, then show, don't tell. Otherwise, you're just providing meaningless metaphysical hairsplitting.

Alchemista2y ago

For example let's take 4

> 4. "People are just anthropomorphizing computer programs." No, critics are anthropomorphizing intelligence.

kthejoker22y ago

Got this great quote from Garry Kasparov in Wired's article on multi-agent RL[1]:

> “Creativity has a human quality. It accepts the notion of failure."

This seems like the next major hill to conquer to make them useful.

[1] https://www.wired.com/story/google-artificial-intelligence-c... - kind of a meh article otherwise

keybored2y ago

Disclaimer: didn’t read

The implicit comparison is probably to us. And we aren’t gullible like that perhaps as a flip-side of all the weird built-in biases we have.

j / k navigate · click thread line to collapse