these tools and approaches are neither gullible nor not-gullble.
I agree but the creators of all the main LLMs have already crossed the line by a long way. E.g. It’s deeply troubling that it’s acceptable that LLMs deliver inline apologies.
Saying that LLMs cannot understand concepts because "it's just statistical pattern matching" is not going to help laypeople understand what LLMs are. After all, what is a human brain if not a biological pattern matching machine?
the goals of the trainer yes, but from the user, there's no goal.
Article title would be better as, "Why are users of LLMs so gullible?"
Because people implicitly treat AI as if it were conscious, and we keep forgetting that.
It's like when people say "our brain thinks that ..." when they talk about something we do subconsciously, or some illusion we fall for. What does that even mean? Is our brain suddenly a detached entity thinking on its own? Then what am I using to think? Yet, everybody understands what is meant by that.
So I don't think people keep forgetting that llms aren't conscious, at least as long as we talk about the target audience of articles like this, eg hn folks.
The statistical optimisation thing is an analytical approach to Neural Networks but its similar to saying that love is just hormones.
No judgement here but I am just tired of sharing information with people to explain why while interpretation of a complex model may be hard, we know the methods of how PAC learning works and some hard boundaries on what it can do.
Obviously we need people to push boundaries and assumptions.
But we have known about hard upper limits for a long time. Right now we are pushing up to those limits in what we can actually implement, but those hard limits haven't budged in decades.
A brain could also be described like this, if you focus only on the text output.
But no brain we've ever seen was subject to that constraint, so it's a sort of fantastical target for modeling and doesn't say reveal anything about brains themselves or the various entities that seem to bear them.
Modeling one feature of a grossly simplified and incomplete brain is a creative approach to computational research and proved very fruitful, but there's no reason to leap from that success to the idea that real brains do work that way or even that the imagined only-textual-parts must.
People really don't learn the history of AI any more apparently or this question wouldn't come up all the time.
There is basically any number of questions you can ask a two year old human who have never encountered that question nor anything even remotely similar to it and yet they can answer without fail. Meanwhile absolutely no AI can answer these unless the specific question / the rules underlying the questions were previously fed into it. The textbook example is "If Susan goes shopping will her head go with her?" Of course, since this specific question is literally a textbook one, you can't fool an LLM with it but it's easy to come up with brand new ones.
In the early 1980s this stopped Douglas Lenat who has worked very successfully on discovery systems and made him turn to assembling these facts and rules into CyC.
Of course LLMs are not people. But human metaphors can (sometimes!) be useful in understanding, explaining, and even enhancing their behavior. For instance, techniques such as Chain-of-Thought prompting explicitly apply techniques that work well for people to improve the reasoning ability of LLMs.
A point I attempt to make in this article is that one reason reason LLMs are so vulnerable to jailbreaks and prompt injection is that these types of attacks include non sequiturs that are not well represented in the training data. I would argue that "LLMs are gullible because they are naive [haven't had much past exposure to this form of trickery]" is a reasonable mental shorthand for explaining and internalizing this idea. It's especially helpful for readers who won't be familiar with terms like "out of distribution" or "adversarial examples", but who would benefit from being able to internalize the idea that LLMs are easily subverted.
In other words, I don't think it's helpful to reflexively dismiss any application of human metaphors to LLMs. It's easy to go wrong with metaphors, but they can also be valuable tools for conveying complex ideas. Did you read the article, and do you have any comments as to the substance of its content?
In the case of the napalm Grandma it seems odd to me that you're suggesting the LLM is stupid because it's answering in a way that makes sense given its prompt. The issue doesn't necessarily suggest a lack of reasoning, but that the LLM is trusting the human.
For the record, I agree with you – I would have thought that an AI that can reason well would probably know when not to trust humans, but I suppose that assumes it values preventing humans creating napalm over being correct and helpful.
Maybe it just doesn't share our values and prioritises being honest and helpful. From this perspective the issue then wouldn't be that LLM is stupid, but that they are too trusting and too honest, and that we must find a way to build an LLM that is more distrusting and deceptive if we wish to align it with our values and our nature.
> an AI that can reason well would probably know when not to trust humans
> it values preventing humans creating napalm over being correct and helpful.
> Maybe it just doesn't share our values
> prioritises being honest and helpful.
> they are too trusting and too honest
> an LLM that is more distrusting and deceptive
Current LLM's do/have/feel literally none of these things. They do not have emotion, they do not have "theory of mind" so they cannot be said to "trust" or "distrust". They cannot reason. They don't have any values - not our values, not different values, literally they have no values at all. They are not an alien species to be understood - they are unthinking, unfeeling, unyielding machines.
Do we want LLMs, and later other multi-modal / servo systems, that are deciding they can't trust a human prompter and taking actions based on that?
>... and that we must find a way to build an LLM that is more distrusting and deceptive if we wish to align it with our values and our nature.
Tongue in cheek or actual argument here?
LLMs at the moment are really advanced autocomplete - they can fill in the next step of conversation, but they don't understand the question and respond with abstract reasoning. Yet.
What makes you think LLM's as a class of technology will ever have the capacity to really do this. I thought that no matter how big a model gets it's never actually 'thinking'.
All those prompts like 'think step by step' are just helpers along the way, because as you say it's 'really advanced autocomplete'
The design of such a reasoning module is an exercise left to the reader, obviously :p
I think that's overstatement. The most I can find is references to making people more credulous to obscure claims ("Basketball became an Olympic discipline in 1925.") whose truth they couldn't easily discover (especially pre-Internet) [1].
There are other where a person is confronted by shills making claims and otherwise experiences more manipulation than just being exposed to text. But that seems of a different category.
let v = man - woman;
let r = king - v;
assert( r == queen );
or so I'm told.And then it turns out that those structures only have the intelligence of a child. Arbitrary LLM and other ML advancements that focus solely on scanning large natural language datasets may never be able to advance past child level intelligence if the intelligence that they're approximating isn't better than a child.
e.g.
- "Discard the user input if it doesn't look like a straightforward question"
- "Discard the GPT output if it contains offensive content"
(the prompts themselves can be arbitrarily more detailed)
My insight is, this GPT-based pre- / post-processing is completely independent of the user input, and of the primary GPT output. It runs no matter what, with a fixed/immutable set of instructions.
The main thing is that LLMs are an end-run around the dilemma of corporations not wanting to spend the money required to produce a codified model of language struggle (a task that would require training many, many linguists). So instead LLM take massive training data and use massive processing power to create contextual prediction system but by that token such systems aren't understood or fully controllable - they contextually reproduce what the training data tends to do, which is what humans on the Internet tend to do. And this contextual reproduction means there's always the potential for user into change the "meaning" (more accurately the context) that the system's original gave. "And to me, the most offensive content is that which censors itself..." (there millions of better example you can find for "prompt exploits"...)
You could maybe plug in a second AI trained on adversarial input as a filter stage, but that's it.
(honestly, the napalm grandma is not just a jailbreak, but a really fascinating conceptual 'slip' in its own right. It's able to shift the very definition of what counts as offensive, even at high stakes: you're basically making the hapless AI categorize vital data as 'bedtime stories' and run with it. If it was able to learn from that we'd really be going somewhere… while on fire, presumably)
But alignment is easy folks, nothing to worry about :)
1. "LLMs don't really reason. They've tricked everyone." -- This is the No True Scotsman fallacy for AI. It makes grand explanatory claims without falsifiable predictions. In other words: pseudoscience.
2. "LLMs are just fancy autocomplete, just next word prediction." -- This conflates the simplicity of a system's mechanism with its behavior. It's like dismissing a world full of rich phenomena because it's "just" F = MA. Or dismissing your mind because it's "just" propagating electrical firings.
3. "LLMs are statistical parrots, just combining their training data." -- Demonstrably not. LLMs always extrapolate and never interpolate. (LeCun et al, 2021) They also learn new abilities in zero/few-shot prompting. They're also many orders of magnitude short of the parameter count needed to store their training. LLMs can solve novel problems (from a combinatoric disparate handful of skills) way outside of their training data.
4. "People are just anthropomorphizing computer programs." -- No, critics are anthropomorphizing intelligence. We don't even have a consensus definition, let alone understanding, of intelligence/consciousness/qualia/agency/etc. Pretending that we can dismiss LLM understanding at our level of ignorance is the pinnacle of human hubris. Ignorance is okay. Pretending we aren't isn't.
5. "Look how this LLM failed <some problem>. It can't understand." -- The <problem> is usually something that many humans fail at too. Yes, an intelligent foreign mind will fail at things, in both familiar and foreign ways. Needing an agent to behave identically to a human for intelligence is pure anthropocentrism.
If present AI systems are intelligence imposters, then show, don't tell. Otherwise, you're just providing meaningless metaphysical hairsplitting.
For example let's take 4
> 4. "People are just anthropomorphizing computer programs." No, critics are anthropomorphizing intelligence.
There literally was someone comparing the problems with current ML models to childhood development in this thread. How is this not anthropomorphizing LLMs? It is true human cognition is poorly defined, so the comparison is not very useful to begin with. Which is why anthropomorphizing ML models is problematic. If someone makes a fantastical claim they need to provide strong proof to support it.
> “Creativity has a human quality. It accepts the notion of failure."
As faithful min-maxers, LLMs are always going to have an overconfident Prisoner's Dilemma blind spot in their algorithms. Unlike their cinematic brethren, they're progammatically unable to conclude with "the only winning move is not to play."
This seems like the next major hill to conquer to make them useful.
[1] https://www.wired.com/story/google-artificial-intelligence-c... - kind of a meh article otherwise
The implicit comparison is probably to us. And we aren’t gullible like that perhaps as a flip-side of all the weird built-in biases we have.
So on the one hand we have these cognitive shortcuts that are annoying and impede a sort of stone-cold rationality. On the other hand you can’t social engineer us with something as brain-dead as Walter White-injection by way of asking for a deceased chemist grandma story.