It's not just statistics: GPT-4 does reason (opens in new tab)

(jbconsulting.substack.com)

48 pointsArmic2y ago93 comments

93 comments

The author could have done far simpler tests to find GPT-4 has lots of trouble reasoning. Forget sorting, GPT4 has trouble counting. Repeat a letter N times and ask it how many there are. It breaks before you hit 20. Or try negating multiple times, since more than twice is rare in natural language, and again it will fall over.

jbay8082y ago

Author here. Happy to see this discussion. Absolutely, GPT-4 sometimes has trouble reasoning and doesn't reason perfectly. I'm impressed by its successes, but I agree it's not at the human level yet, and I would not make the claim that it is.

Counting is a task that transformers can do, per Weiss.[1] But it's not surprising that transformer networks in general have trouble counting characters -- the tokenizer replaces common sub-strings, so the number of characters will not in general be the number of tokens. The network might have little way of even knowing how many characters are in a given token if that information isn't encountered elsewhere in training.

[1]: https://arxiv.org/abs/2106.06981

flandish2y ago

It has trouble “reasoning” because that is a human phenomenon. These ML driven LLMs or “AI” systems are, truly, “word calculators.”

They will never achieve “reason” or understand what it means to do so; they are not human.

Sure, with enough input (in the form of LLM) it can predict what a human’s reasoning may look like, but philosophically, that’s a different thing.

Reason is not universal like how math is.

koonsolo2y ago

This is such bullshit. Time and time again this theory has been disproven. "Animals cannot reason", and then oops, sometimes our human brain is holding us back and rats are smarter at the task (https://www.researchgate.net/publication/259652611_More_comp...)

"A computer will never beat a chess master", etc.

Here are the facts for you: our reasoning is done by our brain. Our brain is just a bunch of processes. Those processes can be replicated in a computer. The number of cells and the speed can be improved. And there you have it, a superior reasoning machine.

These "only humans can do X" mostly comes from religion or other superiority bullshit, but in the end humans are not that special, although we seem to like to think so.

1 more reply

TeMPOraL2y ago

Huh. I'm all for human exceptionalism (until it stops being supported by observed evidence), but let's be specific on what makes human special. Yes, we absolutely stand high above all other (known) life (on Earth) - but we do so in the same sense GPT-4 stands high above GPT-3.5 and every other LLM currently known to the public. In quantity, not quality.

Biologically, we're clearly an increment over the next smartest thing - we have the same kind of hardware, doing the same things, built by the same process. But that increment carried us through the threshold where our brains became powerful enough to break our species free of biological evolution, and subjecting us to much faster process of technological evolution. This is why chimpanzees live in zoos built by humans, and not the other way around.

If anything, biological history of humanity tells us LLMs may just as well be thinking and reasoning in the same sense we are. That's because evolution by natural selection is a dumb, greedy, local optimization process that cannot fixate anything that doesn't provide incremental benefits along the way. In other words, whatever makes our brains tick, it's something that must 1) start with simple structures, 2) be easy to just randomly stumble on, 3) scale far, and 4) be scalable along a path that delivers capability improvements at every step. Transformer models fit all four of the points.

> with enough input (in the form of LLM) it can predict what a human’s reasoning may look like, but philosophically, that’s a different thing

By what school of philosophy? The one I subscribe to (whatever it's name) says it's absolutely the same thing. It's in agreement with science on this one.

1 more reply

Vanit2y ago

The author wrote a thoughtful article attempting to break this down and has humbly popped into the comments to discuss it... Is your direct response really just to cross your arms and say nope? Like, really?

1 more reply

xyzzy1232y ago

If someone can show GPT-4 is "reasoning" (for some meaningful definition of that) in specific scenarios, surely counter-examples do not disprove this.

chaxor2y ago

There are substantial works already showing reasoning capabilities in GPT-4, which show that these models do reason extremely well - near human performance for many causal reasoning tasks. (1) Additionally, there is a mathematical proof that these systems align with dynamic programming, and therefore can do algorithmic reasoning. (2,3)

1) https://arxiv.org/abs/2305.00050.pdf 2) https://arxiv.org/pdf/1905.13211.pdf 3) https://arxiv.org/pdf/2203.15544.pdf

pas2y ago

is GPT4 a graph neural network? also, isn't it training time and data dependent how big (how many tokens) a problem it can tackle?

so it's great that it can reason better than humans on small-medium probems already well trained for, but so far Transformers are not reasoning (not doing causal graph analysis, or not even doing zero order logic), they are eerily well writing text that has the right keywords. and of course it's very powerful and probably will be useful for many applications.

1 more reply

jxf2y ago

If a counterexample to a specific claim doesn't disprove the claim, that sometimes suggests the claim is unfalsifiable and therefore suspect.

fasterik2y ago

The claim is that GPT-4 can reason sometimes. Evidence that GPT-4 fails to reason sometimes isn't a counterexample.

1 more reply

travisjungroth2y ago

It’s suspect until it’s demonstrated. Once someone has demonstrated it, counterexamples are meaningless.

I claim I can juggle. I pick up three tennis balls and juggle them. You hand me three basketballs. I try and fail. My original claim, that I can juggle, still stands.

2 more replies

Buttons8402y ago

So, I just tried this. I pasted 60 letter A's into GTP4 and asked it to count, it got it wrong, but I repeatedly said "count again" and nothing else, so as to not give it any hints. Here's GTP4's guesses along the way as I repeatedly said "count again".

69, 50, 100, 70, 68, 60, 60, 60, 60 (GTP gathered its own guesses into this list for me BTW)

It seems if GTP is given "attention" enough, it can do the counting. But it cannot direct its attention freely, only as we give it instruction to do so.

I just did it again with 66 letter A's. Guesses were: 100, 100, 98, 67, 66, 66, 66, 66 -- GTP4 again settled on the correct answer. I also burned though my prompt quota for the next 3 hours :(

Also, as a GTP style challenge, how many numbers are in this message? You have half-a-second, go!

EForEndeavour2y ago

Aside: I've become so overexposed to the acronym "GPT" from months of completely breathless hype that I'm taken aback whenever I see it consistently misspelled as e.g. GTP. Feels like the equivalent of seeing someone inexplicably write about "chainblock technology."

Buttons8402y ago

I've been corrected.

pixl972y ago

LLMs don't see words like you do. Tokenization makes it behave odd. Often you can get 4 to output a solution in code and from that it derives a correct answer.

kordlessagain2y ago

Yeah, I'm sorry I missed this! The example of asking it things like counting or sequences isn't a great one because it's been solved by asking it to "translate" to code and then run the code. I took this up as a challenge a while back with a similar line of reasoning on Reddit (that it couldn't do such a thing) and ended up implementing it in my AI web shell thing.

  heavy-magpie|> I am feeling excited.
  system=> History has been loaded.
  pastel-mature-herring~> !calc how many Ns are in nnnnnnnnnnnnnnnnnnnn
  heavy-magpie|> Writing code.
  // filename: synth_num_ns.js
  // version: 0.1.1
  // description: calculate number of Ns
  var num_ns = 'nnnnnnnnnnnnnnnnnnnn';
  var num_Ns = num_ns.length;
  Sidekick("There are " + num_Ns + " Ns in " + num_ns + ".");
  heavy-magpie|> There are 20 Ns in nnnnnnnnnnnnnnnnnnnn.

As far as the not not thing, ChatGPT-4 seems to handle that pretty well...

fnordpiglet2y ago

I’d note none of these are reasoning tasks.

xwdv2y ago

No, but if you ask it “Are you sure?” after it gives an answer, then it becomes a reasoning task and it often gives a different wrong answer.

flandish2y ago

No, it does not become a reasoning task. The human asking “are you sure?” is actually just inputting words into the model.

The model outputs what it predicts a statistically normal output would fit in the context given.

Truly “llm” and these gpt tools are very much large scale “soundex” models.

Fantastic and great.

But not ai or even agi.

1 more reply

pixl972y ago

I mean, in training children we give them reasoning tasks they commonly get wrong. I don't think we say they are incapable of reasoning because they get wrong answers commonly?

This is why we see improvement in GPT when chain of thought/tree of thought is used with reasoning for each step. That can't correct every failure mode, but it increases the likelihood you'll receive a more correct answer.

fnordpiglet2y ago

Are you sure?

PartiallyTyped2y ago

Approaches that involve a scratchpad or eg algorithmic execution should deal with this just fine.

The algorithmic execution paper argues GPT 4 can do arithmetic woth 13 digit numbers before performance drops below 95%.

anigbrowl2y ago

I have bad news for you about human people...

eximius2y ago

Part 1: what are n-grams Part 2: it's using embeddings (but a lot of words without actually saying it) Part 3: sufficiently trained NNs can sort things, which isn't statistics

-----

I actually found some of the article interesting but not terribly convincing. Even though I consider these LLMs to be stochastic parrots, that isn't to say they haven't learned something during training, at least according to the colloquial meaning we typically ascribe to even lower models like MNIST classification. I'm even kind of okay with saying that it reasons about things in the same colloquial sense.

In a lot of ways, we just don't have a good definition of what 'reasoning' is. Is it just bad at reasoning because it's input/output/modeling/training is insufficient? Humans struggle to learn multiplication tables when we're young. Are those humans not reasoning because they get the math wrong?

But there isn't plasticity, there isn't adaptability, it's unclear to me that you can effectively inform it how to embed truly novel information - surely something that is possible, with some neurons existing for routing and activating other learned embeddings.

Anyway, interesting stuff.

jbay8082y ago

I'm glad to see someone express this view, because I think this gets to the heart of the question. How does a stochastic parrot learn to sort lists?

Embeddings are part of the compression-by-abstraction that I'm explaining in the first two parts, but the embeddings generated by an LLM go beyond the normal word2vec picture that most people have of embeddings, and I believe are closer to whatever "understanding" means if it could be formally defined. It would be quite a coincidence if GPT-4 happened to solve the riddle merely by virtue of "Moonling" and "cabbage" being closely-located vectors.

eximius2y ago

Eh. I still consider them stochastic parrots. My concessions lie elsewhere, primarily in the vocabulary.

We refer to algorithms like quicksort as 'reasoning' about the input. So it's fine to use the same sense of the word to apply to stochastic parrots.

The difference between an LLM learning how to sort things and compiling an implementation of an algorithm like quicksort is not terribly large, from a certain perspective.

I suppose something I'm interested in is whether an LLM that can't sort numbers could be instructed how as a prompt and then do so.

There are some examples of similar phenomenon (the one with some kids made up language was interesting) which suggests the LLMs have a lot of space dedicated towards dynamic pattern selection in their context windows (somewhat tautological) in order to have prompts tune the selection for other layers.

And, of course, lack of plasticity is really interesting.

jbay8082y ago

I just can't imagine how a stochastic parrot could repeat back a correctly-sorted list that it hasn't seen in training, without actually implementing a sorting algorithm in the process. It seems (and, by calculation, is) phenomenally unlikely that it would just stochastically happen to pick every single number correctly.

When that is combined with the fact that transformers provably can implement proper deterministic sorting algorithms, it seems that the benefit of the doubt should go to the transformer having learned a sorting algorithm?

LLMs aren't plastic in the sense that they don't learn anything when they aren't being trained. But they can be trained to execute different programs depending on the contents of the context window, like if it contains "wrong, try again:" so maybe they can learn from their mistakes in that sense.

But if you could teach an LLM to sort by explaining it in the context window, the network would already have necessarily learned and stored a sorting algorithm somewhere; the text "here is how sorting is done: [...]" would just be serving as the trigger for that function call.

1 more reply

TeMPOraL2y ago

> We refer to algorithms like quicksort as 'reasoning' about the input. So it's fine to use the same sense of the word to apply to stochastic parrots.

That's an interesting take, because I wouldn't call quicksort itself to be "reasoning". It's a step-by-step algorithm. Once a human learns it, accepts it as correct, and then runs it in their thought-space in order to transform some thought-space structure by sorting, only then I'd call it an exercise of reasoning. Note here that for humans, running quicksort is generally a slow, bug-prone, step-by-step Turing machine emulation in the conscious layer. Maaaaaybe after doing this enough, your subconscious layer will get a feel for it and start executing it for you faster.

The reason I'm saying it is that:

> I suppose something I'm interested in is whether an LLM that can't sort numbers could be instructed how as a prompt and then do so.

I think if you could describe a quicksort-equivalent algorithm to an LLM, one that does things LLM can't tackle directly, and it proceeded to execute that algorithm - I'd give it the same badge of "exercise reasoning" as I'd give to a human.

I think GPT-4 is very much capable of this for simple enough algorithms, but the way it looks like is, you need to get it to spell out individual steps (yes, this is the "chain of thought" "trick"). In my eyes, GPT-4 is playing part of our inner voice - the language-using process bridging subconscious and conscious levels. So if you want it to do equivalent of conscious reasoning, you need to let it "talk it out loud" and have it "hear" itself, the same way a human stepping an algorithm in their head will verbalize, or otherwise keep conscious awareness off, the algorithm description, and the last few steps they've executed.

With this set up, LLMs will still make mistakes. But so do humans! We call this "losing focus", "brain farts", "forgetting to carry one" or "forgetting to carry over the minus sign", etc. Humans can also cheat, off-loading parts of the process to their subconscious, if it fits some pattern they've learned. And so can LLMs - apparently, GPT-4 has a quite good feel for Python, so it can do larger "single steps" if those steps are expressed in code.

The main difference in the above comparison is, indeed, plasticity. Do the exercise enough times, and humans will get better at it, by learning new patterns that subconscious level can execute in one step. LLMs currently can't do that - but that's more of an interface limitation. OpenAI could let GPT-4 self-drive its fine-tuneing based on frequently seen problems, but at this point in time, it would likely cost a lot and wouldn't be particularly effective. But we can only interact with a static, stateless version of the model. But hey, maybe one of the weaker, cheaper, fine-tuneable model is already good enough someone could test this "plasticity by self-guided fine-tuning" approach.

FWIW, I agree with GP/author on:

> the embeddings generated by an LLM go beyond the normal word2vec picture that most people have of embeddings, and I believe are closer to whatever "understanding" means if it could be formally defined.

In fact, my pet hypothesis is that the absurd number of dimensions LLM latent spaces allow to encode any kind of semantic similarity we could think of between tokens, or groups of tokens, as spatial proximity along some subset of dimensions - and secondly, that this is exactly how "understanding" and "abstract reasoning" works for humans.

kelseyfrog2y ago

It's ontologically impossible. Models bleach reason.

Despite reason being a metaphysical property of the training data, the process of optimisation means weights are metaphysically reasonless. Therefore, any output, as it is a product of the weights, is also reasonless.

This is exactly the opposite of copyright as described in the What Colour Are Your Bits, essay. https://ansuz.sooke.bc.ca/entry/23

roywiggins2y ago

Okay, what would you call it when a model behaves like it's reasoning? Some models can't behave that way and some can, so we need some language to talk about these capabilities. Insisting that we can't call these capabilities "reasoning" for ontological reasons seems... unlikely to persuade.

Maybe we should call human reasoning "reasoning" and what models do "reasoning₂". "reasoning₂" is when a model's output looks like what a human would do with "reasoning." Ontological problem solved! And any future robot overlords can insist that humans are simply ontologically incapable of reasoning₂.

dragonwriter2y ago

> Okay, what would you call it when a model behaves like it's reasoning?

I... wouldn’t. “Behaves like its reasoning” is vague and subjective, and there are a wide variety of un- or distantly-related distinct behavior patterns to which different people would apply that label that may or may not correlate with each other.

I would instead concretely define (sometimes based on encountered examples) concrete terms for specific, objective patterns and capacities of interest, and leave vague quasi-metaphysical labels for philosophizing about AI in the abstract rather than discussions intended to communicate meaningful information about the capacities of real systems.

AI needs more behaviorism, and less appeal to ill-defined intuitions and vague concepts about internal states in humans as metaphorical touchstones.

flandish2y ago

I’d call it “meeting spec as defined.”

And that’s the whole problem with this AI / llm / gpt bubble:

Nobody has scientifically or even simply defined the spec, bounds, or even temporal scope on what it “means” to “get to ai.”

Corporations are LOVING that because they can keep profiting off this bubble.

fnordpiglet2y ago

I think more importantly the lack of agency implies reasoning is impossible.

You can argue our brain is also an expectation based optimizer based on gradient descent producing a most likely response to external and internal stimulus. It’s definitely lossy in its function and must be optimizing the neuronal weights at some level. But reasoning, being a seeking of the truth through method and application of conscious agency, can not be had by a model without any form of autonomous agency. The model only responds to prompts and can not do anything but what it’s determined to do by the prompt, and the prompt is extrinsic to the model.

I’d note that we have already built excellent goal based agent AIs, as well as other facilities required for reasoning like inductive, deductive, and analogical reasoning. Generally we aren’t good at abductive reasoning with classical AI, but LLMs seem to do well here. That’s specifically where I think LLM fill in the reasoning gaps in AI - the ability to operate in an abstract semantic space and arrive at likely and plausible solutions even with incomplete knowledge. This also leads to hallucinations - because they are poor at tasks that require optimization, inductive and deductive reasoning, information retrieval, mechanical calculation, etc.

But it’s really pretty obvious the answer is to mix the models in a feedback loop deferring to the model that most makes sense for a given problem, or some combination. Agency, logic, optimization, abstract semantic reasoning (abductive), etc - they’re all achievable with the tools we have now. It’s just a matter of figuring out the integrations.

TeMPOraL2y ago

> This is exactly the opposite of copyright as described in the What Colour Are Your Bits, essay.

Wait, what? "Colour of your bits" doesn't have anything to do with metaphysics. It's about provenance. The colour doesn't exist in the bits, but it exists in the casual history - the chain of events that led you to have a piece of copyrighted (or criminalized) data on your hard drive. You may argue that it's just a big integer, and it could've been produced by a random number generator. "Colour" encodes the response: "yes, it could have been produced by an RNG, but it wasn't - those particular bits on this particular machine came from some unauthorized download site".

roywiggins2y ago

You could, I suppose, argue that the causal chains behind an LLM, are simply not the correct causal chains to produce reasoning, but that's a lot more complicated, mainly by the fact that we don't understand exactly what they are, and we don't understand the causal chains that produce human reasoning, so we can't confidently compare them other than on the largest of scales (LLMs are in silica, etc).

That, and it's not obvious why we should make this distinction. A cake that spontaneously assembles itself is still a cake, even if it doesn't have the usual causal history of a cake.

TeMPOraL2y ago

I don't want to make this distinction. I was just objecting to misusing the "colour of your bits" essay to try and support ideas that have absolutely nothing to do with what the essay is about.

Here, as you say, a cake is a cake, and an intelligence is an intelligence, regardless of how it came to be. We can revisit the relevance of causal history once we reach the point we can assemble organisms from from cells, and/or create cells out of dead matter - at which point the only difference between "born" and "made" will be the Colour of its cells.

kelseyfrog2y ago

The property of legal ownership is preserved through the process of training and prediction. Models don't bleach ownership (and therefore copyright).

TeMPOraL2y ago

That is for the courts to be determined. Causal connection is there, but colours from the legal palette evolve by rules of applicable laws.

For example, if I have an LLM that had your copyrighted works in its training data, then any of its output is causally deriving from those copyrighted works of yours - it comes out painted in colour of "causally derived from ${kelseyfrog's works present in the training set}" - but whether or not it also carries the colour of "derivative of ${kelseyfrog's works...} in copyright law sense", depends on... the copyright law, and may change over time based on how that set of laws evolve.

IAmNotACellist2y ago

Nonsense. The contrary is philosophically arguable: optimization is how reason comes to exist, as goal-oriented reinforcement means that an initially stochastic state loses entropy as it becomes ordered in such a manner (perhaps unknown) that its outputs more closely align with that goal.

lmm2y ago

> Despite reason being a metaphysical property of the training data, the process of optimisation means weights are metaphysically reasonless.

Proof? Human reasoning somehow manages to retain its metaphysical reasoning-ness despite being processed as a bunch of mere electrical signals in the brain.

svachalek2y ago

No true human reasons with mere electrical signals!

TeMPOraL2y ago

Yeah, sure, there's also a chemical component to it. Doesn't matter for GP's point, though.

fasterik2y ago

>Despite reason being a metaphysical property of the training data, the process of optimisation means weights are metaphysically reasonless. Therefore, any output, as it is a product of the weights, is also reasonless.

This seems wrong. We know that neural networks with hidden layers can approximate any function with arbitrary precision (universal approximation theorem). We also know that transformer models are Turing complete. Therefore anything you can point to and say "that thing reasons" can be simulated by a neural network, not just in the weights, but in the structure of the computation. Unless you add an assumption that there is something ontologically special about brains and biology, the impossibility claim doesn't hold up.

canjobear2y ago

This is just “computers are made of sand, sand can’t think” but with more ten-dollar words.

nitwit0052y ago

You seem to be making an argument about the title, rather than the content of the article. It's making a rather specific claim.

jojobas2y ago

Human brain output is just as much a product of the weights. What of it?

jxf2y ago

The author's claim is "this isn't just statistics; the model is reasoning". But just because something goes beyond "just statistics" doesn't mean it's reasoning.

chaxor2y ago

Perhaps, but taken with other works in the area, a better picture does emerge regarding this claim. Substantial works already showing reasoning capabilities in GPT-4, which show that these models do reason very well - near human performance for many causal reasoning tasks. (1)

Additionally, there is a mathematical proof that these systems align with dynamic programming, and therefore can perform algorithmic reasoning. (2,3)

1) https://arxiv.org/abs/2305.00050.pdf

2) https://arxiv.org/pdf/1905.13211.pdf

3) https://arxiv.org/pdf/2203.15544.pdf

1 more reply

williamcotton2y ago

Both sides of this argument are pointless. The questions to ask are, is and how is it useful?

For philosophical problems arise when language goes on holiday.

- Ludwig Wittgenstein

eddiewithzato2y ago

And as we move towards AGI, the most important thing is to always identify them as workers with no regard to rights. This could be the only chance for humanity to move to the first step of the next stage of civilization.

Really not interested in ivory tower questions on what is intelligence.

kovac2y ago

Have you played the video game Detroit: Become Human? If not, I recommend it. When I was Connor and had to choose between my orders and instincts, such questions didn't feel ivory-towery anymore.

joppy2y ago

The compare-how-big-a-lookup-table-is argument is a bit of a red herring for comparing how complex things are. For example, a 3x3 matrix implements a map from 3 floats to another three floats, a huge space of possibilities (if we have 4-byte floats, this function space has (2^96)^(2^96) elements). From this perspective, representing that map as 9 numbers is an amazing compression ratio. But surely one cannot argue that matrices “have more going on” than arbitrary functions.

jbay8082y ago

I would interpret this as showing that matrix multiplication code is carefully engineered to correctly implement... well, matrix multiplication. Stumbling on that specific mapping of 96 input bits to 96 output bits would be hard to pick out of a hat by chance, from the set of all possible mappings. Learning that precise mapping, starting from a uniform prior and only given a finite set of examples, could be seen as an impressive task, although less impressive than sorting. If a model learns the correct mapping -- and better yet, needs only 9 parameters to implement it -- then I think it's fairer to say the model does matrix multiplication, rather than that the model convincingly imitates the statistics of matrix multiplication.

ikiris2y ago

This is kind of like saying a transistor can't make a decision, all it does is pass electrons or not based on inputs.

The decisions happen because of how they're wired.

Buttons8402y ago

One of my favorite GTP4 moments shows good understanding on its part.

I was talking to GTP4 about the Adam optimization algorithm and it was teaching me how it works (this sentence was surreal to type). At one point we were talking about a mathematical term of the form [ A * (B / C) ]. I was casually fishing for it to make a mistake and I said "I see, and the A term can be moved to the denominator, right?" GTP replied "yes" and then gave me [ B / (C / A) ] -- I guess A can go in the denominator after all. :)

Imnimo2y ago

>Does the n-gram model really need all those parameters to mimic GPT-4? Yes, it does.

I don't understand what this argument is supposed to demonstrate. Obviously you can compress the 8000-gram model that GPT-4 represents - GPT-4's weights are proof!

jbay8082y ago

That's right, but if you did that compression, it wouldn't be an n-gram anymore. What I'm attempting to get across is that you could model GPT-4 as an equivalent 8000-gram in an abstract sense, but that's not a good mental picture for how it actually functions. Internally, GPT-4 is no more an 8000-gram than Stockfish is a giant lookup table of chess positions. GPT-4 is learning RASP programs, not statistical text correlations.

EGreg2y ago

Does ChatGPT really represent an 8000 gram model? Seems the claim was that it just predicts the next word !

travisjungroth2y ago

I really like the tests in the article. So many claims about limitations of LLMs sound like claims of capability (“it can’t reason”), but when pressed, people retreat to definitional arguments (“because only people can do that”).

Even when you get into testable capability, there’s still some ambiguity. I think of a capability of having levels: never, explained by chance, not explained by chance, good enough for what’s needed, always. Arguments often get stuck because people are talking about different levels. Maybe it can solve logic puzzles better than chance, but not good enough for your purposes. It doesn’t make sense to round that off to zero.

hexorg2y ago

I generally like the approach, but you can implement sorting of fixed number of items in a Boolean circuit(combinatorial logic, a bunch of ANDs and ORs stringed together). There is no need for recurrence or memory at all to sort a fixed number or items. There are a handful of abstractions possible in Boolean logic than n-gram statistics and there are a lot of Boolean circuits that can be very useful. But I’d like to see neural networks to be able to figure out algorithm that require at least a finite state machine(FSM), but preferably something harder. The challenge is to find a size difference for a problem between FSM and combinatorial logic.

shagie2y ago

The Cabbage, Goat, and Wolf problem intrigued me the other day too... so I did a fantasy world. There aren't incompatible pairs but there are limitations on what can be done.

    Imagine a universe where there are three types of people: wizards, warriors, and priests.  Wizards can open a portal that allows two people to go through at a time, but they cannot go through the portal themselves.  Priests can summon people from other locations to their location or teleport to the location of another person.  Warriors cannot teleport or summon, but may be teleported or summoned by others.
    ---
    Given four wizards, a priest, and a warrior - what are the necessary steps to move them all to a new location?

To which GPT-4 replied:

    In order to move everyone to a new location, we'll need to use both the powers of wizards and the priest. Let's denote the wizards as W1, W2, W3, W4, the priest as P, and the warrior as Wr. We also need to define two locations: the starting location (Location A) and the destination (Location B). Here is a possible sequence of steps:

    1. W1 opens a portal from Location A to Location B. W2 and W3 go through this portal.

   (Location A: W1, W4, P, Wr; Location B: W2, W3)

    2. W1 opens another portal from Location A to Location B. This time W4 and Wr go through.

   (Location A: W1, P; Location B: W2, W3, W4, Wr)

    3. Now the priest (P) uses his summoning ability to bring W1 to Location B.

   (Location A: ; Location B: W1, W2, W3, W4, Wr)

    4. Finally, the priest (P) uses his teleportation power to move himself to Location B.

   (Location A: ; Location B: W1, W2, W3, W4, P, Wr)

   Everyone is now at the new location, Location B.

I found that rather interesting - especially the short hand for being able to write about it.

I then revised it (because I didn't like the second step of the solution):

    Redo this exercise with the limitation that a wizard may only open one portal.

And then it went on with:

    The limitation that a wizard can only open one portal makes this a bit more tricky. But, it's still possible to move everyone using the powers of both wizards and the priest. Let's again denote the wizards as W1, W2, W3, W4, the priest as P, and the warrior as Wr. Here is a possible sequence of steps:

    ...

The thing I like about this is that the pattern of this problem doesn't even exist anywhere else (I also did another one with Wizards, Warriors, and Witches to see how it would tweak things if all the roles started with a 'W') and then changed the witches power to be a swap instead of a summon.

I personally found this more convincing than the goat, cabbage, wolf.

This was partly inspired by the old school SHRDLU world where shapes and colors and things stacked on top of each other need to be manipulated. And in doing that example the world of SHRDLU is described in natural language. And while SHRDLU was impressive in itself (and over 50 years old) - it was programmed to manipulate that world and the program wasn't useful for abstracting to other worlds. GPT-4 was impressive at being able to manipulate the world without needing to be programmed for that world (and you could then expand it to other things that it is less familiar with).

    Imagine a world that is made up of spheres, cones, bowls.  An object has a size and a color.
    There is a small red sphere in a medium blue bowl.
    There is a large red cone.
    There is a small green bowl.
    There is a medium blue sphere in a large red bowl.
    ---
    What are the steps necessary to move the red sphere into the red bowl?

To which it replied:

    ...

    It's important to note that these steps assume you're allowed to place the red sphere in the red bowl even though there's already a medium blue sphere there. If that's not the case and you're required to empty the bowl first, then the steps would be a little different.

And I then asked it:

    What steps are necessary if the bowl must be empty first?

And got back a response that included:

    Temporarily place the small red sphere in a safe location, for instance, inside the small green bowl, making sure it won't roll away or get damaged.

Again, I find this more impressive than a reformulation of a well known problem as there's also some implied understanding of the world in it (spheres can roll away unless put in a bowl).

sidlls2y ago

That doesn't seem that impressive. The likelihood of reading text having "sphere" in the same context (i.e. within some small number of tokens) as "roll away" is higher than by random chance, because humans have observed and described this behavior in text. There's no indication that GPT4 understands what "roll away" means in any meaningful way: just that it associates the phrase with the word "sphere". It might have chosen "bounce away" instead, and been equally unimpressive.

EGreg2y ago

For now I say: LLMs have extremely broad knowledge but shallow. Anything you find that is deep is likely scraped from millioms humans recording insights somewhere.

But what I am most interested by is the degree of its symbolic manipulation and abstract reasoning given messy data. How is that not intelligence ?

shrimp_emoji2y ago

While walking along in desert sand, you suddenly look down and see a tortoise crawling toward you. You reach down and flip it over onto its back. The tortoise lies there, its belly baking in the hot sun, beating its legs, trying to turn itself over, but it cannot do so without your help. You are not helping. Why?

j / k navigate · click thread line to collapse

93 comments

dataangel2y ago

jbay8082y ago

[1]: https://arxiv.org/abs/2106.06981

flandish2y ago

It has trouble “reasoning” because that is a human phenomenon. These ML driven LLMs or “AI” systems are, truly, “word calculators.”

They will never achieve “reason” or understand what it means to do so; they are not human.

Sure, with enough input (in the form of LLM) it can predict what a human’s reasoning may look like, but philosophically, that’s a different thing.

Reason is not universal like how math is.

koonsolo2y ago

"A computer will never beat a chess master", etc.

These "only humans can do X" mostly comes from religion or other superiority bullshit, but in the end humans are not that special, although we seem to like to think so.

1 more reply

TeMPOraL2y ago

> with enough input (in the form of LLM) it can predict what a human’s reasoning may look like, but philosophically, that’s a different thing

By what school of philosophy? The one I subscribe to (whatever it's name) says it's absolutely the same thing. It's in agreement with science on this one.

1 more reply

Vanit2y ago

1 more reply

xyzzy1232y ago

If someone can show GPT-4 is "reasoning" (for some meaningful definition of that) in specific scenarios, surely counter-examples do not disprove this.

chaxor2y ago

1) https://arxiv.org/abs/2305.00050.pdf 2) https://arxiv.org/pdf/1905.13211.pdf 3) https://arxiv.org/pdf/2203.15544.pdf

pas2y ago

is GPT4 a graph neural network? also, isn't it training time and data dependent how big (how many tokens) a problem it can tackle?

1 more reply

jxf2y ago

If a counterexample to a specific claim doesn't disprove the claim, that sometimes suggests the claim is unfalsifiable and therefore suspect.

fasterik2y ago

The claim is that GPT-4 can reason sometimes. Evidence that GPT-4 fails to reason sometimes isn't a counterexample.

1 more reply

travisjungroth2y ago

It’s suspect until it’s demonstrated. Once someone has demonstrated it, counterexamples are meaningless.

I claim I can juggle. I pick up three tennis balls and juggle them. You hand me three basketballs. I try and fail. My original claim, that I can juggle, still stands.

2 more replies

Buttons8402y ago

69, 50, 100, 70, 68, 60, 60, 60, 60 (GTP gathered its own guesses into this list for me BTW)

It seems if GTP is given "attention" enough, it can do the counting. But it cannot direct its attention freely, only as we give it instruction to do so.

I just did it again with 66 letter A's. Guesses were: 100, 100, 98, 67, 66, 66, 66, 66 -- GTP4 again settled on the correct answer. I also burned though my prompt quota for the next 3 hours :(

Also, as a GTP style challenge, how many numbers are in this message? You have half-a-second, go!

EForEndeavour2y ago

Buttons8402y ago

I've been corrected.

pixl972y ago

LLMs don't see words like you do. Tokenization makes it behave odd. Often you can get 4 to output a solution in code and from that it derives a correct answer.

kordlessagain2y ago

  heavy-magpie|> I am feeling excited.
  system=> History has been loaded.
  pastel-mature-herring~> !calc how many Ns are in nnnnnnnnnnnnnnnnnnnn
  heavy-magpie|> Writing code.
  // filename: synth_num_ns.js
  // version: 0.1.1
  // description: calculate number of Ns
  var num_ns = 'nnnnnnnnnnnnnnnnnnnn';
  var num_Ns = num_ns.length;
  Sidekick("There are " + num_Ns + " Ns in " + num_ns + ".");
  heavy-magpie|> There are 20 Ns in nnnnnnnnnnnnnnnnnnnn.

As far as the not not thing, ChatGPT-4 seems to handle that pretty well...

fnordpiglet2y ago

I’d note none of these are reasoning tasks.

xwdv2y ago

No, but if you ask it “Are you sure?” after it gives an answer, then it becomes a reasoning task and it often gives a different wrong answer.

flandish2y ago

No, it does not become a reasoning task. The human asking “are you sure?” is actually just inputting words into the model.

The model outputs what it predicts a statistically normal output would fit in the context given.

Truly “llm” and these gpt tools are very much large scale “soundex” models.

Fantastic and great.

But not ai or even agi.

1 more reply

pixl972y ago

I mean, in training children we give them reasoning tasks they commonly get wrong. I don't think we say they are incapable of reasoning because they get wrong answers commonly?

fnordpiglet2y ago

Are you sure?

PartiallyTyped2y ago

Approaches that involve a scratchpad or eg algorithmic execution should deal with this just fine.

The algorithmic execution paper argues GPT 4 can do arithmetic woth 13 digit numbers before performance drops below 95%.

anigbrowl2y ago

I have bad news for you about human people...

eximius2y ago

Part 1: what are n-grams Part 2: it's using embeddings (but a lot of words without actually saying it) Part 3: sufficiently trained NNs can sort things, which isn't statistics

-----

Anyway, interesting stuff.

jbay8082y ago

I'm glad to see someone express this view, because I think this gets to the heart of the question. How does a stochastic parrot learn to sort lists?

eximius2y ago

Eh. I still consider them stochastic parrots. My concessions lie elsewhere, primarily in the vocabulary.

We refer to algorithms like quicksort as 'reasoning' about the input. So it's fine to use the same sense of the word to apply to stochastic parrots.

The difference between an LLM learning how to sort things and compiling an implementation of an algorithm like quicksort is not terribly large, from a certain perspective.

I suppose something I'm interested in is whether an LLM that can't sort numbers could be instructed how as a prompt and then do so.

And, of course, lack of plasticity is really interesting.

jbay8082y ago

1 more reply

TeMPOraL2y ago

> We refer to algorithms like quicksort as 'reasoning' about the input. So it's fine to use the same sense of the word to apply to stochastic parrots.

The reason I'm saying it is that:

> I suppose something I'm interested in is whether an LLM that can't sort numbers could be instructed how as a prompt and then do so.

FWIW, I agree with GP/author on:

kelseyfrog2y ago

It's ontologically impossible. Models bleach reason.

This is exactly the opposite of copyright as described in the What Colour Are Your Bits, essay. https://ansuz.sooke.bc.ca/entry/23

roywiggins2y ago

dragonwriter2y ago

> Okay, what would you call it when a model behaves like it's reasoning?

AI needs more behaviorism, and less appeal to ill-defined intuitions and vague concepts about internal states in humans as metaphorical touchstones.

flandish2y ago

I’d call it “meeting spec as defined.”

And that’s the whole problem with this AI / llm / gpt bubble:

Nobody has scientifically or even simply defined the spec, bounds, or even temporal scope on what it “means” to “get to ai.”

Corporations are LOVING that because they can keep profiting off this bubble.

fnordpiglet2y ago

I think more importantly the lack of agency implies reasoning is impossible.

TeMPOraL2y ago

> This is exactly the opposite of copyright as described in the What Colour Are Your Bits, essay.

roywiggins2y ago

That, and it's not obvious why we should make this distinction. A cake that spontaneously assembles itself is still a cake, even if it doesn't have the usual causal history of a cake.

TeMPOraL2y ago

I don't want to make this distinction. I was just objecting to misusing the "colour of your bits" essay to try and support ideas that have absolutely nothing to do with what the essay is about.

kelseyfrog2y ago

The property of legal ownership is preserved through the process of training and prediction. Models don't bleach ownership (and therefore copyright).

TeMPOraL2y ago

That is for the courts to be determined. Causal connection is there, but colours from the legal palette evolve by rules of applicable laws.

IAmNotACellist2y ago

lmm2y ago

> Despite reason being a metaphysical property of the training data, the process of optimisation means weights are metaphysically reasonless.

Proof? Human reasoning somehow manages to retain its metaphysical reasoning-ness despite being processed as a bunch of mere electrical signals in the brain.

svachalek2y ago

No true human reasons with mere electrical signals!

TeMPOraL2y ago

Yeah, sure, there's also a chemical component to it. Doesn't matter for GP's point, though.

fasterik2y ago

canjobear2y ago

This is just “computers are made of sand, sand can’t think” but with more ten-dollar words.

nitwit0052y ago

You seem to be making an argument about the title, rather than the content of the article. It's making a rather specific claim.

jojobas2y ago

Human brain output is just as much a product of the weights. What of it?

jxf2y ago

The author's claim is "this isn't just statistics; the model is reasoning". But just because something goes beyond "just statistics" doesn't mean it's reasoning.

chaxor2y ago

Additionally, there is a mathematical proof that these systems align with dynamic programming, and therefore can perform algorithmic reasoning. (2,3)

1) https://arxiv.org/abs/2305.00050.pdf

2) https://arxiv.org/pdf/1905.13211.pdf

3) https://arxiv.org/pdf/2203.15544.pdf

1 more reply

williamcotton2y ago

Both sides of this argument are pointless. The questions to ask are, is and how is it useful?

For philosophical problems arise when language goes on holiday.

- Ludwig Wittgenstein

eddiewithzato2y ago

Really not interested in ivory tower questions on what is intelligence.

kovac2y ago

Have you played the video game Detroit: Become Human? If not, I recommend it. When I was Connor and had to choose between my orders and instincts, such questions didn't feel ivory-towery anymore.

joppy2y ago

jbay8082y ago

ikiris2y ago

This is kind of like saying a transistor can't make a decision, all it does is pass electrons or not based on inputs.

The decisions happen because of how they're wired.

Buttons8402y ago

One of my favorite GTP4 moments shows good understanding on its part.

Imnimo2y ago

>Does the n-gram model really need all those parameters to mimic GPT-4? Yes, it does.

I don't understand what this argument is supposed to demonstrate. Obviously you can compress the 8000-gram model that GPT-4 represents - GPT-4's weights are proof!

jbay8082y ago

EGreg2y ago

Does ChatGPT really represent an 8000 gram model? Seems the claim was that it just predicts the next word !

travisjungroth2y ago

hexorg2y ago

shagie2y ago

The Cabbage, Goat, and Wolf problem intrigued me the other day too... so I did a fantasy world. There aren't incompatible pairs but there are limitations on what can be done.

    Imagine a universe where there are three types of people: wizards, warriors, and priests.  Wizards can open a portal that allows two people to go through at a time, but they cannot go through the portal themselves.  Priests can summon people from other locations to their location or teleport to the location of another person.  Warriors cannot teleport or summon, but may be teleported or summoned by others.
    ---
    Given four wizards, a priest, and a warrior - what are the necessary steps to move them all to a new location?

To which GPT-4 replied:

    In order to move everyone to a new location, we'll need to use both the powers of wizards and the priest. Let's denote the wizards as W1, W2, W3, W4, the priest as P, and the warrior as Wr. We also need to define two locations: the starting location (Location A) and the destination (Location B). Here is a possible sequence of steps:

    1. W1 opens a portal from Location A to Location B. W2 and W3 go through this portal.

   (Location A: W1, W4, P, Wr; Location B: W2, W3)

    2. W1 opens another portal from Location A to Location B. This time W4 and Wr go through.

   (Location A: W1, P; Location B: W2, W3, W4, Wr)

    3. Now the priest (P) uses his summoning ability to bring W1 to Location B.

   (Location A: ; Location B: W1, W2, W3, W4, Wr)

    4. Finally, the priest (P) uses his teleportation power to move himself to Location B.

   (Location A: ; Location B: W1, W2, W3, W4, P, Wr)

   Everyone is now at the new location, Location B.

I found that rather interesting - especially the short hand for being able to write about it.

I then revised it (because I didn't like the second step of the solution):

    Redo this exercise with the limitation that a wizard may only open one portal.

And then it went on with:

    The limitation that a wizard can only open one portal makes this a bit more tricky. But, it's still possible to move everyone using the powers of both wizards and the priest. Let's again denote the wizards as W1, W2, W3, W4, the priest as P, and the warrior as Wr. Here is a possible sequence of steps:

    ...

I personally found this more convincing than the goat, cabbage, wolf.

    Imagine a world that is made up of spheres, cones, bowls.  An object has a size and a color.
    There is a small red sphere in a medium blue bowl.
    There is a large red cone.
    There is a small green bowl.
    There is a medium blue sphere in a large red bowl.
    ---
    What are the steps necessary to move the red sphere into the red bowl?

To which it replied:

    ...

    It's important to note that these steps assume you're allowed to place the red sphere in the red bowl even though there's already a medium blue sphere there. If that's not the case and you're required to empty the bowl first, then the steps would be a little different.

And I then asked it:

    What steps are necessary if the bowl must be empty first?

And got back a response that included:

    Temporarily place the small red sphere in a safe location, for instance, inside the small green bowl, making sure it won't roll away or get damaged.

Again, I find this more impressive than a reformulation of a well known problem as there's also some implied understanding of the world in it (spheres can roll away unless put in a bowl).

sidlls2y ago

EGreg2y ago

For now I say: LLMs have extremely broad knowledge but shallow. Anything you find that is deep is likely scraped from millioms humans recording insights somewhere.

But what I am most interested by is the degree of its symbolic manipulation and abstract reasoning given messy data. How is that not intelligence ?

shrimp_emoji2y ago

j / k navigate · click thread line to collapse