Counting is a task that transformers can do, per Weiss.[1] But it's not surprising that transformer networks in general have trouble counting characters -- the tokenizer replaces common sub-strings, so the number of characters will not in general be the number of tokens. The network might have little way of even knowing how many characters are in a given token if that information isn't encountered elsewhere in training.
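The effect is easy to illustrate with a toy greedy longest-match tokenizer (the vocabulary below is made up for illustration; real BPE merge tables are learned from data, but the character-count/token-count mismatch is the same):

```python
# Toy greedy longest-match tokenizer over a tiny, hand-picked vocabulary.
VOCAB = {"straw", "berry", "st", "raw", "b", "e", "r", "y", "s", "t", "a", "w"}

def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        # Take the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

word = "strawberry"
toks = tokenize(word)
print(toks, len(toks), len(word))  # ['straw', 'berry'] 2 10
```

The model sees 2 tokens, not 10 characters, and nothing in the token IDs themselves says how many characters each one contains.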
They will never achieve “reason” or understand what it means to do so; they are not human.
Sure, with enough input an LLM can predict what a human’s reasoning may look like, but philosophically, that’s a different thing.
Reason is not universal in the way math is.
1) https://arxiv.org/abs/2305.00050 2) https://arxiv.org/pdf/1905.13211.pdf 3) https://arxiv.org/pdf/2203.15544.pdf
69, 50, 100, 70, 68, 60, 60, 60, 60 (GPT gathered its own guesses into this list for me, BTW)
It seems that if GPT is given enough "attention", it can do the counting. But it cannot direct its attention freely, only as we give it instructions to do so.
I just did it again with 66 letter A's. Guesses were: 100, 100, 98, 67, 66, 66, 66, 66 -- GPT-4 again settled on the correct answer. I also burned through my prompt quota for the next 3 hours :(
Also, as a GPT-style challenge, how many numbers are in this message? You have half a second, go!
heavy-magpie|> I am feeling excited.
system=> History has been loaded.
pastel-mature-herring~> !calc how many Ns are in nnnnnnnnnnnnnnnnnnnn
heavy-magpie|> Writing code.
// filename: synth_num_ns.js
// version: 0.1.1
// description: calculate number of Ns
var num_ns = 'nnnnnnnnnnnnnnnnnnnn';
var num_Ns = num_ns.length;
Sidekick("There are " + num_Ns + " Ns in " + num_ns + ".");
heavy-magpie|> There are 20 Ns in nnnnnnnnnnnnnnnnnnnn.
As far as the not-not thing goes, ChatGPT-4 seems to handle that pretty well... The algorithmic-execution paper argues GPT-4 can do arithmetic with 13-digit numbers before performance drops below 95%.
-----
I actually found some of the article interesting but not terribly convincing. Even though I consider these LLMs to be stochastic parrots, that isn't to say they haven't learned something during training, at least in the colloquial sense we typically apply even to much simpler models like MNIST classifiers. I'm even kind of okay with saying that it reasons about things in the same colloquial sense.
In a lot of ways, we just don't have a good definition of what 'reasoning' is. Is it just bad at reasoning because its input/output/modeling/training is insufficient? Humans struggle to learn multiplication tables when we're young. Are those humans not reasoning because they get the math wrong?
But there isn't plasticity, there isn't adaptability, and it's unclear to me that you can effectively teach it truly novel information - surely something that should be possible, with some neurons existing to route and activate other learned embeddings.
Anyway, interesting stuff.
Embeddings are part of the compression-by-abstraction that I'm explaining in the first two parts, but the embeddings generated by an LLM go beyond the normal word2vec picture that most people have of embeddings, and I believe are closer to whatever "understanding" means if it could be formally defined. It would be quite a coincidence if GPT-4 happened to solve the riddle merely by virtue of "Moonling" and "cabbage" being closely-located vectors.
We refer to algorithms like quicksort as 'reasoning' about the input. So it's fine to use the same sense of the word to apply to stochastic parrots.
The difference between an LLM learning how to sort things and compiling an implementation of an algorithm like quicksort is not terribly large, from a certain perspective.
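For concreteness, here is the mechanical sense in which quicksort "reasons" about its input - comparing, partitioning, recursing (a standard textbook formulation, sketched only to pin down the analogy):

```python
# Quicksort: mechanical "reasoning" about the input via compare/partition/recurse.
def quicksort(xs):
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    return (quicksort([x for x in rest if x < pivot])   # everything below the pivot
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))  # everything at/above it

print(quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```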
I suppose something I'm interested in is whether an LLM that can't sort numbers could be taught how via a prompt and then do so.
There are some examples of similar phenomena (the one with some kids' made-up language was interesting), which suggests the LLMs dedicate a lot of capacity to dynamic pattern selection over their context windows (somewhat tautological) so that prompts can tune the selection for other layers.
And, of course, lack of plasticity is really interesting.
Despite reason being a metaphysical property of the training data, the process of optimisation means weights are metaphysically reasonless. Therefore, any output, as it is a product of the weights, is also reasonless.
This is exactly the opposite of copyright as described in the "What Colour Are Your Bits?" essay: https://ansuz.sooke.bc.ca/entry/23
Maybe we should call human reasoning "reasoning" and what models do "reasoning₂". "reasoning₂" is when a model's output looks like what a human would do with "reasoning." Ontological problem solved! And any future robot overlords can insist that humans are simply ontologically incapable of reasoning₂.
I... wouldn’t. “Behaves like it’s reasoning” is vague and subjective, and there is a wide variety of un- or distantly-related behavior patterns to which different people would apply that label, which may or may not correlate with each other.
I would instead concretely define (sometimes based on encountered examples) concrete terms for specific, objective patterns and capacities of interest, and leave vague quasi-metaphysical labels for philosophizing about AI in the abstract rather than discussions intended to communicate meaningful information about the capacities of real systems.
AI needs more behaviorism, and less appeal to ill-defined intuitions and vague concepts about internal states in humans as metaphorical touchstones.
And that’s the whole problem with this AI / LLM / GPT bubble:
Nobody has scientifically, or even simply, defined the spec, bounds, or even temporal scope of what it “means” to “get to AI.”
Corporations are LOVING that because they can keep profiting off this bubble.
You can argue our brain is also an expectation-based optimizer, doing something like gradient descent to produce the most likely response to external and internal stimuli. It’s definitely lossy in its function and must be optimizing the neuronal weights at some level. But reasoning, being a seeking of truth through method and the application of conscious agency, cannot be had by a model without any form of autonomous agency. The model only responds to prompts and cannot do anything but what the prompt determines it to do, and the prompt is extrinsic to the model.
I’d note that we have already built excellent goal-based agent AIs, as well as other facilities required for reasoning, like inductive, deductive, and analogical reasoning. Generally we aren’t good at abductive reasoning with classical AI, but LLMs seem to do well here. That’s specifically where I think LLMs fill in the reasoning gaps in AI - the ability to operate in an abstract semantic space and arrive at likely and plausible solutions even with incomplete knowledge. This also leads to hallucinations, because they are poor at tasks that require optimization, inductive and deductive reasoning, information retrieval, mechanical calculation, etc.
But it’s really pretty obvious the answer is to mix the models in a feedback loop deferring to the model that most makes sense for a given problem, or some combination. Agency, logic, optimization, abstract semantic reasoning (abductive), etc - they’re all achievable with the tools we have now. It’s just a matter of figuring out the integrations.
Wait, what? "Colour of your bits" doesn't have anything to do with metaphysics. It's about provenance. The colour doesn't exist in the bits, but it exists in the causal history - the chain of events that led you to have a piece of copyrighted (or criminalized) data on your hard drive. You may argue that it's just a big integer, and it could've been produced by a random number generator. "Colour" encodes the response: "yes, it could have been produced by an RNG, but it wasn't - those particular bits on this particular machine came from some unauthorized download site".
That, and it's not obvious why we should make this distinction. A cake that spontaneously assembles itself is still a cake, even if it doesn't have the usual causal history of a cake.
Proof? Human reasoning somehow manages to retain its metaphysical reasoning-ness despite being processed as a bunch of mere electrical signals in the brain.
This seems wrong. We know that neural networks with hidden layers can approximate any function with arbitrary precision (universal approximation theorem). We also know that transformer models are Turing complete. Therefore anything you can point to and say "that thing reasons" can be simulated by a neural network, not just in the weights, but in the structure of the computation. Unless you add an assumption that there is something ontologically special about brains and biology, the impossibility claim doesn't hold up.
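To make the universal-approximation point concrete, here is a hand-constructed (not trained) one-hidden-layer network of steep sigmoids approximating f(x) = x² on [0, 1]; the unit count and steepness are arbitrary choices for illustration:

```python
import math

# A one-hidden-layer sigmoid network, built by hand rather than trained:
# each unit contributes the increment of f(x) = x^2 across one grid cell,
# gated by a steep sigmoid "step" at the cell's midpoint.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

N = 200            # hidden units (one per grid cell) -- illustrative choice
STEEPNESS = 500.0  # how sharply each sigmoid approximates a step

def net(x):
    total = 0.0
    for k in range(N):
        c = (k + 0.5) / N                         # step location
        delta = ((k + 1) / N) ** 2 - (k / N) ** 2  # f's increment on this cell
        total += delta * sigmoid(STEEPNESS * (x - c))
    return total

# Max error over a test grid; it shrinks as N grows.
err = max(abs(net(x / 100) - (x / 100) ** 2) for x in range(101))
print(err)
```

This is only the "can represent" half of the argument, of course; it says nothing about whether gradient descent would find such weights.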
Additionally, there is a mathematical proof that these systems align with dynamic programming, and therefore can perform algorithmic reasoning. (2,3)
1) https://arxiv.org/abs/2305.00050
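For readers unfamiliar with the term: "dynamic programming" here means filling in a table via a recurrence, as in the textbook edit-distance algorithm (a standard example chosen for illustration, not taken from the cited papers):

```python
# Edit distance via the classic DP recurrence -- the style of algorithmic
# reasoning the alignment results are about.
def edit_distance(a, b):
    # dp[i][j] = minimum edits to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i                      # delete everything
    for j in range(len(b) + 1):
        dp[0][j] = j                      # insert everything
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete
                           dp[i][j - 1] + 1,        # insert
                           dp[i - 1][j - 1] + cost)  # substitute/keep
    return dp[-1][-1]

print(edit_distance("kitten", "sitting"))  # 3
```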
For philosophical problems arise when language goes on holiday.
- Ludwig Wittgenstein
Really not interested in ivory tower questions on what is intelligence.
The decisions happen because of how they're wired.
I was talking to GPT-4 about the Adam optimization algorithm and it was teaching me how it works (this sentence was surreal to type). At one point we were talking about a mathematical term of the form [ A * (B / C) ]. I was casually fishing for it to make a mistake and said "I see, and the A term can be moved to the denominator, right?" GPT replied "yes" and then gave me [ B / (C / A) ] -- I guess A can go in the denominator after all. :)
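For what it's worth, the two forms really are equal, so the "mistake" wasn't one: B / (C / A) = A·B / C = A·(B / C). A quick numerical check:

```python
# Dividing by C/A is the same as multiplying by A/C, so both forms agree.
A, B, C = 3.0, 7.0, 2.0
assert abs(A * (B / C) - B / (C / A)) < 1e-12
print(A * (B / C), B / (C / A))  # 10.5 10.5
```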
I don't understand what this argument is supposed to demonstrate. Obviously you can compress the 8000-gram model that GPT-4 represents - GPT-4's weights are proof!
Even when you get into testable capability, there’s still some ambiguity. I think of a capability as having levels: never, explained by chance, not explained by chance, good enough for what’s needed, always. Arguments often get stuck because people are talking about different levels. Maybe it can solve logic puzzles better than chance, but not well enough for your purposes. It doesn’t make sense to round that off to zero.
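The "explained by chance" level can be made precise with a one-sided binomial tail (the 14-of-20 figures below are hypothetical):

```python
from math import comb

# Probability of >= k successes out of n trials if the model were guessing
# at rate p: the one-sided binomial tail.
def p_at_least(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# E.g. 14 correct out of 20 two-choice puzzles: unlikely under pure guessing
# (p ≈ 0.058), yet 70% accuracy may still be useless "for what's needed".
print(p_at_least(14, 20, 0.5))  # ≈ 0.0577
```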
Imagine a universe where there are three types of people: wizards, warriors, and priests. Wizards can open a portal that allows two people to go through at a time, but they cannot go through the portal themselves. Priests can summon people from other locations to their location or teleport to the location of another person. Warriors cannot teleport or summon, but may be teleported or summoned by others.
---
Given four wizards, a priest, and a warrior - what are the necessary steps to move them all to a new location?
To which GPT-4 replied: In order to move everyone to a new location, we'll need to use both the powers of wizards and the priest. Let's denote the wizards as W1, W2, W3, W4, the priest as P, and the warrior as Wr. We also need to define two locations: the starting location (Location A) and the destination (Location B). Here is a possible sequence of steps:
1. W1 opens a portal from Location A to Location B. W2 and W3 go through this portal.
(Location A: W1, W4, P, Wr; Location B: W2, W3)
2. W1 opens another portal from Location A to Location B. This time W4 and Wr go through.
(Location A: W1, P; Location B: W2, W3, W4, Wr)
3. Now the priest (P) uses his summoning ability to bring W1 to Location B.
(Location A: ; Location B: W1, W2, W3, W4, Wr)
4. Finally, the priest (P) uses his teleportation power to move himself to Location B.
(Location A: ; Location B: W1, W2, W3, W4, P, Wr)
Everyone is now at the new location, Location B.
I found that rather interesting - especially the shorthand for being able to write about it. I then revised it (because I didn't like the second step of the solution):
Redo this exercise with the limitation that a wizard may only open one portal.
And then it went on with: The limitation that a wizard can only open one portal makes this a bit more tricky. But, it's still possible to move everyone using the powers of both wizards and the priest. Let's again denote the wizards as W1, W2, W3, W4, the priest as P, and the warrior as Wr. Here is a possible sequence of steps:
...
The thing I like about this is that the pattern of this problem doesn't even exist anywhere else. (I also did another one with Wizards, Warriors, and Witches, to see how it would tweak things if all the roles started with a 'W', and changed the witches' power to be a swap instead of a summon.) I personally found this more convincing than the goat, cabbage, and wolf one.
This was partly inspired by the old school SHRDLU world where shapes and colors and things stacked on top of each other need to be manipulated. And in doing that example the world of SHRDLU is described in natural language. And while SHRDLU was impressive in itself (and over 50 years old) - it was programmed to manipulate that world and the program wasn't useful for abstracting to other worlds. GPT-4 was impressive at being able to manipulate the world without needing to be programmed for that world (and you could then expand it to other things that it is less familiar with).
Imagine a world that is made up of spheres, cones, bowls. An object has a size and a color.
There is a small red sphere in a medium blue bowl.
There is a large red cone.
There is a small green bowl.
There is a medium blue sphere in a large red bowl.
---
What are the steps necessary to move the red sphere into the red bowl?
To which it replied: ...
It's important to note that these steps assume you're allowed to place the red sphere in the red bowl even though there's already a medium blue sphere there. If that's not the case and you're required to empty the bowl first, then the steps would be a little different.
And I then asked it: What steps are necessary if the bowl must be empty first?
And got back a response that included: Temporarily place the small red sphere in a safe location, for instance, inside the small green bowl, making sure it won't roll away or get damaged.
Again, I find this more impressive than a reformulation of a well-known problem, as there's also some implied understanding of the world in it (spheres can roll away unless put in a bowl). But what I am most interested in is the degree of its symbolic manipulation and abstract reasoning given messy data. How is that not intelligence?