Posted by Jason Wei and Denny Zhou, Research Scientists, Google Research, Brain team
https://ai.googleblog.com/2022/05/language-models-perform-re...
At no point does the LLM know that 5+6 = 11, and if asked to solve a problem in which 5+6 was an implicit component of the solution but not explicitly present in the text, it would be completely lost.
Any human who has learned multiplication can do this. AFAIU, LLMs cannot unless the computation exists within the training set. They have zero arithmetic reasoning capability.
I just asked ChatGPT: "19191920 multipled by 10292111772"
It said:
--------------
To multiply 19191920 by 10292111772, you can use the standard long
multiplication method as follows:
19191920
x 10292111772
-------------
19191920000 (the product of 19191920 and 1)
153535360000 (the product of 19191920 and 8)
1535353600000 (the product of 19191920 and 2)
-------------
196837644266310240 (the final product)
Therefore, the result of multiplying 19191920 by 10292111772 is 196837644266310240.
----------------------------------
This is completely wrong. It is not doing arithmetic, and it is not capable of doing arithmetic.
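For reference, the quoted working can be checked mechanically. The sketch below uses Python's arbitrary-precision integers to list the partial products a correct long multiplication would need (one per digit of the multiplier) and compares their sum with the quoted "final product". The helper name `partial_products` is purely illustrative and not something from the thread.

```python
def partial_products(a: int, b: int) -> list[int]:
    """Return the shifted partial products of a * b, one per decimal
    digit of b, least significant digit first."""
    products = []
    for position, digit_char in enumerate(reversed(str(b))):
        # Each digit of b contributes a * digit, shifted by its place value.
        products.append(a * int(digit_char) * 10 ** position)
    return products


a = 19191920
b = 10292111772
claimed = 196837644266310240  # the "final product" quoted above

parts = partial_products(a, b)
product = sum(parts)

print(len(parts))          # how many partial products long multiplication needs
print(product == a * b)    # the partial products sum to the true product
print(product == claimed)  # whether the quoted answer matches
```

Run as written, this reports eleven partial products (the quoted working shows three) and confirms that the quoted answer does not match the true product.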
Furthermore, GPT-4 is able to multiply slightly shorter pairs of numbers, and experiments on smaller neural nets, such as one trained on a dataset consisting of 20% of all combinations of two four-digit numbers, show that they quickly learn to generalize and successfully multiply any four-digit pair, even those that aren't in the training set.
Try again, give an answer that actually works this time.
"Reasoning" is a much broader term than you think it is.
----------------------------------------------------------
PROMPT: I live in a location with heavy clay soil, and plan to build an adobe home. Will it be well-insulated?
RESPONSE (edited and truncated): Building an adobe home in an area with heavy clay soil can be a good option for creating a well-insulated home [... ] The insulation properties of an adobe home will depend on several factors, including the thickness of the walls, the quality of the materials used, and the construction techniques used. Adobe walls tend to be thick, which can help to provide good insulation properties. The high thermal mass of adobe can also help to regulate temperature and humidity inside the house, keeping it cool in the summer and warm in the winter.
-------------------------------------------------------
So, here it is clear that GPT is just regurgitating information contained in its training set. Why am I so confident? It is an extremely widespread misconception, repeated almost everywhere, that adobe homes are "well insulated", and almost everywhere this is written about, you will find the linkage between high thermal mass and insulation.
But this is actually completely incorrect. Adobe is a very poor insulator - a typical wall has an R factor of around 2. Ask someone who knows and you will find that "adobe is not an insulator at all" (e.g. https://www.greenhomebuilding.com/QandA/adobe/mass.htm)
So I went back to GPT to see if it "knew" about the poor R value:
--------------------
PROMPT: what is the R value of an adobe wall
RESPONSE: The R-value of an adobe wall depends on several factors, including the thickness of the wall, the density of the material, and the specific mix of materials used in the construction. Generally speaking, adobe walls have an R-value of around 0.2 to 0.5 per inch of thickness, which is relatively low compared to some other building materials.
For example, a 12-inch thick adobe wall might have an R-value of around 2.4 to 6.0, depending on the specific mix of materials used. This means that an adobe home will typically require additional insulation, such as fiberglass batts or blown-in cellulose, to achieve a higher level of thermal performance.
-------------------------------------------------
OK, so it does "know" this, and what it has generated here is factually correct, but at odds with its initial response. If it "reasoned", it would know that, given this R value, an adobe wall is pretty terrible for insulation, and would have focused on the details in this response when answering my first prompt.
Then it continues:
-----------------------------------
However, it's important to note that the insulation properties of adobe walls are not solely determined by the R-value [...]
-----------------------------------
and this final claim is completely incorrect. Insulation properties are 100% represented by R values, and anyone who actually knows this stuff would know this. It then goes on to repeat the stuff about thermal mass, which is important for how a house feels, but unrelated to its level of insulation and thus its heating requirements etc.
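To make the arithmetic concrete: for a uniform material the R-value scales with thickness, and steady-state conductive heat flow through a wall is area times temperature difference divided by R. Below is a rough sketch in US units (R in ft²·°F·h/BTU). The per-inch range is the one quoted in the response above; the wall area and temperature difference are made-up illustrative numbers, not figures from this thread, and the function names are hypothetical.

```python
def wall_r_value(r_per_inch: float, thickness_in: float) -> float:
    """Total R-value of a uniform wall: per-inch R times thickness in inches."""
    return r_per_inch * thickness_in


def heat_loss_btu_per_hour(area_sqft: float, delta_t_f: float, r_value: float) -> float:
    """Steady-state conductive heat flow through the wall: Q = A * dT / R."""
    return area_sqft * delta_t_f / r_value


# Range quoted in the response: 0.2 to 0.5 per inch, for a 12-inch wall.
r_low = wall_r_value(0.2, 12)
r_high = wall_r_value(0.5, 12)

# ASSUMPTION: 1000 sq ft of wall and a 40 °F indoor/outdoor difference,
# chosen only to illustrate the formula.
area_sqft = 1000.0
delta_t_f = 40.0

for r in (r_low, r_high):
    loss = heat_loss_btu_per_hour(area_sqft, delta_t_f, r)
    print(f"R = {r:.1f}: about {loss:.0f} BTU/h")
```

Since R sits in the denominator, halving the R-value doubles the steady-state heat loss, which is why the R-value is what drives heating requirements in this comparison.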
Now, I imagine that given all this, one could do some prompt "engineering" to get GPT to spit out something that reflects the answer that a human who actually knew and could reason about this stuff might. But I have zero doubt that what you'd actually be doing is adjusting the vocabulary to make it more likely it would base its response on e.g. the Green Building Advisor article above. I do not believe there are any prompts, or anything else, in GPT or any other LLM, that will cause it to "reason" ... hmm, let's check the R value for adobe, nope that's pretty horrible, the house will not be well insulated unless you ....
Try this prompt: "Taking into account the r-value of adobe, I live in a location with heavy clay soil, and plan to build an adobe home. Will it be well-insulated?"
These edge case "gotchas" are missing the point.
> At no point does the LLM know that 5+6 = 11
Does it need to "know" that (by your narrow definition of "know") in order to reason about a word math problem?
> if asked to solve a problem in which 5+6 was an implicit component of the solution but not explicitly present in the text, it would be completely lost
Can you provide an example? What makes you believe it can't be trained to solve those too? That's just a higher abstraction over the language. Add more layers, more training, etc. Many humans cannot solve basic math word puzzles that this artificial neural network can already solve.
If I ask you to multiply two (largeish) numbers together, you will be able to do so, using an algorithm/process that you can apply to the multiplication of any two numbers, whether anyone has ever told you about those numbers before or not.
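The kind of general procedure meant here can be written down in a few lines. The following is a minimal sketch of schoolbook long multiplication on decimal digit strings (the name `long_multiply` is illustrative). It relies only on single-digit products and carries, so it works the same way whether or not the operands have ever been seen before.

```python
def long_multiply(x: str, y: str) -> str:
    """Multiply two non-negative decimal integers given as digit strings,
    using only single-digit multiplications, additions, and carries."""
    # result[i] holds the digit for 10**i; the product of an m-digit and an
    # n-digit number has at most m + n digits.
    result = [0] * (len(x) + len(y))
    for i, dx in enumerate(reversed(x)):
        carry = 0
        for j, dy in enumerate(reversed(y)):
            total = result[i + j] + int(dx) * int(dy) + carry
            result[i + j] = total % 10
            carry = total // 10
        # Whatever remains of the carry goes into the next column.
        result[i + len(y)] += carry
    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


# Usage: compare against Python's built-in integer multiplication.
print(long_multiply("19191920", "10292111772") == str(19191920 * 10292111772))
```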
LLMs cannot do this. Give them a math problem that doesn't exist in their training set and they cannot solve it. This has been demonstrated many times.
They routinely solve math problems (and other reasoning tasks) that don't exist in their training set. Examples were in that paper I linked to. This is one of the incredible emergent properties of LLMs / deep neural networks.
Try it out today on GPT-4. Make up your own math problems and go for it.