
0 points · Mike_12345 · 3y ago · 0 comments

"Language Models Perform Reasoning via Chain of Thought"

Posted by Jason Wei and Denny Zhou, Research Scientists, Google Research, Brain team

https://ai.googleblog.com/2022/05/language-models-perform-re...


10 comments · 1 top-level

PaulDavisThe1st · 3y ago

This paper is misusing the term "reasoning" in my opinion.

At no point does the LLM know that 5+6 = 11, and if asked to solve a problem in which 5+6 was an implicit component of the solution but not explicitly present in the text, it would be completely lost.

flangola7 · 3y ago

What is a reasoning task we could give an LLM that would demonstrate that it actually is not reasoning? That should be easy to construct, since the model should fail utterly at even a very simple task outside its training set, but I have yet to witness one.

PaulDavisThe1st · 3y ago

1a. generate two numbers using: (random() % BIGNUM)

1b. ask the LLM to multiply them together

Any human who has learned multiplication can do this. AFAIU, LLMs cannot unless the computation exists within the training set. They have zero arithmetic reasoning capability.
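A minimal sketch of this test as a harness, assuming `BIGNUM = 10**11` (the thread does not say what bound was used) and hypothetical helper names `make_problem` and `is_correct`. It only builds the prompt and grades a reply string; how the reply is obtained from a model is left out.

```python
import random
import re

BIGNUM = 10**11  # assumed bound; the comment above does not specify one


def make_problem(rng):
    """Return (prompt, exact_product) for two random operands below BIGNUM."""
    a = rng.randrange(1, BIGNUM)
    b = rng.randrange(1, BIGNUM)
    return f"{a} multiplied by {b}", a * b


def is_correct(reply, expected):
    """Grade a reply by comparing the last number it contains to the exact product."""
    numbers = re.findall(r"\d[\d,]*", reply)
    if not numbers:
        return False
    return int(numbers[-1].replace(",", "")) == expected


# Seeded generator so the same problem can be regenerated later.
prompt, expected = make_problem(random.Random(0))
print(prompt)
```

Python integers are arbitrary precision, so `a * b` is exact and can serve as ground truth for operands of any size.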

I just asked ChatGPT: "19191920 multipled by 10292111772"

It said:

--------------

  To multiply 19191920 by 10292111772, you can use the standard long 
  multiplication method as follows:

     19191920
  x 10292111772
  -------------
    19191920000 (the product of 19191920 and 1)
    153535360000 (the product of 19191920 and 8)
    1535353600000 (the product of 19191920 and 2)
  -------------
  196837644266310240  (the final product)

  Therefore, the result of multiplying 19191920 by 10292111772 is 196837644266310240.

----------------------------------

This is completely wrong. It is not doing arithmetic, and it is not capable of doing arithmetic.
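For comparison, a short sketch of the schoolbook method the quoted reply claims to follow, checked against the reply's final figure. The function name `long_multiply` is illustrative.

```python
def long_multiply(a, b):
    """Schoolbook long multiplication: one shifted partial product per digit of b."""
    partials = []
    for place, digit_char in enumerate(reversed(str(b))):
        # Multiply by a single digit, then shift by that digit's place value.
        partials.append(a * int(digit_char) * 10**place)
    return partials, sum(partials)


a, b = 19191920, 10292111772
claimed = 196837644266310240  # final figure from the quoted ChatGPT reply

partials, product = long_multiply(a, b)
print(len(partials), "partial products")
print("exact product:", product)
print("matches quoted reply:", product == claimed)
```

A real long multiplication here needs one partial product per digit of the 11-digit multiplier, whereas the quoted reply lists only three.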

flangola7 · 3y ago

Many humans would not be able to solve that problem, especially those who are younger or have lower IQs, and obviously those who have not been taught multiplication. That doesn't mean they are devoid of all reasoning ability.

Furthermore, GPT-4 is able to multiply slightly shorter pairs of numbers, and in experiments, smaller neural nets, such as one trained on a dataset consisting of 20% of all combinations of two four-digit numbers, quickly learn to generalize and successfully multiply any four-digit pair, even pairs that aren't in the training set.
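A sketch of the kind of train/held-out split described here, under assumptions: the thread gives no details of the experiment, so the function name `split_pairs`, the shuffling, and the seed are illustrative. The demo uses two-digit operands to stay small; the four-digit case would be `split_pairs(1000, 10000, 0.2)`, which enumerates 81 million pairs.

```python
import random


def split_pairs(low, high, train_fraction, seed=0):
    """Shuffle all operand pairs in [low, high) and split them into train and held-out sets."""
    pairs = [(a, b) for a in range(low, high) for b in range(low, high)]
    random.Random(seed).shuffle(pairs)
    cut = round(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]


train, held_out = split_pairs(10, 100, 0.2)
print(len(train), "training pairs,", len(held_out), "held-out pairs")
```

Generalization in this setup means scoring the model only on `held_out`, which by construction shares no pair with `train`.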

Try again, give an answer that actually works this time.

Mike_12345 (OP) · 3y ago

All you have shown here is that its arithmetic reasoning is flawed, not that it cannot reason at all.

"Reasoning" is a much broader term than you think it is.

PaulDavisThe1st · 3y ago

For my next test, I used some knowledge I happen to have.

----------------------------------------------------------

PROMPT: I live in a location with heavy clay soil, and plan to build an adobe home. Will it be well-insulated?

RESPONSE (edited and truncated): Building an adobe home in an area with heavy clay soil can be a good option for creating a well-insulated home [... ] The insulation properties of an adobe home will depend on several factors, including the thickness of the walls, the quality of the materials used, and the construction techniques used. Adobe walls tend to be thick, which can help to provide good insulation properties. The high thermal mass of adobe can also help to regulate temperature and humidity inside the house, keeping it cool in the summer and warm in the winter.

-------------------------------------------------------

So, here it is clear that GPT is just regurgitating information contained in its training set. Why am I so confident? It is an extremely widespread misconception, repeated almost everywhere, that adobe homes are "well insulated", and almost everywhere this is written about, you will find the linkage between high thermal mass and insulation.

But this is actually completely incorrect. Adobe is a very poor insulator - a typical wall has an R factor of around 2. Ask someone who knows and you will find that "adobe is not an insulator at all" (e.g. https://www.greenhomebuilding.com/QandA/adobe/mass.htm)

So I went back to GPT to see if it "knew" about the poor R value:

--------------------

PROMPT: what is the R value of an adobe wall

RESPONSE: The R-value of an adobe wall depends on several factors, including the thickness of the wall, the density of the material, and the specific mix of materials used in the construction. Generally speaking, adobe walls have an R-value of around 0.2 to 0.5 per inch of thickness, which is relatively low compared to some other building materials.

For example, a 12-inch thick adobe wall might have an R-value of around 2.4 to 6.0, depending on the specific mix of materials used. This means that an adobe home will typically require additional insulation, such as fiberglass batts or blown-in cellulose, to achieve a higher level of thermal performance.

-------------------------------------------------

OK, so it does "know" this, and what it has generated here is factually correct, but at odds with its initial response. If it "reasoned" it would know that, given this R value, an adobe wall is pretty terrible for insulation, and would have focused on the details in this response in answering my first prompt.
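The arithmetic in the quoted response is just R-value per inch times wall thickness. A minimal sketch, with `wall_r_value` as an illustrative name and the per-inch figures taken from the response above:

```python
def wall_r_value(r_per_inch, thickness_in):
    """Total R-value of a uniform wall: per-inch R-value times thickness in inches."""
    return r_per_inch * thickness_in


low = wall_r_value(0.2, 12)
high = wall_r_value(0.5, 12)
print(f"12-inch adobe wall: R-{low:.1f} to R-{high:.1f}")
# prints "12-inch adobe wall: R-2.4 to R-6.0"
```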

Then it continues:

-----------------------------------

However, it's important to note that the insulation properties of adobe walls are not solely determined by the R-value [...]

-----------------------------------

and this final claim is completely incorrect. Insulation properties are 100% represented by R values, and anyone who actually knows this stuff would know this. It then goes on to repeat the stuff about thermal mass, which is important for how a house feels, but unrelated to its level of insulation and thus its heating requirements etc.
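To make the link between R-value and heating requirements concrete, here is a sketch using the standard steady-state conduction formula Q = A × ΔT / R (US units: ft², °F, BTU/h). The wall area, temperature difference, and the R-13 comparison wall are assumptions chosen for illustration; only the R-2 adobe figure comes from the comment above.

```python
def heat_loss_btu_per_hour(area_sqft, delta_t_f, r_value):
    """Steady-state conductive heat flow through a wall: Q = A * dT / R."""
    return area_sqft * delta_t_f / r_value


AREA_SQFT = 1000   # assumed wall area
DELTA_T_F = 40     # assumed indoor/outdoor temperature difference

adobe = heat_loss_btu_per_hour(AREA_SQFT, DELTA_T_F, 2)     # R-2, per the comment above
framed = heat_loss_btu_per_hour(AREA_SQFT, DELTA_T_F, 13)   # R-13 comparison wall (assumption)

print(f"adobe:  {adobe:.0f} BTU/h")
print(f"framed: {framed:.0f} BTU/h")
print(f"ratio:  {adobe / framed:.1f}x")
```

Thermal mass does not appear in this formula: it changes when heat moves through the wall, not the steady-state rate at which it is lost.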

Now, I imagine that given all this, one could do some prompt "engineering" to get GPT to spit out something that reflects the answer that a human who actually knew and could reason about this stuff might. But I have zero doubt that what you'd actually be doing is adjusting the vocabulary to make it more likely it would base its response on e.g. the Green Building Advisor article above. I do not believe there are any prompts, or anything else, in GPT or any other LLM, that will cause it to "reason" ... hmm, let's check the R value for adobe, nope that's pretty horrible, the house will not be well insulated unless you ....

Mike_12345 (OP) · 3y ago

Everyone knows it has limitations. You have to work within the limitations of the model. No one has claimed that GPT is AGI. That doesn't mean it's incapable of any degree of reasoning. Yes, the prompt actually matters. It was trained a specific way to solve specific tasks, and can generalize to solve tasks it has not seen before.

Try this prompt: "Taking into account the r-value of adobe, I live in a location with heavy clay soil, and plan to build an adobe home. Will it be well-insulated?"

These edge case "gotchas" are missing the point.


Mike_12345 (OP) · 3y ago

You have a narrow definition of reasoning. Formally and technically it is solving a symbolic reasoning task through a sequence of steps. Yes, we know it's not conscious and not human reasoning.
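A sketch of what "a sequence of steps" looks like in a chain-of-thought prompt, compared with a standard few-shot prompt. The exemplar is the tennis-ball problem as recalled from the linked post (it appears to be where the 5 + 6 = 11 step comes from), so treat the exact wording as an assumption; `build_prompt` and the dictionary keys are illustrative names.

```python
EXEMPLAR = {
    "question": (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?"
    ),
    "reasoning": (
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11."
    ),
    "answer": "11",
}


def build_prompt(question, exemplars, with_reasoning):
    """Build a few-shot prompt; with_reasoning=True includes the worked steps."""
    parts = []
    for ex in exemplars:
        answer = f"The answer is {ex['answer']}."
        if with_reasoning:
            answer = f"{ex['reasoning']} {answer}"
        parts.append(f"Q: {ex['question']}\nA: {answer}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


new_question = "A shelf holds 7 books. 3 boxes arrive with 4 books each. How many books are there now?"
standard = build_prompt(new_question, [EXEMPLAR], with_reasoning=False)
chain_of_thought = build_prompt(new_question, [EXEMPLAR], with_reasoning=True)
print(chain_of_thought)
```

The only difference between the two prompts is whether the exemplar answer spells out its intermediate steps before the final answer.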

> At no point does the LLM know that 5+6 = 11

Does it need to "know" that (by your narrow definition of "know") in order to reason about a word math problem?

> if asked to solve a problem in which 5+6 was an implicit component of the solution but not explicitly present in the text, it would be completely lost

Can you provide an example? What makes you believe it can't be trained to solve those too? That's just a higher abstraction over the language. Add more layers, more training, etc. Many humans cannot solve basic math word puzzles that this artificial neural network can already solve.

PaulDavisThe1st · 3y ago

They do not "solve" word puzzles. They output text that appears to be the best response to the prompt, based on their training data. If the puzzle is solvable by doing this, then they get the answer right. If the puzzle is not solvable doing that, they are unlikely to get the answer right.

If I ask you to multiply two (largeish) numbers together, you will be able to do so, using an algorithm/process that you can apply to the multiplication of any two numbers, whether anyone has ever told you about those numbers before or not.

LLMs cannot do this. Give them a math problem that doesn't exist in their training set and they cannot solve it. This has been demonstrated many times.

Mike_12345 (OP) · 3y ago

> Give them a math problem that doesn't exist in their training set and they cannot solve it.

They routinely solve math problems (and other reasoning tasks) that don't exist in their training set. Examples were in that paper I linked to. This is one of the incredible emergent properties of LLMs / deep neural networks.

Try it out today on GPT-4. Make up your own math problems and go for it.
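One way to make up fresh problems is to fill a template with random quantities, so the arithmetic is implicit in the wording and the ground truth is known. This is only a sketch: the template, the value ranges, and the name `make_word_problem` are all illustrative assumptions.

```python
import random


def make_word_problem(rng):
    """Generate a word problem whose answer needs an unstated multiply-then-add."""
    start = rng.randint(2, 99)
    boxes = rng.randint(2, 9)
    per_box = rng.randint(2, 9)
    question = (
        f"A shelf holds {start} books. {boxes} boxes arrive with "
        f"{per_box} books each. How many books are there now?"
    )
    return {
        "question": question,
        "start": start,
        "boxes": boxes,
        "per_box": per_box,
        "answer": start + boxes * per_box,
    }


problem = make_word_problem(random.Random(1))
print(problem["question"])
print("expected:", problem["answer"])
```

The question text never contains the sum or product itself, only the quantities they are built from.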
