Mike_12345 3y ago

> Give them a math problem that doesn't exist in their training set and they cannot solve it.

They routinely solve math problems (and other reasoning tasks) that don't exist in their training set. Examples were in that paper I linked to. This is one of the incredible emergent properties of LLMs / deep neural networks.

Try it out today on GPT-4. Make up your own math problems and go for it.

PaulDavisThe1st 3y ago

PROMPT: what is 19192920 * 190101271

RESPONSE: The product of 19192920 and 190101271 is: 3653424693064240

Fail on first try.
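For anyone who wants to check this themselves, here is a minimal sketch in Python. The variable names are only illustrative; the two operands and the claimed answer are copied from the prompt and response above. Python integers are arbitrary precision, so the product is computed exactly.

```python
# Operands copied from the PROMPT above.
a = 19192920
b = 190101271

# Python ints are arbitrary precision, so this product is exact.
exact = a * b

# The value quoted in the RESPONSE above.
claimed = 3653424693064240

print(f"exact:   {exact}")
print(f"claimed: {claimed}")
print(f"match:   {exact == claimed}")
print(f"relative error: {abs(claimed - exact) / exact:.4%}")
```

The quoted answer has the right order of magnitude but is not the exact product.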

Mike_12345 (OP) 3y ago

Yes, that is shifting the goal posts: finding edge cases not well suited to LLMs, and ignoring chain-of-thought prompting.

It can solve math word puzzles that are not in its training set.

Yes, you can find these edge cases. We know about them, and focusing on them misses the point.

There are countless examples of emergent properties in these LLMs, which by definition means they are solving tasks outside of their training set.

Its reasoning has been demonstrated on examples outside of its training set.

PaulDavisThe1st 3y ago

First of all, multiplying two numbers together is not "shifting the goal posts", but an absolutely basic test of any system that is claimed to be able to do mathematical reasoning. I know that LLMs are not well suited for this, and that's because they cannot do arithmetic (among other things).

So I tried a word puzzle that would also require simple multiplication:

------------------------------

PROMPT: i am going to cycle 1600 miles, with 234 miles on gravel roads. on paved roads i will ride at 1929288282 millimeters per second but on gravel I will ride at 0.00000000202 parsecs per second. How long will the journey take?

-------------------------------

Now, I have to commend GPT on its ability to understand how you solve a problem like this, though that's not really very surprising given the huge numbers of such problems that exist in written materials. It precisely broke the problem down in a way that I suppose you could call "reasoning", but I would call "copying the formula for solving puzzles like this".

And how did it do with the actual math?

----------------

0.00000000202 parsecs per second is equivalent to 7499.6103827 miles per hour (mph), which we can calculate by converting parsecs to miles (1 parsec = 3.26 light-years = 19,173,511,840,000 miles) and dividing by the number of seconds in an hour:

0.00000000202 pc/s × 19,173,511,840,000 mi/pc ÷ 3600 s/hr = 7499.6103827 mph

----------

Utterly and completely wrong. Same issue with the millimeters per second computation.
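A rough way to check this, as a sketch rather than anything authoritative: the constants below are my assumptions and not part of the quoted output (the international mile of 1609.344 m, and the IAU parsec of 648000/π astronomical units with 1 au = 149,597,870,700 m). The script converts both speeds, evaluates the quoted expression exactly as written, and works out the journey time.

```python
import math

# Assumed constants (not from the quoted output).
MILE_M = 1609.344                      # international mile in metres
AU_M = 149_597_870_700                 # astronomical unit in metres
PARSEC_M = AU_M * 648_000 / math.pi    # IAU definition of the parsec
PARSEC_MI = PARSEC_M / MILE_M          # miles per parsec

# Gravel speed from the prompt, in parsecs per second.
gravel_pc_per_s = 0.00000000202
gravel_mi_per_s = gravel_pc_per_s * PARSEC_MI
# Going from "per second" to "per hour" means multiplying by 3600.
gravel_mph = gravel_mi_per_s * 3600

# The expression from the quoted output, evaluated as written (it divides by 3600).
quoted_expression = 0.00000000202 * 19_173_511_840_000 / 3600

# Paved speed from the prompt: millimetres per second to miles per second.
paved_mi_per_s = 1929288282 / 1000 / MILE_M

# 1600 miles in total, 234 of them on gravel.
total_s = (1600 - 234) / paved_mi_per_s + 234 / gravel_mi_per_s

print(f"gravel speed: {gravel_mph:.4g} mph")
print(f"quoted expression evaluates to: {quoted_expression:.4f}")
print(f"total journey time: {total_s:.3f} s")
```

Two separate things go wrong in the quoted step: converting per-second to per-hour should multiply by 3600 rather than divide, and even the expression as written does not evaluate to 7499.6103827.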

It is completely obvious why LLMs cannot do this. They cannot perform even basic arithmetic reasoning, and even more fundamentally, the ONLY capability they have is to create likely responses to prompts. For some things, this is extraordinarily (and scarily) powerful. But it is not reasoning.

nl 3y ago

I don't agree that LLMs can't reason, but you literally said "Make up your own math problems and go for it". He did exactly that and it failed, so that really isn't moving the goal posts.

LLMs are not good at math. But this is a subset of reasoning.

Chain-of-thought on logical inference tasks (using fake labels so we are outside the training set) shows they can do reasonably well at these.

Nevertheless, it's likely that the best approach for pure reasoning tasks will be to connect an LLM to a real inference engine (Datalog or something) and rely on the LLM to perform the mapping to the inference engine inputs and outputs. This is similar to the "System 1" and "System 2" models of human thought.
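To make that split concrete, here is a toy sketch of the "real inference engine" half: a tiny forward-chaining loop over Datalog-style facts and rules. Everything in it is hypothetical (the `forward_chain` helper, the tuple encoding, the made-up labels), and the LLM's part, translating natural language into these facts and rules and back, is simply assumed to have happened already.

```python
def forward_chain(facts, rules):
    """Derive all consequences of the facts under the rules.

    facts: set of (predicate, entity) pairs, e.g. ("wumpus", "max")
    rules: list of (premise, conclusion) pairs, read as
           "every <premise> is a <conclusion>"
    """
    derived = set(facts)
    changed = True
    # Keep applying rules until no new fact appears (a fixpoint).
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, entity in list(derived):
                if predicate == premise and (conclusion, entity) not in derived:
                    derived.add((conclusion, entity))
                    changed = True
    return derived


# Fake labels, so the puzzle cannot simply be looked up:
# "Max is a wumpus. Every wumpus is a zorp. Every zorp is a blick."
facts = {("wumpus", "max")}
rules = [("wumpus", "zorp"), ("zorp", "blick")]

derived = forward_chain(facts, rules)
print(("blick", "max") in derived)  # is Max a blick?
```

The point of the design is that the deduction step is exact and auditable, while the LLM is only trusted with the translation in and out.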