
derefr · 2y ago

GPT-4 is not a pure LLM. It also accepts image inputs. There's other stuff "going on in there" in a GPT model than just linguistic analysis — and those other "facilities" of the model can potentially serve the needs of doing math better than the language parts can.

But presuming that wasn't the critical point you wanted to make:

Like I said, a language model can know that "1" "is less than" "2" — and it can also know (if it's either trained with characters as lexemes, or is given access to a pre-parse output to second-chance analyze unknown tokens) that "10" is the same thing as (1 tens). Which then means that it can know that "23" "is less than" "48" because it can do linguistic deductive tricks between the terms (2 tens plus 3 ones) and (4 tens plus 8 ones).

But those tricks are tricks. It isn't doing math; it's applying "2" as an adjective to "tens", constructing a verb phrase whose verb is "plus", and then (likely) interpreting your question as a question about analogy. It knows that (2 pineapples) "is less than" (3 pineapples) by analogy — (N of some unit) "is analogous to" N-the-number. But it doesn't know that "tens" is a special unit distinct from "pineapples" in that it changes the meaning of the number-token it's attaching to.
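For contrast, here is a minimal Python sketch of what treating "tens" as a special unit looks like when done explicitly. The function names `place_value_terms` and `less_than` are made up for illustration, and the sketch assumes non-negative decimal numerals with no leading zeros. It says nothing about how any language model represents numbers internally.

```python
def place_value_terms(numeral: str) -> list[tuple[int, int]]:
    """Split a decimal numeral into (digit, power_of_ten) pairs, most significant first.

    "23" becomes [(2, 1), (3, 0)], i.e. (2 tens) plus (3 ones).
    """
    n = len(numeral)
    return [(int(ch), n - 1 - i) for i, ch in enumerate(numeral)]


def less_than(a: str, b: str) -> bool:
    """Compare two numerals by place value (assumes no leading zeros, no signs)."""
    if len(a) != len(b):
        # More digits means a higher leading power of ten, so length decides.
        return len(a) < len(b)
    # Same length: the powers line up position by position, so the first
    # differing digit (most significant first) decides the comparison.
    return place_value_terms(a) < place_value_terms(b)


print(place_value_terms("23"))  # (2 tens) plus (3 ones)
print(less_than("23", "48"))
```

The unit matters here in a way it does not for pineapples: the power of ten attached to each digit determines which digit gets compared first.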

To put it another way: a (pure) language model has no way of encoding numbers that allows it to actually do math and get correct results out. It can memorize tables of answers for well-known numbers, and it can try to use language tricks to combine those tables, but it can't perform an algorithm on a number, because no part of its architecture allows the nodes in its model to act as a register to encode an (arbitrarily large) number in such a way that it is actually amenable to numeric operations being performed on that data.

A model that is really modelling numbers should be able to apply any arbitrary algorithm it knows about to those numbers, just like a regular CPU can apply any instruction sequence it reads to its registers. Not just add/sub or mul/div: arbitrarily complex things, e.g. iterated modular exponentiation, should just be a matter of saying "hey LLM, you remember the algorithm for doing MOD-EXP, right? So tell me...."
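To make "the algorithm for doing MOD-EXP" concrete, here is a sketch of the standard square-and-multiply method in Python. The name `mod_exp` is illustrative, not from the thread. Python's built-in three-argument `pow` does the same job and is used below as a cross-check.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus by binary square-and-multiply."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus  # handles modulus == 1, where every result is 0
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            # Lowest bit set: fold the current power of base into the result.
            result = (result * base) % modulus
        base = (base * base) % modulus  # square for the next bit
        exponent >>= 1
    return result


# Cross-check against the built-in on a large, never-before-seen input.
big = 3 ** 200 + 7
print(mod_exp(big, 65537, 10 ** 9 + 7) == pow(big, 65537, 10 ** 9 + 7))
```

The loop keeps only three integers of state (`result`, `base`, `exponent`), which is the kind of register-like working storage the comment argues a pure language model lacks.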

(Note that humans can't do this kind of math purely "in our heads" any more than LLMs can, because we don't have any low-level accelerative infrastructure for modelling and working with numeric data either! We need an external buffer that inherently embeds sequencing/positioning info — like our auditory sensory "loop" memory from [sub]verbally repeating the working data; or our visual sensory persistence-of-vision memory, from writing the data down onto a piece of paper and staring at it as we work.)


theptip · 2y ago

> GPT-4 is not a pure LLM

I’ve looked a bit into the GPT architecture and haven’t seen anything suggesting it’s doing special-case experts for maths. It has MoE over 16 language models, and an image modality bolted on. If you have any evidence that there is a separate trained logic/math model I’d love to see that, as it would be interesting. (I don’t recall reading anything like that in the GPT papers for example, and this seems to claim there is no “calculator” hooked up in GPT-4 https://ai.stackexchange.com/a/40090).

> To put it another way: a (pure) language model has no way of encoding numbers

I think you just motte-and-bailey’d. Your original claim was that an LLM was incapable of doing $X > $Y or displaying numeracy, which I refuted by showing an example of an LLM doing greater-than comparisons, and subtracting a quantity in different units ($50k -> 50,000).

Now you are substituting a much narrower claim, that an LLM is structurally incapable of symbolic manipulation and “really modeling numbers”. This might be so! But it’s not required for basic numeracy, “tricks” as you put it, or whatever else GPT has learned, can objectively get us to median human performance.

Even going way back to GPT-2 there are mechanistic interpretability papers investigating how greater-than is implemented, eg https://arxiv.org/abs/2305.00586.

And there is work that suggests that LLMs do some sort of phase transition to gain numeracy skills: https://arxiv.org/pdf/2206.07682.pdf.

Your objection about working memory is also odd. Chain of thought reasoning strategies use the context as the working memory and have been demonstrated to improve performance on numeracy tasks.

But again, if you are retreating to a very narrow claim that the model can’t do precise calculations in a single inference step, then sure, that’s technically plausible, but that’s a way higher bar than displaying basic numeracy, and doesn’t justify the incredulity in your GP comment.

derefr (OP) · 2y ago

> haven’t seen anything suggesting it’s doing special-case experts for maths

I didn't say it is. I said it is at least trained on images, which means it has a visual processing layer. I then mentioned that in humans, the visual sensory memory used for persistence-of-vision — along with the higher-level abstract positional memory used for navigation and not tripping on tree roots — has been shown to be active when doing arithmetic; and that this is suggestive of the visual field being used to "outsource" positional/sequencing tracking for numbers.

My implicit hypothesis (that I didn't want to say explicitly, because I'm not an ML researcher and I have no idea how to even begin to determine the truth-value of this) is that the GPT architecture is able to be as numerate as it is, vs. other pure text-in-text-out language models, because it's reusing the generalized visual field it evolved to map images into tokens, as a within-inference-step working memory for holding absolute token positioning meta-information. (Or, to put that in human terms: it's visualizing the numbers.)

> But it’s not required for basic numeracy, “tricks” as you put it, or whatever else GPT has learned, can objectively get us to median human performance.

No — as the median human (with a pencil and paper) can do simple arithmetic on arbitrarily large numbers.

The difference between "memorizing a bunch of tables" and numeracy is that numeracy is a knowledge of algorithms, not a memorization of truth tables; it is a set of skills that can be applied to never-before-seen mathematical objects to yield correct answers. You can ask a human to compare two 800-digit numbers, or add them together, and they'll be able to do it, one step at a time.

As far as I know, GPT does not have the "skill" of numeracy in the sense of being able to do even simple arithmetic on unbounded-length numbers. And I don't mean the boring thing (that it has a bounded context window, so the number has to fit in there); I mean that it fails at adding two numbers when you start to get up to even just e.g. 64-digit numbers. It starts doing things like (seemingly) breaking the numbers down into sub-sequences and independently adding them up, but then forgetting to carry between the sub-sequences, or even forgetting which order the aggregates of the sub-sequences should be put back together in.
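The two behaviours described above can be put side by side in a short Python sketch. `add_digit_strings` is the pencil-and-paper algorithm, which works at any length. `add_chunks_dropping_carries` is a deliberately wrong variant that imitates the "forgetting to carry between the sub-sequences" failure. Both names and the 4-digit chunk size are assumptions made for illustration; this is not a claim about what a GPT model computes internally.

```python
def add_digit_strings(a: str, b: str) -> str:
    """Schoolbook addition on decimal digit strings, one column at a time."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    digits = []
    for da, db in zip(reversed(a), reversed(b)):  # rightmost column first
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


def add_chunks_dropping_carries(a: str, b: str, chunk: int = 4) -> str:
    """Deliberately WRONG: adds fixed-width chunks independently, discarding carries."""
    width = max(len(a), len(b))
    width += (-width) % chunk  # pad up to a whole number of chunks
    a, b = a.zfill(width), b.zfill(width)
    parts = []
    for i in range(0, width, chunk):
        s = int(a[i:i + chunk]) + int(b[i:i + chunk])
        parts.append(str(s % 10 ** chunk).zfill(chunk))  # carry out of the chunk is lost
    return "".join(parts).lstrip("0") or "0"


x, y = "9" * 64, "1"
print(add_digit_strings(x, y) == str(int(x) + int(y)))            # carries ripple through all 64 digits
print(add_chunks_dropping_carries(x, y) == str(int(x) + int(y)))  # the dropped carry corrupts the sum
```

The chunked version gets inputs right whenever no chunk overflows, which is why it can look correct on small or "friendly" numbers and still fail on long ones.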

It seems very apparent to me, after much experimentation, that GPT models are just trying to treat numbers as a finite set of objects (maybe 100K-or-so?), each with a set of baked-in properties and relationships — plus a set of logically unsound rules they've derived for breaking large numbers down into small numbers, and putting small numbers back together into large numbers. These models are, in other words, using language skills (memorization of properties; adjective grouping; analogy) to pretend to do math — to cargo cult a symbolic-manipulation process they don't understand, in the hopes of at least looking like they're doing it correctly — but that's not the same as actually applying the scalable process of arithmetic to an arbitrary number.

An adult who "did math" this way would be described as "someone who never learned to do math." And they would, indeed, be considered innumerate. (Could they do their taxes? Split a bill? Make change? Determine which of two products, where one is priced per-lb and the other per-each, has the better value? No? Then they can't get by in society. That's innumeracy!)

---

But also — to pop the context here: we're not talking about GPT. We're talking about a different language model (Llama 2), that's very likely strictly worse than any of the GPT models are at math (though I'd be intrigued to be proven wrong.) I assert this because, as I said above, I believe that GPT is as numerate as it is because of its visual sensory field — which the Llama models don't have. Thus my initial assertion: if even a multi-modal language model like GPT isn't close to full numeracy, then a pure language model has no chance at even vaguely simulating numeracy. And that that's why the OP is seeing the errors they're seeing.
