In this case, they prove that the model works by categorising inputs into a number of binary classes which just happen to be very good predictors for this otherwise random-seeming sequence. I don't know whether or not some of these binary classes are new to mathematics, but either way, their technique does show that transformer models can be helpful in uncovering mathematical patterns, even in functions that are not continuous.
Besides, we're all stuck on the 99.7% as if that's the across-the-board output, but that's a cherry-picked result:
"The best models (bases 24, 16 and 32) achieve a near-perfect accuracy of 99.7%, while odd-base models struggle to get past 80%."
I do think it is a very interesting thing to do with a model and it is impressive that it works at all.
The problem here is deterministic. *It must be for accuracy to even be measured*.
The model isn't trying to solve the Collatz conjecture; it is learning a pretty basic algorithm and then applying it a number of times. The algorithm it needs to learn is

    if x % 2 == 0:
        x //= 2
    else:
        x = 3 * x + 1
It also needs to learn to put that in a loop, with a variable number of iterations, but the algorithm itself is static.

On the other hand, the Collatz conjecture states that C(x) (the above algorithm, iterated) has a fixed point of 1 for all x (where x ∈ Z+). Meaning that eventually any input will collapse into the loop 1 -> 4 -> 2 -> 1 (or just terminate at 1). You can probably see we already know this is true for at least an infinite set of integers...
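As a minimal sketch of what "putting that in a loop with a variable number of iterations" amounts to (just counting steps until the sequence hits 1; the helper name is mine, not the paper's):

    def collatz_steps(x: int) -> int:
        # Count how many Collatz steps it takes for x to reach 1.
        steps = 0
        while x != 1:
            if x % 2 == 0:
                x //= 2
            else:
                x = 3 * x + 1
            steps += 1
        return steps

    print(collatz_steps(27))  # 27 famously takes 111 steps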
Edit: I should note that there is a slight modification to this, though the model could get away with learning just this. Their variation is limited to odd numbers, and not even all of them. For example, 9 can't be represented by (2^k)m - 1 (but 7 and 15 can). But you can see that there's still a simple algorithm, and that the crux is determining the number of iterations. Regardless, this is still deterministic. They didn't use any integers larger than 2^71, and for everything below that bound we absolutely know the sequences and we absolutely know they all terminate at 1.
To solve the Collatz conjecture (and probably win a Fields Medal) you must do one of two things:
1) Provide a counter-example
2) Show that this happens for all n, which is an infinite set of numbers, so this strictly cannot be done by demonstration.

But now imagine that instead of it being a valid rejection 0.3% of the time, it would also reject valid primes. Now it would be instantly useless, because it fails the test for determinism.
Now, I get your point that a function that is 99.7% accurate will eventually give an incorrect answer, but that's not what the comment said.
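(For what it's worth, assuming independent errors: 0.997^231 ≈ 0.5, so after a couple hundred calls you're more likely than not to have hit at least one wrong answer, and after 1000 calls the chance they were all correct is only about 5%.)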
LLMs are not calculators. If you want a calculator use a calculator. Hell, have your LLM use a calculator.
>That's precisely why digital computers won out over analog ones, the fact that they are deterministic.
I mean, no, not really: digital computers are far easier to build and far more multi-purpose (and technically the underlying signals are analog).
Again, if you have a deterministic solution that is 100% correct all the time, use it; it will be cheaper than an LLM. People use LLMs because there are problems that are either not deterministic or where the deterministic solution uses more energy than will ever be available in the local part of our universe. Furthermore, a lot of AI (not just LLMs) uses random noise at particular steps as a means to escape local maxima.
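A toy sketch of that last point (simulated-annealing-style hill climbing on a made-up bumpy function; the function, names and parameters are purely illustrative, not from the article):

    import math, random

    def noisy_maximize(f, x0, steps=10000, temp=1.0, cooling=0.999):
        # Hill climbing that sometimes accepts worse moves, so it can
        # escape local maxima instead of getting stuck in them.
        x, best = x0, x0
        for _ in range(steps):
            candidate = x + random.gauss(0, 0.1)   # random perturbation
            delta = f(candidate) - f(x)
            if delta > 0 or random.random() < math.exp(delta / temp):
                x = candidate
            if f(x) > f(best):
                best = x
            temp *= cooling
        return best

    bumpy = lambda x: -x * x + 0.5 * math.cos(10 * x)  # many local maxima, global max near 0
    print(noisy_maximize(bumpy, 3.0))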
I think they keep coming back to this because a good command of math underlies a vast domain of applications, and without a way to do this as part of the reasoning process, the reasoning process itself becomes susceptible to corruption.
> LLMs are not calculators. If you want a calculator use a calculator. Hell, have your LLM use a calculator.
If only it were that simple.
> I mean, no not really, digital computers are far easier to build and far more multi-purpose (and technically the underlying signals are analog).
Try building a practical analog computer for a non-trivial problem.
> Again, if you have a deterministic solution that is 100% correct all the time, use it, it will be cheaper than an LLM. People use LLMs because there are problems that are either not deterministic or the deterministic solution uses more energy than will ever be available in the local part of our universe. Furthermore a lot of AI (not even LLMs) use random noise at particular steps as a means to escape local maxima.
No, people use LLMs for anything, and one of the weak points there is that as soon as the task requires slightly more complex computation, there is a fair chance that the output is nonsense. I've seen this myself in a bunch of non-trivial trials regarding aerodynamic calculations, specifically the rotation of airfoils relative to the direction of travel. It tends to go completely off the rails if the problem is non-trivial and the user does not break it down into roughly the same steps as you would if you were working the problem out by hand (and even then it may subtly mess up).
Well, that's great and all, but the vast majority of LLM use is not for stuff you can just pull out a pocket calculator for (or run a similarly airtight deterministic algorithm on), so this is just a moot point.
People really need to let go of this obsession with a perfect general intelligence that never makes errors. It doesn't exist and never has, outside of fiction.
This is not even to mention the fact that asking a GPU to think about the problem will always be less efficient than just asking that GPU to directly compute the result for closed algorithms like this.
99.7% of the time good and 0.3% of the time noise is not very useful, especially if there is no confidence measure indicating which answers are probably incorrect.