>> Transformers are RASP programs, which includes sorting programs. See the Weiss paper (
https://arxiv.org/pdf/2106.06981.pdf).
That's one preprint on arXiv that makes a wild claim about a new concept its authors acronymise as "RASP". It's not any kind of established terminology, and it's nothing more than a claim.
What is certainly established is that a function and an algorithm are different objects. To clarify: a function is a mapping between the elements of two sets, whereas an algorithm is a sequence of operations that computes the result of a function and is guaranteed to terminate. Algorithms are also typically understood to be provably correct and to have some provable asymptotic complexity (as opposed, for example, to heuristics), but that's not a requirement.
So for example, if you have a function ƒ between sets X and Y, and an algorithm P that calculates the result of ƒ, then you can give any element of X to P and it will return (in fact, construct) an element of Y. Crucially, ƒ is not P, and P is not ƒ.
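To make the distinction concrete, here is a minimal Python sketch (the names ƒ → `f` and P → `P` follow the text above; the specific pairs and the choice of insertion sort are illustrative): the function is just a mapping with no steps, while the algorithm is a terminating procedure that computes it.

```python
# The function f: a mapping between two sets. Mathematically it is just a
# collection of (input, output) pairs; it has no steps and does not "run".
f = {(3, 1, 2): (1, 2, 3), (2, 1): (1, 2)}  # a finite fragment of the mapping

# The algorithm P: a concrete, terminating procedure (insertion sort) that,
# given any element of X, constructs the element of Y that f maps it to.
def P(xs):
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

print(P([3, 1, 2]))  # [1, 2, 3] — P constructs f's output; P is not f
```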
Now, when you train a machine learning model, you are typically training a function ƒ̂ (with a little hat) to approximate ƒ. That means your trained ƒ̂ is a function that maps some elements of X to the same elements of Y that ƒ does, but not all of them. It's an approximation, so you get some amount of error, as in your experiment.
So what you've done in your experiment is train a model to approximate a mapping from the set of lists to itself (where the input is any of the lists in your training set and the output is the same list, sorted). Your model is not an algorithm, and you cannot train an algorithm with a language model.
I appreciate that learning an algorithm is what you wanted to achieve, but in science we don't choose the answer that pleases us; we choose the answer that makes the most sense, and a good heuristic is that the answer that makes more sense is the simplest one. Here, in order to convince yourself that you have trained a language model to learn an algorithm rather than an approximator, you have chosen to rely on a preprint with a completely novel and untested concept that someone put on the internet, rather than on the well-understood abstractions of elementary computer science. That is not the simplest explanation, and it is not a good idea: you will not understand what is going on if you rely on that kind of explanation. I assume you are trying to understand?
Edit: incidentally, you don't need a transformer to train an approximator to a sorting function. You can do that with a multi-layer perceptron, or a logistic regression, and certainly with an LSTM. Ceteris paribus, you'll get the same results.
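For instance, here is a sketch of the MLP version, assuming scikit-learn and NumPy are available (the list length, hidden-layer sizes, and sample counts are all arbitrary choices for illustration). Note what comes out: an approximator ƒ̂ with small but nonzero error, not a sorting algorithm.

```python
# Sketch: training an MLP to *approximate* sorting of fixed-length lists.
# This yields an approximator f-hat, not an algorithm — it makes errors,
# especially off the training distribution.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5000, 5))  # random length-5 lists
Y = np.sort(X, axis=1)                 # their sorted versions

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(X, Y)

test = rng.uniform(0, 1, size=(100, 5))
err = np.abs(model.predict(test) - np.sort(test, axis=1)).mean()
print(f"mean absolute error: {err:.4f}")  # small but nonzero: an approximation
```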
>> The probability of a test list existing in the training set is less than 10^-70.
But if you held the test set out, that probability would be exactly 0, so why not do that? It's not hard to do.
Is there a good reason not to do that?
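Holding out is a few lines of code. Here is a sketch, with an illustrative data generator standing in for whatever pipeline you actually use: generate the training set, then reject any candidate test list that already appears in it, so the leakage probability is 0 by construction rather than merely small.

```python
# Sketch: guaranteeing zero train/test leakage by explicit held-out filtering.
# random_list() is a stand-in for your own data generation.
import random

random.seed(0)

def random_list():
    return tuple(random.randrange(100) for _ in range(8))

train = {random_list() for _ in range(10_000)}

test = []
while len(test) < 1_000:
    xs = random_list()
    if xs not in train:  # hold out: reject anything seen in training
        test.append(xs)

assert all(xs not in train for xs in test)  # leakage probability: exactly 0
```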
Btw, lists are composite objects. How much overlap is there between your training and test lists? Do you know?
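To see why this matters: even when no test list appears verbatim in training, the two sets can share nearly all of their sub-structure. A sketch of one illustrative measure (the bigram choice and all parameters are arbitrary, for demonstration only): the fraction of length-2 contiguous sublists of test lists that also occur somewhere in training.

```python
# Sketch: lists are composite objects, so whole-list uniqueness says little
# about piecewise overlap. Measure contiguous-pair (bigram) overlap between
# train and test lists drawn from the same distribution.
import random

random.seed(0)
make = lambda: [random.randrange(20) for _ in range(6)]
train = [make() for _ in range(2000)]
test = [make() for _ in range(200)]

def bigrams(xs):
    return {tuple(xs[i:i + 2]) for i in range(len(xs) - 1)}

train_bigrams = set().union(*(bigrams(xs) for xs in train))
test_bigrams = set().union(*(bigrams(xs) for xs in test))

overlap = len(test_bigrams & train_bigrams) / len(test_bigrams)
print(f"bigram overlap: {overlap:.2%}")  # typically near 100% here
```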
Edit: meh. HN messes up my nice f-with-hook-and-combining-circumflex-accent. DAAAAANG!!!!