This is a common misunderstanding. Transformers are actually Turing complete, provided the model is allowed arbitrary-precision arithmetic and an unbounded number of decoding steps (the idealized setting assumed in both papers below):
* On the Turing Completeness of Modern Neural Network Architectures, https://arxiv.org/abs/1901.03429
* On the Computational Power of Transformers and its Implications in Sequence Modeling, https://arxiv.org/abs/2006.09286