
nyrikki 3y ago

The paper from yesterday:

https://news.ycombinator.com/item?id=36332033

It showed that attention with positional encodings and arbitrary-precision rational activation functions is Turing complete.

Attention with finite precision, a non-rational activation function, and/or no positional encodings is not Turing complete.
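
A minimal Python sketch of why precision matters here. It is an illustration only, not the paper's construction: it just compares an exact rational (`fractions.Fraction`) with an IEEE 754 double once a counter grows past what the fixed-width format can resolve. The helper name `increment_is_visible` is made up for this example.

```python
from fractions import Fraction

def increment_is_visible(value):
    """Return True if adding 1 to value produces a different number."""
    return value + 1 != value

big = 2 ** 53  # beyond this, a double can no longer represent every integer

exact = increment_is_visible(Fraction(big))  # arbitrary-precision rational
rounded = increment_is_visible(float(big))   # fixed-width double

print(exact, rounded)  # True False
```

With exact rationals every step stays distinguishable; with a fixed-width float, distinct values eventually collapse into one, so only finitely many states can be told apart.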

Plus, Turing completeness tells you nothing about practical computation under reasonable time or space constraints.

printf() format strings are TC, and while interesting, probably won't help you solve real problems.


HarHarVeryFunny 3y ago · 2 in thread

Maybe, but a CPU is also Turing-complete, yet a (for example) sort program running on a CPU is just a sort program. The functionality of an LLM is defined by whatever it learnt during its (dataset-specific) training, even if that includes in-context and one-shot "learning".

You could train a Turing-complete transformer to do a different task than running an LLM, but once you've trained it to run/be an LLM, then that is what it is.

nyrikki (OP) 3y ago

A CPU is a finite state machine, so adding an unbounded tape trivially makes it TC in theory.
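
As a rough sketch of "finite control plus unbounded tape", here is a toy Turing machine in Python. The function `run`, the state names, and the binary-increment transition table are all invented for illustration; the tape is a dict that grows on demand, standing in for the unbounded tape.

```python
from collections import defaultdict

def run(transitions, tape_input, state="start", blank="_", max_steps=10_000):
    """Drive a finite transition table over a tape that grows on demand."""
    tape = defaultdict(lambda: blank, enumerate(tape_input))
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        # Finite control: look up (state, symbol) and get what to write,
        # how to move the head, and the next state.
        write, move, state = transitions[(state, tape[head])]
        tape[head] = write
        head += move
    cells = [tape[i] for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(blank)

# Hypothetical machine: add 1 to a binary number, head starting on the left.
INCREMENT = {
    ("start", "0"): ("0", +1, "start"),   # scan right to the end of the input
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("_", -1, "carry"),   # step back onto the last digit
    ("carry", "1"): ("0", -1, "carry"),   # 1 + carry = 0, keep carrying
    ("carry", "0"): ("1", 0, "halt"),     # absorb the carry and stop
    ("carry", "_"): ("1", 0, "halt"),     # ran off the left edge: new digit
}

print(run(INCREMENT, "1011"))  # 1100
print(run(INCREMENT, "111"))   # 1000
```

The transition table is fixed and finite; only the tape is allowed to grow, which is the part no physical machine actually has.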

The arbitrary-precision activation function and positional-encoding requirements are there to keep attention's dynamic reweighting values in the computable set.

Since even multilayer neural networks build their curve by shifting, reflecting, and summing line segments, the results of those operations may not map to representable numbers, even given unbounded digits, when using typical activation functions.

Using an activation function that keeps results in a set of cardinality aleph-nought, i.e. a countably infinite set, is what allows it to be TC.

Probably Approximately Correct or PAC learning is intentionally fuzzy.

The occasional gradient loss problem with ReLU is possibly a useful lens for thinking about this.
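
For anyone who hasn't run into it, a small Python sketch of the ReLU gradient issue, assuming a single unit `y = relu(w*x + b)` trained with mean squared error. The data and weights are made up; the point is only that a unit whose pre-activation is negative on every input receives an exactly zero gradient.

```python
def relu(z):
    return z if z > 0 else 0.0

def relu_grad(z):
    # Derivative of ReLU: 1 on the positive side, 0 otherwise.
    return 1.0 if z > 0 else 0.0

def weight_gradient(w, b, inputs, targets):
    """d(mean squared error)/dw for a single unit y = relu(w*x + b)."""
    total = 0.0
    for x, t in zip(inputs, targets):
        z = w * x + b
        total += 2 * (relu(z) - t) * relu_grad(z) * x
    return total / len(inputs)

inputs = [0.5, 1.0, 2.0]
targets = [1.0, 2.0, 4.0]

dead = weight_gradient(-1.0, -1.0, inputs, targets)  # z < 0 on every input
live = weight_gradient(0.5, 0.0, inputs, targets)    # z > 0 on every input

print(dead, live)
```

The "dead" unit has a large error but gets no signal to fix it, while the "live" unit gets a non-zero gradient pushing `w` upward.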

But the success of statistical learning over the past 30 years has largely come from having existential quantifiers with acceptable training loss, following the very useful concept from stats that all models are wrong but some are useful.

Transformer models will most definitely be useful for some problems, but assuming that a TC result for a physically unrealizable configuration will hold in practice will lead to wasted effort.

Simply acknowledging the potential dead ends of a technology helps with not only choosing the right path but recognizing early that you need to change course.

IMHO, the method in this post's paper is far more useful as a lens for intuition.

inciampati 3y ago

Beautiful synopsis, thank you!
