Attention Is All You Need (Neural Networks)

(arxiv.org)

8 points · idibidiartists · 9y ago · 3 comments

3 comments · 1 top-level

rerx · 9y ago · 2 in thread

This is super interesting. I believe the general expectation was that convolutional neural networks would soon surpass recurrent neural networks in machine translation tasks, but this is an entirely novel approach.

visarga · 9y ago

This, and graph-based neural nets, are very different from CNNs and LSTMs. They learn to split a scene into objects and then learn how those objects interact. In this way a lot of the variation in the input is factored out, and only relations between compatible types of objects are learned. That leads to stronger generalization.

If you think about it, when we get to full reasoning, how should the data be represented? Embeddings and flat lists/matrices are not appropriate for the way objects interrelate. It has to be a kind of graph. Here they used multiple attention heads instead, which kind of work the same way as graphs, with attention heads being similar to links between objects.

Once we have data represented as graphs, we can also do simulation: we apply the rules of each object iteratively on the graph. The graph can be seen as an automaton, where each object updates its state by integrating information from its neighbors. Such automata can be Turing-complete, so in principle they can represent and simulate any computation. With simulation we can search for optimal solutions. It opens a lot of doors for AI.
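To make the "each object updates its state from its neighbors" idea concrete, here is a minimal sketch in plain Python. The three-node chain, the scalar states, and the averaging rule are all assumptions picked for illustration; a learned model would replace the averaging with a trained update function.

```python
def step(states, neighbors):
    """One synchronous update: each node averages its own state with its neighbors' states."""
    new_states = {}
    for node, state in states.items():
        incoming = [states[n] for n in neighbors[node]]
        # Hypothetical update rule: plain mean over the node and its neighbors.
        new_states[node] = (state + sum(incoming)) / (1 + len(incoming))
    return new_states


def simulate(states, neighbors, num_steps):
    """Apply the same local rule to every node, num_steps times."""
    for _ in range(num_steps):
        states = step(states, neighbors)
    return states


# Hypothetical graph: a chain a - b - c, with all the "signal" starting at node a.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
states = {"a": 1.0, "b": 0.0, "c": 0.0}

one_step = step(states, neighbors)
final_states = simulate(states, neighbors, num_steps=50)
```

With this particular rule the information from node a spreads along the links until all three nodes agree on a common value. Other local rules give very different dynamics; the point is only that the whole computation is "apply one rule per object, repeatedly".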

My money is on simulation and graphs for the next level of AI.

gmitscha · 9y ago

I do not think graphs are where we're heading. I think flat vectors are fine, and I would argue multi-head attention is not THAT different from gated RNNs like LSTMs. The multiplication with weights, which are the outcome of a softmaxed dot product, is similar to the input gate of an LSTM.
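A rough sketch of the comparison, in plain Python. The `attend` function follows the scaled dot-product form from the paper for a single query and a single head; `input_gate` is a simplified stand-in for the LSTM input gate that takes the gate pre-activation directly instead of computing it from the input and hidden state. The toy vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # Output is the weighted sum of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def input_gate(gate_preactivation, candidate):
    """Simplified LSTM-style input gate: elementwise sigmoid gate times candidate values."""
    return [sigmoid(g) * c for g, c in zip(gate_preactivation, candidate)]


# Toy data (assumed for illustration only).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

output, weights = attend(query, keys, values)
gated = input_gate([0.0, 2.0], [1.0, 1.0])
```

In both cases a learned, data-dependent factor between 0 and 1 multiplies some content. One difference worth keeping in mind: the softmax weights compete across positions and sum to 1, while each sigmoid gate value is computed independently per element.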
