Without reproducibility and transparency in the code and data, the impact of this research is ultimately limited. No one else can recreate, iterate on, and refine the results, nor can anyone rigorously evaluate the methodology (beyond guessing after reading a manuscript).
The year is 2019, and many are finally realizing it's time to back up results with code, data, and some kind of specification of the computing environment used. Science is about sharing your work so that others in the research community can build upon it. Leave the manuscript as the pretty formality.
Given that it's evolved, I'd imagine this is a given? Or, more accurately, you could probably reproduce some kind of emergent behaviour, but it would differ given different randomized parameters.
IMHO this is probably a case of them stretching the work across several papers, with this one as the announcement paper. That's a shitty practice, but the current academic environment encourages taking good findings and puffing them up into multiple incomplete papers rather than one well-done paper.
Still a long way from a Theory of Biogenesis. But a good next step is using a differentiable model to predict novel proteins that have no analogue in nature, much like Materials Genome researchers searching for stable phases of matter!
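One way to read "using a differentiable model" for design: relax the discrete sequence into per-position probabilities over amino acids and run gradient ascent on the model's predicted fitness. A minimal sketch, assuming a toy additive scoring matrix `W` (hypothetical, random here) in place of a trained network:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def softmax(Z):
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def design_sequence(W, steps=500, lr=1.0):
    """Gradient ascent on a toy additive fitness model (hypothetical).

    W[i, a] scores amino acid a at position i. The sequence is relaxed into
    P = softmax(Z), and the expected score sum_i sum_a W[i, a] * P[i, a]
    is maximized with respect to the logits Z.
    """
    Z = np.zeros_like(W)
    for _ in range(steps):
        P = softmax(Z)
        expected = (W * P).sum(axis=1, keepdims=True)
        # d/dZ[i,b] of sum_a W[i,a] P[i,a] = P[i,b] * (W[i,b] - expected[i]).
        Z += lr * P * (W - expected)
    # Read out a discrete sequence: the most probable residue at each position.
    return "".join(AMINO_ACIDS[k] for k in softmax(Z).argmax(axis=1))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, len(AMINO_ACIDS)))   # random stand-in for a learned model
print(design_sequence(W))
```

For a purely additive model the optimum is simply the per-position argmax of `W`, which makes the sketch easy to check; the relaxation only earns its keep when the model couples positions, as a neural network does.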
"Training ever bigger convnets and LSTMs on ever bigger datasets gets us closer to Strong AI -- in the same sense that building taller towers gets us closer to the moon." --François Chollet
The Transformer layer is a radical leap beyond LSTMs and CNNs. LSTMs can model sequences and CNNs can model regular grids, but neither has an efficient mechanism for long-range interactions. The Transformer does. It's a huge leap, similar to the one computer vision made a few years ago.
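To make the long-range point concrete: in self-attention every position scores every other position directly through an n × n matrix, so distant positions interact within a single layer. A minimal single-head sketch in NumPy; the random weights stand in for learned ones, and masking, multiple heads, and positional encodings are left out:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X has shape (n, d), one row per sequence position. Each output row is a
    weighted mix of all value rows, so position 0 can use position n-1
    directly, with no recurrence in between.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # (n, n) pairwise interactions
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d = 6, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)  # (6, 4) (6, 6)
```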
What is needed besides spatial translation invariance (CNN) and temporal invariance (LSTM) is permutation invariance. Whenever the problem can be described as a graph, the ordering of the vertices and edges should not matter. You can't do that with CNNs and LSTMs, but you can with graph neural nets and Transformers.
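A quick numerical check of that claim for a simple graph neural net layer: relabelling the vertices permutes the node outputs in the same way (equivariance), and a sum readout over nodes does not change at all (invariance). A minimal sketch with sum-aggregation message passing; the graph, features, and weights are random placeholders:

```python
import numpy as np

def gnn_layer(A, H, W):
    """One round of sum-aggregation message passing: each node adds up its
    neighbours' features (plus its own), applies a shared linear map, then ReLU."""
    return np.maximum(0.0, (A + np.eye(len(A))) @ H @ W)

def graph_readout(A, H, W):
    # Summing over nodes gives a graph-level vector that ignores node order.
    return gnn_layer(A, H, W).sum(axis=0)

rng = np.random.default_rng(0)
n, d = 5, 3
A = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
A = A + A.T                                   # undirected graph, no self-loops
H = rng.normal(size=(n, d))
W = rng.normal(size=(d, d))

perm = rng.permutation(n)
P = np.eye(n)[perm]                           # permutation matrix
A_p, H_p = P @ A @ P.T, P @ H                 # same graph, vertices relabelled

# Node outputs are permuted the same way (equivariance) ...
print(np.allclose(gnn_layer(A_p, H_p, W), P @ gnn_layer(A, H, W)))      # True
# ... and the graph-level readout is unchanged (invariance).
print(np.allclose(graph_readout(A_p, H_p, W), graph_readout(A, H, W)))  # True
```

The same argument applies to self-attention without positional encodings: permuting the input rows just permutes the output rows.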
Apparently Transformers are the best for language modelling (GPT-2), playing games (Dota2 from OpenAI), composing music, and possibly now modelling proteins. I assume they will play a huge role in working with graph-structured data with multiple entities and relations.
Transformers work well on sequence tasks because they not only compare well in accuracy but also scale better than RNNs such as LSTMs or GRUs. That means they can be trained on more data.
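Much of the scaling argument is about parallelism. A recurrent network has to walk the sequence one step at a time, as this minimal NumPy sketch of a plain (Elman) RNN shows; the weights are random placeholders, and LSTM/GRU gating is omitted, though gating does not remove the step-to-step dependency:

```python
import numpy as np

def rnn_forward(X, Wx, Wh):
    """Elman RNN: h_t = tanh(x_t @ Wx + h_{t-1} @ Wh).

    Step t cannot start until step t-1 has finished, so a length-n sequence
    needs n strictly sequential updates, no matter how much hardware is available.
    """
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in X:                      # inherently serial loop over positions
        h = np.tanh(x_t @ Wx + h @ Wh)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n, d, hdim = 6, 4, 8
X = rng.normal(size=(n, d))
H = rnn_forward(X, rng.normal(size=(d, hdim)), rng.normal(size=(hdim, hdim)))
print(H.shape)  # (6, 8)
```

Self-attention, by contrast, gets the interactions for all positions from one matrix product of queries against keys, which parallelizes across the sequence. Its cost still grows with n², so this is a parallelism argument rather than a free lunch.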
This isn't really the same as CNNs, which model images by operating at multiple scales. I'm not aware of any cases of Transformers being used particularly successfully on images.
They can be used on graphs, of course, by translating the problem into a graph-walk problem (à la DeepWalk).
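The "graph walk" translation boils down to sampling random walks and treating each walk as a sentence. A minimal sketch of only the walk-generation step, using a made-up four-node graph; the skip-gram training that DeepWalk then runs on these walks is not shown:

```python
import random

def random_walks(adj, walks_per_node=2, walk_length=5, seed=0):
    """Generate uniform random walks, the corpus-building step of DeepWalk.

    adj maps each node to a list of its neighbours. Each walk is a list of
    nodes that can be fed to a word2vec-style model as if it were a sentence.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)                 # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
for walk in random_walks(graph)[:3]:
    print(" ".join(walk))
```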
All the examples you gave (language modelling, Dota2, music, and protein modelling) are set up as sequence prediction problems, so they are perfect for Transformers.
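"Set up as sequence prediction" usually means cutting each sequence into (context, next token) training pairs. A minimal sketch on a hypothetical protein fragment, one token per residue, with a fixed-length context window:

```python
def next_token_pairs(tokens, context=3):
    """Turn one sequence into (context, target) pairs for next-token prediction."""
    return [(tokens[max(0, i - context):i], tokens[i]) for i in range(1, len(tokens))]

protein = list("MKTAYIAK")   # hypothetical fragment
for ctx, target in next_token_pairs(protein)[:3]:
    print("".join(ctx), "->", target)
# M -> K
# MK -> T
# MKT -> A
```

Masked-token objectives (as in BERT) are a close variant: hide some positions and predict them from both sides instead of only from the left.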
I'm having a hard time processing what the wink might possibly mean in this context.
No sarcasm intended.
(Yann LeCun is a Turing Award winner for his work in deep learning)
I hate to be that guy, but distinguishing between alpha helices and beta strands is not really that hard.
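To illustrate why this is considered the easy part: even a crude propensity heuristic in the spirit of the classic Chou-Fasman method separates textbook helix-formers from strand-formers. A toy sketch; the propensity values are approximate (published tables differ slightly), and the test segments are made up, not real proteins:

```python
# Approximate Chou-Fasman propensities (helix, strand); published tables differ slightly.
PROPENSITY = {
    "A": (1.42, 0.83), "R": (0.98, 0.93), "N": (0.67, 0.89), "D": (1.01, 0.54),
    "C": (0.70, 1.19), "Q": (1.11, 1.10), "E": (1.51, 0.37), "G": (0.57, 0.75),
    "H": (1.00, 0.87), "I": (1.08, 1.60), "L": (1.21, 1.30), "K": (1.14, 0.74),
    "M": (1.45, 1.05), "F": (1.13, 1.38), "P": (0.57, 0.55), "S": (0.77, 0.75),
    "T": (0.83, 1.19), "W": (1.08, 1.37), "Y": (0.69, 1.47), "V": (1.06, 1.70),
}

def helix_or_strand(segment):
    """Label a segment 'helix' or 'strand' by comparing mean propensities.

    A crude heuristic only: none of the full Chou-Fasman nucleation and
    extension rules, and no turn or coil class.
    """
    helix = sum(PROPENSITY[aa][0] for aa in segment) / len(segment)
    strand = sum(PROPENSITY[aa][1] for aa in segment) / len(segment)
    return "helix" if helix >= strand else "strand"

print(helix_or_strand("AEELLKKAEE"))  # helix
print(helix_or_strand("VTVYVIVT"))    # strand
```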
It's a good start, though. I would propose the following test: let's see if we can use the neurons' activations to predict the luminosity of variants of a 'base' GFP molecule (under a fixed set of experimental conditions). Train on 10,000 mutations (this could maybe be done at very high throughput by tethering the XNA to a bead, synthesizing, and then measuring the beads one by one), and see if it can extrapolate the effects of 10k more. Or heck, just do it by brute force; we've got high-throughput robots, right?
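The core of the proposed test is a supervised probe: regress measured brightness on the network's activations and check accuracy on mutants it never saw. A minimal sketch with closed-form ridge regression; the data below are random synthetic stand-ins, not GFP measurements, and the embedding size (64) and penalty (`alpha=1.0`) are arbitrary choices:

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins: in the proposed experiment, each row of `emb` would be the
# model's activations for one GFP mutant and `brightness` its measured luminosity.
rng = np.random.default_rng(0)
n_mutants, dim = 20_000, 64
emb = rng.normal(size=(n_mutants, dim))
true_w = rng.normal(size=dim)
brightness = emb @ true_w + rng.normal(scale=0.5, size=n_mutants)

# Train on the first 10k mutants, evaluate on the 10k held out.
train, test = slice(0, 10_000), slice(10_000, None)
w, b = fit_ridge(emb[train], brightness[train], alpha=1.0)
r2 = r_squared(brightness[test], emb[test] @ w + b)
print(f"held-out R^2: {r2:.3f}")
```

A random split like this flatters the model. To test extrapolation in the sense proposed here, hold out mutants that differ from anything in the training set, for example ones carrying more mutations relative to the base molecule.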
There are probably already enzymes in this data set that have measurements of their behavior. Could this modelling approach be coaxed to find the one with the highest processivity? Or do we need more labeled data?
Which is a shame, because it's a reasonable approach. I just wish they'd frickin' described what they did instead of spending the whole paper monologuing and showcasing unconvincing experiments. No need to justify what you're doing, just do it.
A couple of questions:
1. What are those representations?
2. What is "biological function"?
3. What kind of information does the learned representation extract that is not already in the "biological properties" it is trained to map to?
Very impressive accuracies on hard tasks, and it's open source!