Someone out there is probably experimenting with something world-changing, and has all the ingredients except for a few more iterations of Moore's Law. It would feel a lot like working on deep learning in 1990. If you think you might be on this path, it's worth studying the history.
These are basically the only outliers who claim that automatic differentiation was invented by Linnainmaa alone. Many people invented AD at the same time, and Linnainmaa was not the first. Simply naming one person is a huge disservice to the community and shows that this is just propaganda, as much of Schmidhuber's stuff is.
1. First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991. This dates back at least to Fukushima for the theory and LeCun in 1989 practically.
2. Compressing / Distilling one NN into Another. Lots of people did this before 1991.
3. The Fundamental Deep Learning Problem: Vanishing / Exploding Gradients. They did publish an analysis of this, that's true.
4. Long Short-Term Memory (LSTM) Recurrent Networks. No, this was 1997.
5. Artificial Curiosity Through Adversarial Generative NNs. Absolutely not. See Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems", IEEE Trans. on Systems, Man, and Cybernetics, (5):834–846, 1983.
6. Artificial Curiosity Through NNs That Maximize Learning Progress (1991) I have nothing to say to this. This isn't something that worked back in 1991 and it's not something that works today.
7. Adversarial Networks for Unsupervised Data Modeling (1991) This isn't the same idea as GANs. The idea as presented in the paper doesn't work.
8. End-To-End-Differentiable Fast Weights: NNs Learn to Program NNs (1991). Already existed and the idea as presented in the original paper doesn't work.
9. Learning Sequential Attention with NNs (1990). He uses the word attention but it's not the same mechanism as the one we use today which dates to 2010. This did not invent attention in any way.
10. Hierarchical Reinforcement Learning (1990). Their 1990 paper does not do hierarchical RL; their 1991 paper does something like it. This is at least contemporary with "Learning to Select a Model in a Changing World" by Mieczyslaw M. Kokar and Spiridon A. Reveliotis.
Please stop posting the ravings of a person who is trying to steal other people's work.
Quoting the blog post:
"Of course, Deep Learning in feedforward NNs started much earlier, with Ivakhnenko & Lapa, who published the first general, working learning algorithms for deep multilayer perceptrons with arbitrarily many layers back in 1965 [DEEP1]. For example, Ivakhnenko's paper from 1971 [DEEP2] already described a Deep Learning net with 8 layers, trained by a highly cited method still popular in the new millennium [DL2]."
Let's try to be fair and objective here. You may have an axe to grind with Schmidhuber, but that does not give you the right to take things out of context.
Well, now it seems that you are lying here, since you ignore the well-known fact that Schmidhuber attributes the first deep NNs to the 1960s and 1970s. The same goes for many of your other points.
- lowering the price of each chip - you can get that through more automation.
- lowering the cost of the energy used by a chip - you can get that through the rise of renewable energy generation and its decentralisation (and again, more automation).
The point is that automation caused by AI will start a reinforcing feedback loop where more and more work can be done more cheaply, speeding up automation itself too.
This whole account has virtually zero mention of how later techniques improved upon or innovated on his, and very little account of how his contributions were (like everyone else's) evolutions of existing methods. It reads almost like Schmidhuber or his students invented and solved everything from scratch, and nobody else has done shit since.
The guy clearly wants to be more included in the standard narrative, but being so self-aggrandizing is doing him zero favors. If he were capable of writing an honest, charitable account of how his work fits into a much larger field, it would be much easier to take him seriously.
Not everyone likes that article either, but it does at least extensively cite prior work, i.e. accounts for "how his contributions were (like everyone else's) evolutions of existing methods". In particular, sections 5.1–5.4 credit a large amount of work from the 1960s-80s that he considers foundational.
This kind of rhetoric is partly why he didn't receive the Turing Award last year, which he thoroughly deserved. We seem incapable of appreciating the achievements of people who don't match our ideal personality type.
[0] https://www.reddit.com/r/MachineLearning/comments/5go4sa/n_w...
The Turing Award has been awarded every year (usually to multiple people) since 1966.
Look it up on Wikipedia. How many of the 70 laureates can you find who performed their research outside of the Anglosphere? I didn't look in detail, but after a quick glance it seems to be about 5 out of the 70 (Dahl, Nygaard, Shamir, Naur, Sifakis)? (Or how many who grew up outside the Anglosphere?)
Maybe that reflects the true state of things and almost all of CS was developed in the Anglosphere. Even if that's so historically, I think it may induce some bias when evaluating the contributions of people from outside the Anglo community and network.
We also had some relatively sophisticated tools, and looking back one could say they were deep-learning-ish. In my personal case I did some research on weather forecasting using BPN/TDNN, Kohonen maps and RNNs with the Stuttgart Neural Network Simulator [0]. It allowed some flexibility in creating and stacking models.
After a few years the three (post-docs) left and founded a startup. I lost contact with them. I think they were too early for broader applications, and had left the field completely by the early 2000s, when it really took off.
Here is a book that the author of the referenced article, and the people from my group (Utrecht University), contributed to: https://link.springer.com/book/10.1007%2F978-1-4471-0877-1
Incredible to think how much amazing research was happening back then and wonder what research is being done now that will change our lives in the next 30 years.
Even if you disagree with Schmidhuber's assessment of his own importance, I think this is clearly true.
There is a certain arrogance (or not-invented-here syndrome) in the Anglosphere (or North America) towards research done elsewhere.
As I write this, I am looking at the book "Parallel Distributed Processing" (with the blue cover), an edited compilation of papers on neural networks published by the MIT Press in 1987. I myself spent the summer of 1990 implementing the back-propagation algorithm as described in chapter 8 of this book, entitled "Learning Internal Representations", by Rumelhart, Hinton and Williams.
I myself got my PhD in 1992 for coming up with an algorithm for speeding up back-propagation when the training set is imbalanced.
An Improved Algorithm for Neural Network Classification of Imbalanced Training Sets. November 1993, IEEE Transactions on Neural Networks 4(6):962-969