Deep Learning: Our Miraculous Year 1990-1991

(people.idsia.ch)

156 points · eugenhotaj · 6y ago · 34 comments

tlb · 6y ago · 8 in thread

I encourage reading this, not as self-promotion, but as a first-person history of what it feels like to be too early with a technology.

Someone out there is probably experimenting with something world-changing, and has all the ingredients except for a few more iterations of Moore's Law. It would feel a lot like working on deep learning in 1990. If you think you might be on this path, it's worth studying the history.

light_hue_1 · 6y ago

Definitely don't read it as a history. It's just a lie. Schmidhuber is laying claim to a lot of things he didn't do. He takes anything that relates in words to a modern technique and claims that he invented the technique, even if in practice his papers have nothing to do with what the words mean today and had no influence on the field.

Schmidhuber and his group are basically the only ones who claim that automatic differentiation was invented by Linnainmaa alone. Many people invented AD at around the same time, and Linnainmaa was not the first. Naming just one person is a huge disservice to the community and shows that this is just propaganda, as much of Schmidhuber's stuff is.

1. First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991. This dates back at least to Fukushima for the theory and LeCun in 1989 practically.

2. Compressing / Distilling one NN into Another. Lots of people did this before 1991.

3. The Fundamental Deep Learning Problem: Vanishing / Exploding Gradients. They did publish an analysis of this, that's true.

4. Long Short-Term Memory (LSTM) Recurrent Networks. No, this was 1997.

5. Artificial Curiosity Through Adversarial Generative NNs. Absolutely not. Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Trans. on Systems, Man, and Cybernetics, (5):834–846, 1983.

6. Artificial Curiosity Through NNs That Maximize Learning Progress (1991) I have nothing to say to this. This isn't something that worked back in 1991 and it's not something that works today.

7. Adversarial Networks for Unsupervised Data Modeling (1991) This isn't the same idea as GANs. The idea as presented in the paper doesn't work.

8. End-To-End-Differentiable Fast Weights: NNs Learn to Program NNs (1991). Already existed and the idea as presented in the original paper doesn't work.

9. Learning Sequential Attention with NNs (1990). He uses the word attention but it's not the same mechanism as the one we use today which dates to 2010. This did not invent attention in any way.

10. Hierarchical Reinforcement Learning (1990). Their 1990 paper does not do hierarchical RL; their 1991 paper does something like it. This is at least contemporary with Learning to Select a Model in a Changing World by Mieczyslaw M. Kokar and Spiridon A. Reveliotis.

Please stop posting the ravings of a person who is trying to steal other people's work.

1024core · 6y ago

> 1. First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991. This dates back at least to Fukushima for the theory and LeCun in 1989 practically.

Quoting the blog post:

"Of course, Deep Learning in feedforward NNs started much earlier, with Ivakhnenko & Lapa, who published the first general, working learning algorithms for deep multilayer perceptrons with arbitrarily many layers back in 1965 [DEEP1]. For example, Ivakhnenko's paper from 1971 [DEEP2] already described a Deep Learning net with 8 layers, trained by a highly cited method still popular in the new millennium [DL2]."

Let's try to be fair and objective here. You may have an axe to grind with Schmidhuber, but that does not give you the right to take things out of context.

account73466 · 6y ago

"First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991."

Well, now it seems that you lie here, since you avoid the well-known fact that Schmidhuber dates the first deep NNs back to the 1960s and 1970s. The same goes for many of your other points.

throwawayjava · 6y ago

Do we have a few more iterations of Moore's law?

Agebor · 6y ago

Even if we don't, the progress is not going to stop, for example on:

- lowering the price of each chip - you can get that by more automation.

- lowering the cost of energy used by a chip - you can get that from the rise of renewable energy generation and its decentralisation (and again, more automation).

The point is that automation caused by AI will start a reinforcing feedback loop where more and more work can be done more cheaply, speeding up automation itself too.

tlb · 6y ago

Not for clock speed, but yes for parallelism. It might look like the Cerebras [0] wafer-scale monster becoming a commodity you could fire up 1000 of in the cloud.

[0] https://www.cerebras.net/

deepnotderp · 6y ago

We've got a few more iterations of Moore's law for sure. After that, progress will likely happen in jumps and address non-transistor bottlenecks like memory access, e.g. wafer-scale integration, 3D systems, photonics, etc.

realbarack · 6y ago

For what it's worth I interpret the GP's statement as referring to general foundational progress in whatever field, not Moore's law specifically.

shmageggy · 6y ago · 3 in thread

God, Schmidhuber is insufferable.

This whole account has virtually zero mention of how later techniques improved upon or innovated on his, and very little account of how his contributions were (like everyone else's) evolutions of existing methods. It reads almost like Schmidhuber or his students invented and solved everything from scratch, and nobody else has done shit since.

The guy clearly wants to be more included in the standard narrative, but being so self-aggrandizing is doing him zero favors. If he were capable of writing an honest, charitable account of how his work fits into a much larger field, it would be much easier to take him seriously.

mjn · 6y ago

I mean it's self-promotional yes, but I read this as more of a blog post about advances specifically in his own group. For the Schmidhuberian take on the broader history of deep learning, this other one's the go-to article (though it's much longer): https://arxiv.org/abs/1404.7828

Not everyone likes that article either, but it does at least extensively cite prior work, i.e. accounts for "how his contributions were (like everyone else's) evolutions of existing methods". In particular, sections 5.1–5.4 credit a large amount of work from the 1960s-80s that he considers foundational.

nafizh · 6y ago

Really? The title itself says "Deep Learning: Our Miraculous Year 1990-1991". It's an account of their work during that year. And he cites something like 100 articles there.

This kind of rhetoric is part of why he didn't receive the Turing Award last year, which he thoroughly deserved. We seem incapable of appreciating the achievements of people who don't match our ideal personality type.

account73466 · 6y ago

There is a subset of people who do not like Schmidhuber. According to my personal observations, this subset overlaps quite a lot with people who tend to underestimate the importance of proper credit assignment.

KKKKkkkk1 · 6y ago · 3 in thread

It seems that Schmidhuber is claiming credit for deep learning and is implicitly comparing himself to Albert Einstein. How accurate is his assessment?

goldemerald · 6y ago

My goal is to one day have Schmidhuber angrily claim that my research was done by him in the 90s like what happened to Ian Goodfellow [0].

[0] https://www.reddit.com/r/MachineLearning/comments/5go4sa/n_w...

account73466 · 6y ago

Schmidhuber was more right than wrong

freyr · 6y ago

That is Schmidhuber in a nutshell.

nafizh · 6y ago · 2 in thread

It was a travesty that Schmidhuber didn't receive the Turing Award along with Hinton, LeCun, and Bengio last year.

bonoboTP · 6y ago

It does seem to me that there could be some bias in this award's history.

The Turing Award has been awarded every year (usually to multiple people) since 1966.

Look it up on Wikipedia. How many of the 70 laureates can you find who performed their research outside of the Anglosphere? I didn't look in detail, but after a quick glance it seems about 5 out of the 70 (Dahl, Nygaard, Shamir, Naur, Sifakis)? (Or how many who grew up outside the Anglosphere?)

Maybe that reflects the true state of things and almost all of CS was developed in the Anglosphere. Even if that's so historically, I think it may induce some bias when evaluating the contributions of people from outside the Anglo community and network.

Ormus · 6y ago

No it wasn't. Stop forming opinions from uninformed internet memes.

jumpingmice · 6y ago · 1 in thread

The prominent developers of deep learning techniques within Google were quite upfront that they were applying old techniques that had not been practical until massive datacenters expanded the parameter space and training power.

nullc · 6y ago

Have they been equally upfront in their patent applications?

pjbk · 6y ago

I guess many institutions and research groups could write similar accounts. Even the late 80s were fairly productive for NNs and what we today call ML, as a search of publications from that era shows.

We also had some relatively sophisticated tools, and looking back one could say they were deep-learning-ish. In my own case, I did some research on weather forecasting using BPN/TDNN, Kohonen networks and RNNs with the Stuttgart Neural Network Simulator [0]. It allowed some flexibility in creating and stacking models.

[0] http://www.ra.cs.uni-tuebingen.de/SNNS/welcome.html

plmu · 6y ago

One of the early applications was pattern matching for the LHC. I was in one of the groups in which some people (not myself) worked on this and put the neural networks, using the just-developed theory, into hardware with FPGAs.

After a few years the three (post-docs) left and founded a startup. I lost contact with them. I think they were too early for broader applications, and had left the field completely by the early 2000s, when it really took off.

Here is a book that the author of the referenced article and the people from my group (Utrecht University) contributed to: https://link.springer.com/book/10.1007%2F978-1-4471-0877-1

alexcnwy · 6y ago

1989/1990 was also when convolutional networks first started working with LeCun’s breakthrough paper on digit recognition.

Incredible to think how much amazing research was happening back then and wonder what research is being done now that will change our lives in the next 30 years.

bonoboTP · 6y ago

> In surveys from the Anglosphere it does not always become clear [DLC] that Deep Learning was invented where English is not an official language.

Even if you disagree with Schmidhuber's assessment of his own importance, I think this is clearly true.

There is a certain arrogance (or not-invented-here syndrome) in the Anglosphere (or North America) towards research done elsewhere.

2sk21 · 6y ago

As an old-timer in neural networks, this was interesting. However I should note that we did not call it "deep learning" back then. It was simply "neural networks".

As I write this, I am looking at the book "Parallel Distributed Processing" (with the blue cover), an edited compilation of papers on neural networks published by the MIT Press in 1987. I myself spent the summer of 1990 implementing the back-propagation algorithm as described in chapter 8 of this book, which is entitled "Learning Internal Representations", by Rumelhart, Hinton and Williams.

I myself got my PhD in 1992 for coming up with an algorithm for speeding up back-propagation when the training set is imbalanced.

An Improved Algorithm for Neural Network Classification of Imbalanced Training Sets. IEEE Transactions on Neural Networks 4(6):962-969, November 1993.

kerng · 6y ago

This is pretty cool. Always interesting to see how things eventually become mainstream when their origins go back decades, sometimes more.
