Deep Learning 101

(markus.com)

210 points · mbeissinger · 12y ago · 44 comments

brandonb 12y ago · 5 in thread

This is a cool tutorial!

It's ironic that deep neural networks have become the biggest machine learning breakthrough of 2013: they were also the biggest machine learning breakthrough of 1957. The idea dates back to the Perceptron, one of the oldest ideas in AI.

One thing to note: although there was a lot of initial excitement about Restricted Boltzmann Machines, Auto-encoders, and other unsupervised approaches, the best results in the last year or so have all used the conventional back-propagation algorithm from 1974, with a few tweaks. http://en.wikipedia.org/wiki/Backpropagation

Ben Lorica wrote a good article on the latest deep learning research from Google, and what's changed since neural networks were last popular in the 1980s: http://strata.oreilly.com/2013/10/deep-learning-oral-traditi...

What's old is new again.

rm999 12y ago

The history of AI is really interesting. Perceptrons were extremely oversold by their inventor, Frank Rosenblatt, after he introduced them in 1958. This led to a lot of funding and interest in AI and perceptrons. Then, in 1969, Marvin Minsky coauthored a book, Perceptrons, which harshly criticized how underpowered perceptrons were. Most famously, the book proved that a single-layer perceptron could not model a simple XOR function. In other words, a technique that many had been led to believe would one day emulate human-like intelligence couldn't even emulate a dead-simple logic gate! The book was devastating and effectively led to a dark age of AI where funding and interest dried up (later, the term "AI winter" was coined).
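For anyone who wants to see the XOR limitation rather than take it on faith, here is a minimal sketch of a single threshold unit trained with the classic perceptron update rule. The function name, the epoch cap, and the zero initialization are my own illustrative choices, not anything from the book.

```python
def train_perceptron(samples, max_epochs=100):
    """Perceptron update rule for a single threshold unit with two inputs.

    Returns (weights, bias, converged). converged is True only if one full
    pass over the samples produced zero mistakes.
    max_epochs is an arbitrary cap chosen for this sketch.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred
            if err != 0:
                # Nudge the decision boundary toward the misclassified point.
                w[0] += err * x1
                w[1] += err * x2
                b += err
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False

# Truth tables as ((input1, input2), output) pairs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND)
_, _, xor_converged = train_perceptron(XOR)
print("AND converged:", and_converged)
print("XOR converged:", xor_converged)
```

AND is linearly separable, so the rule settles on a separating line after a few passes. XOR never reaches a mistake-free pass no matter how large the cap is, because no single line puts (0,1) and (1,0) on one side and (0,0) and (1,1) on the other.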

The next big boom in AI (ignoring some logic/rules-based research in the 70s that I don't think is very interesting from an AI perspective) occurred in the 80s, when computational power increased and researchers discovered/rediscovered neural approaches, including the obscure 1974 research on backpropagation. This led to tons of press and funding from governments who dreamed of killer AI robots and what-not. But, once again, imagination raced ahead of reality and funding dried up when said robots didn't materialize. The field didn't really die off, but funding in AI went way down, leading to another major "AI winter".

I'd say the next big era of AI is the one we're in, driven largely by applied statistics that became known as "machine learning". This has been by far the most successful era, and has probably added 100s of billions of dollars to the economy (I'd argue Google is a machine learning company, for example). I think it's also the most pragmatic era, as people in the field have really learned from the past mistakes of overpromising. In fact, when I was studying "AI" in grad school, my professors warned me to always refer to what I did as machine learning because the concept of "intelligence" was such a joke to so many in the field.

rahimiali 12y ago

Signal processing mysticism repeats itself every 20 years and has been fueled by tremendous hype since its debut 400 years ago:

1. Linear Regression (which, admittedly, was amazing)

2. Fourier Analysis (which is linear regression on orthonormal bases of functions. it blew people's minds)

3. Perceptrons (which is linear regression but with a logistic loss. it went back to its old name of "logistic regression" once its insane cachet of biological plausibility faded)

4. Neural Networks (stack of logistic regressors. popular with people who didn't know how to filter their inputs through fixed or random nonlinearities before applying linear regression)

5. Self Organizing Maps and Recurrent Nets (which were neural nets that feed back on themselves)

6. Fractals (which is recursion. they were useful for enticing children into math classes)

7. Chaos (which is recursion that's hard to model. useful for movie plots)

8. Wavelets (which is recursive Fourier analysis, and probably still way under-used)

9. Support Vector Machines (which replaces logistic regression's smooth loss with a kink that makes it hard to use a fast optimizer. often conflated with the "kernel trick", which appealed to people who didn't want to pass their inputs through nonlinearities explicitly)

10. Deep Nets (which are bigger neural networks. the jury's out whether they work better because they're deeper, or because they're bigger and require a lot of data to train, or because they require a programmer to spend years developing a learning algorithm for each new dataset. also whether they do actually work better).
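(A quick numerical check of item 2, for the skeptical: regressing a signal onto an orthonormal complex-exponential basis gives the same coefficients as the discrete Fourier transform, up to scaling. This is a small sketch using NumPy; the signal length and the random signal are arbitrary choices made for illustration.)

```python
import numpy as np

N = 8                                    # arbitrary signal length for this sketch
rng = np.random.default_rng(0)
x = rng.standard_normal(N).astype(complex)

# Columns of F are the orthonormal basis functions exp(2*pi*i*k*n/N) / sqrt(N).
n = np.arange(N)
F = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

# Ordinary least squares: find coef minimizing ||F @ coef - x||.
coef, *_ = np.linalg.lstsq(F, x, rcond=None)

# The same coefficients from the FFT, rescaled to the orthonormal convention.
dft = np.fft.fft(x) / np.sqrt(N)
matches = bool(np.allclose(coef, dft))
print("regression coefficients equal the DFT:", matches)
```

Because the basis is orthonormal, the least-squares solution collapses to a plain inner product of the signal with each basis function, which is exactly what the DFT computes.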

Once this Deep Net thing blows over again, my money's on Kernelized Recurrent Deep Self Organizing Maps.

(On a serious note: MNIST is considered a trivial dataset and doesn't require the heavy machinery of deep nets. linear regression on almost any random nonlinearity applied to the data (say f(x;w,t)=cos(w'x+t) with w~N(0,I) and t~U[0,2pi]) will get you >98% accuracy on MNIST.)
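A rough sketch of that recipe, for the curious: draw random (w, t) pairs, map each input through cos(w'x+t), then fit plain least squares on one-hot labels. Everything specific here is an assumption for illustration: the four XOR points stand in for real data (this does not load MNIST or reproduce the accuracy figure), the feature count of 200 is arbitrary, the phase range [0, 2π] follows the usual random Fourier features convention, and the helper names are hypothetical.

```python
import numpy as np

def random_cosine_features(X, W, t):
    """Map each row x of X to the vector cos(w'x + t), one entry per (w, t) pair."""
    return np.cos(X @ W + t)

def fit_linear(Z, Y):
    """Least-squares weights mapping features Z to one-hot targets Y."""
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n_features = 200                                     # arbitrary choice for this sketch

# Toy stand-in data: the four XOR points, which no linear model on raw inputs can fit.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])
Y = np.eye(2)[y]                                     # one-hot targets

W = rng.standard_normal((X.shape[1], n_features))    # w ~ N(0, I)
t = rng.uniform(0.0, 2 * np.pi, size=n_features)     # assumed phase range

Z = random_cosine_features(X, W, t)
coef = fit_linear(Z, Y)
pred = (Z @ coef).argmax(axis=1)
train_accuracy = float((pred == y).mean())
print("training accuracy on the toy data:", train_accuracy)
```

The point of the toy data is only that a linear fit on the random cosine features separates points that a linear fit on the raw inputs cannot. For a real dataset you would hold out a test set and probably add a ridge penalty.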