
rahimiali · 12y ago

Signal processing mysticism repeats itself every 20 years and has been fueled by tremendous hype since its debut 400 years ago:

1. Linear Regression (which, admittedly, was amazing)

2. Fourier Analysis (which is linear regression on orthonormal bases of functions. it blew people's minds)

3. Perceptrons (which is linear regression but with a logistic loss. it went back to its old name of "logistic regression" once its insane cachet of biological plausibility faded)

4. Neural Networks (stack of logistic regressors. popular with people who didn't know how to filter their inputs through fixed or random nonlinearities before applying linear regression)

5. Self Organizing Maps and Recurrent Nets (which were neural nets that feed back on themselves)

6. Fractals (which is recursion. they were useful for enticing children into math classes)

7. Chaos (which is recursion that's hard to model. useful for movie plots)

8. Wavelets (which is recursive Fourier analysis, and probably still way under-used)

9. Support Vector Machines (which replaces logistic regression's smooth loss with a kink that makes it hard to use a fast optimizer. often conflated with the "kernel trick", which appealed to people who didn't want to pass their inputs through nonlinearities explicitly)

10. Deep Nets (which are bigger neural networks. the jury's still out on whether they work better because they're deeper, or because they're bigger and require a lot of data to train, or because they require a programmer to spend years developing a learning algorithm for each new dataset. also on whether they actually work better at all).
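Items 3 and 9 turn on the difference between logistic regression's smooth loss and the SVM's hinge loss. A minimal plain-Python sketch, assuming the usual convention of labels y in {-1, +1} and margin m = y·w'x (the function names are illustrative, not from any library): the logistic gradient changes continuously through m = 1, while the hinge subgradient jumps from -1 to 0 there. That jump is the kink the post says makes fast optimizers harder to use.

```python
import math


def logistic_loss(m):
    """log(1 + exp(-m)) for margin m = y * w'x, written to avoid overflow."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def logistic_grad(m):
    """d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)); continuous in m."""
    if m >= 0:
        e = math.exp(-m)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(m))


def hinge_loss(m):
    """SVM hinge loss max(0, 1 - m): linear, then flat, with a kink at m = 1."""
    return max(0.0, 1.0 - m)


def hinge_subgrad(m):
    """One valid subgradient of the hinge loss; it jumps from -1 to 0 at m = 1."""
    return -1.0 if m < 1.0 else 0.0


for m in [-2.0, 0.0, 0.99, 1.01, 2.0]:
    print(f"m={m:5.2f}  logistic={logistic_loss(m):.3f} grad={logistic_grad(m):+.3f}  "
          f"hinge={hinge_loss(m):.3f} subgrad={hinge_subgrad(m):+.1f}")
```

Around m = 1 the logistic gradient barely moves between 0.99 and 1.01, while the hinge subgradient flips from -1 to 0.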

Once this Deep Net thing blows over again, my money's on Kernelized Recurrent Deep Self Organizing Maps.

(On a serious note: MNIST is considered a trivial dataset and doesn't require the heavy machinery of deep nets. linear regression on almost any random nonlinearity applied to the data (say f(x;w,t)=cos(w'x+t) with w~N(0,I) and t~U[0,2π]) will get you >98% accuracy on MNIST.)
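A minimal sketch of the recipe in the parenthetical, in plain Python and under stated assumptions: MNIST isn't bundled here, so a hypothetical synthetic 2-D dataset (points labeled by whether they fall inside a circle) stands in for it, and "linear regression" is taken to mean least squares with a small ridge penalty on ±1 labels, classified by sign. The helper names (`make_random_cos_features`, `ridge_fit`, `accuracy`), the 60 features and lam=0.1 are illustrative choices, not from the post. The random nonlinearity itself follows the post: w~N(0,I), t~U[0,2π].

```python
import math
import random


def make_random_cos_features(d, n_features, rng):
    """Random nonlinearity f(x; w, t) = cos(w'x + t), w ~ N(0, I), t ~ U[0, 2*pi].

    Returns a fixed feature map phi(x): n_features cosines plus a constant 1.0 (bias).
    """
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_features)]
    T = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]

    def phi(x):
        feats = [math.cos(sum(wi * xi for wi, xi in zip(w, x)) + t) for w, t in zip(W, T)]
        return feats + [1.0]

    return phi


def ridge_fit(Z, y, lam):
    """Linear least squares with a small ridge penalty: solve (Z'Z + lam*I) a = Z'y."""
    k = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    a = [0.0] * k
    for i in range(k - 1, -1, -1):
        a[i] = (b[i] - sum(A[i][j] * a[j] for j in range(i + 1, k))) / A[i][i]
    return a


def accuracy(a, Z, y):
    """Fraction of points where sign(a'z) matches the +/-1 label."""
    hits = sum(1 for z, yi in zip(Z, y)
               if (sum(ai * zi for ai, zi in zip(a, z)) > 0) == (yi > 0))
    return hits / len(y)


# Hypothetical stand-in for MNIST: label +1 inside a circle, -1 outside.
# No linear function of the raw x can separate these two classes.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(400)]
y = [1.0 if x[0] ** 2 + x[1] ** 2 < 0.5 else -1.0 for x in X]

phi = make_random_cos_features(d=2, n_features=60, rng=rng)
Z = [phi(x) for x in X]
a = ridge_fit(Z[:300], y[:300], lam=0.1)  # fit on the first 300 points only
train_acc = accuracy(a, Z[:300], y[:300])
test_acc = accuracy(a, Z[300:], y[300:])
print(f"train accuracy {train_acc:.3f}, held-out accuracy {test_acc:.3f}")
```

The nonlinearity is drawn once and then frozen; only the linear weights are fit. Whatever accuracy this prints is for the toy data only and says nothing about MNIST.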

nrmn · 12y ago

Could you explain what you mean by filtering "their inputs through fixed or random nonlinearities"? I haven't heard of this before.

rahimiali (OP) · 12y ago

you've actually probably done this yourself. it's often called "featurization". for example, instead of applying a linear learner to vectors x in R^d, you apply it to vectors f(x), where f computes a bunch of features of x. a popular choice for f is the set of d-th order monomials. hashing families are another good idea (Alex Smola does this). more generally, any random nonlinear function f is a good candidate (i call that analysis "Random Kitchen Sinks"). when x is structured data, f usually just returns counts in histogram bins of some kind.
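Two of the featurizations mentioned in the reply, sketched in plain Python as illustrations rather than anyone's actual code: `monomials` computes all monomials of a given degree (the parameter is called `degree` here to avoid clashing with the input dimension d), and `hashed_counts` builds a hashed histogram for structured inputs. The function names, CRC-32 as the hash, the signed-bucket detail and n_bins=16 are assumptions for the example; the reply only says that hashing families are a good idea.

```python
import math
import zlib
from itertools import combinations_with_replacement


def monomials(x, degree=2):
    """All monomials of exactly the given degree in the entries of x.

    For x = [x0, x1] and degree 2 this is [x0*x0, x0*x1, x1*x1].
    """
    return [math.prod(x[i] for i in idx)
            for idx in combinations_with_replacement(range(len(x)), degree)]


def hashed_counts(tokens, n_bins=16):
    """Histogram-style features for structured input (here a list of string tokens).

    Each token is hashed into one of n_bins buckets; the top bit of the same
    CRC-32 hash picks a +1/-1 sign so that colliding tokens tend to cancel
    instead of always piling up (a common signed variant of the hashing trick).
    """
    v = [0.0] * n_bins
    for tok in tokens:
        h = zlib.crc32(tok.encode("utf-8"))  # deterministic across runs, unlike hash()
        v[h % n_bins] += 1.0 if ((h >> 31) & 1) == 0 else -1.0
    return v


print(monomials([2.0, 3.0], 2))  # [4.0, 6.0, 9.0]
print(hashed_counts(["the", "cat", "sat", "on", "the", "mat"]))
```

Either output vector can be handed to the same linear learner; the learner never needs to know which featurization produced it.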

cheesycheese · 12y ago

There are some gross simplifications here:

8. Wavelets are not "recursive Fourier analysis". To put it simply, they're more like a spatially localized Fourier expansion. I agree that they are under-used, though.

9. SVM: the "kernel trick" is a big deal because sometimes building a vector space of linear features out of non-vector objects won't give you good performance, and you're better off defining a dot product directly.

10. Deep Nets are not just bigger neural networks. The buzz is about how you train them, and more generally about improving how nets are trained.
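To make the localization point concrete, a small plain-Python sketch (hypothetical helper names `haar` and `dft`) compares the orthonormal Haar transform, the simplest wavelet, with a plain DFT on a signal that is a single spike. The Haar transform is computed by recursing on pairwise averages, but each detail coefficient only sees a local window, so the spike touches one detail coefficient per scale; in the Fourier basis it spreads over every coefficient.

```python
import cmath
import math


def haar(x):
    """Orthonormal Haar wavelet transform of a signal whose length is a power of 2.

    Returns [final average] followed by detail coefficients, coarsest scale first.
    """
    s = math.sqrt(2.0)
    a = list(x)
    coeffs = []
    while len(a) > 1:
        avg = [(a[2 * i] + a[2 * i + 1]) / s for i in range(len(a) // 2)]
        det = [(a[2 * i] - a[2 * i + 1]) / s for i in range(len(a) // 2)]
        coeffs = det + coeffs  # coarser details go in front of finer ones
        a = avg  # recurse on the local averages only
    return a + coeffs


def dft(x):
    """Unnormalized discrete Fourier transform, computed directly in O(n^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


spike = [0.0] * 16
spike[5] = 1.0
w = haar(spike)
F = dft(spike)
n_haar = sum(1 for c in w if abs(c) > 1e-12)
n_fourier = sum(1 for c in F if abs(c) > 1e-12)
print(n_haar, n_fourier)  # prints: 5 16
```

For a length-16 spike, 5 of 16 Haar coefficients are nonzero (one detail per scale plus the overall average), against all 16 Fourier coefficients.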

I would say the next big thing is more about realizing, even more clearly, that training a neural net is an optimization problem, and, instead of relying on heuristics, waiting for some Russian mathematician to derive the right SGD schedule / batch solver for the problem. Then what 1,000 of Google's computers did for the cat face detector, we'll be able to do on a smartphone chip. People have to realize that Deep Learning is a bit of a "brute force" solution for the moment (each node is a linear model). We need to derive smarter algorithms.
