1. Linear Regression (which, admittedly, was amazing)
2. Fourier Analysis (which is linear regression on orthonormal bases of functions. it blew people's minds)
3. Perceptrons (which is linear regression but with a logistic loss. it went back to its old name of "logistic regression" once its insane cachet of biological plausibility faded)
4. Neural Networks (stack of logistic regressors. popular with people who didn't know how to filter their inputs through fixed or random nonlinearities before applying linear regression)
5. Self Organizing Maps and Recurrent Nets (which were neural nets that feed back on themselves)
6. Fractals (which is recursion. they were useful for enticing children into math classes)
7. Chaos (which is recursion that's hard to model. useful for movie plots)
8. Wavelets (which is recursive Fourier analysis, and probably still way under-used)
9. Support Vector Machines (which replaces logistic regression's smooth loss with a kink that makes it hard to use a fast optimizer. often conflated with the "kernel trick", which appealed to people who didn't want to pass their inputs through nonlinearities explicitly)
10. Deep Nets (which are bigger neural networks. the jury's out whether they work better because they're deeper, or because they're bigger and require a lot of data to train, or because they require a programmer to spend years developing a learning algorithm for each new dataset. also whether they do actually work better)
Once this Deep Net thing blows over again, my money's on Kernelized Recurrent Deep Self Organizing Maps.
(On a serious note: MNIST is considered a trivial dataset and doesn't require the heavy machinery of deep nets. linear regression on almost any random nonlinearity applied to the data (say f(x;w,t)=cos(w'x+t) with w~N(0,I) and t~U[0,2*pi]) will get you >98% accuracy on MNIST.)
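
For the curious, the recipe in that last note can be sketched in a few lines of Python with NumPy. This is only a rough illustration, not a tuned MNIST pipeline: it runs on a toy two-ring dataset so that it stays self-contained, and the function names, the `scale` and `ridge` parameters, the feature count, and the choice of one-hot targets for the regression are all assumptions made for the example. On real pixel data the variance of w would probably need adjusting, which is what `scale` is for.

```python
import numpy as np

def random_cosine_features(X, W, t):
    """Apply f(x; w, t) = cos(w'x + t) for every random (w, t) pair."""
    return np.cos(X @ W + t)

def fit_random_feature_classifier(X, y, n_features=200, ridge=1e-3, scale=1.0, seed=0):
    """Draw random features, then do ridge-regularized linear regression
    onto one-hot labels. `scale`, `ridge`, `n_features` are illustrative knobs."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    n_classes = int(y.max()) + 1
    W = rng.normal(0.0, scale, size=(d, n_features))      # w ~ N(0, scale^2 I)
    t = rng.uniform(0.0, 2.0 * np.pi, size=n_features)    # t ~ U[0, 2*pi]
    Z = random_cosine_features(X, W, t)
    Y = np.eye(n_classes)[y]                               # one-hot targets
    A = Z.T @ Z + ridge * np.eye(n_features)               # regularized normal equations
    beta = np.linalg.solve(A, Z.T @ Y)
    return W, t, beta

def predict(X, W, t, beta):
    """Pick the class whose regression output is largest."""
    return np.argmax(random_cosine_features(X, W, t) @ beta, axis=1)

def make_rings(n_per_class, seed):
    """Toy stand-in for real data: two noisy concentric rings (not linearly separable)."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=2 * n_per_class)
    radii = np.repeat([1.0, 3.0], n_per_class) + rng.normal(0.0, 0.1, size=2 * n_per_class)
    X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
    y = np.repeat([0, 1], n_per_class)
    return X, y

X_train, y_train = make_rings(200, seed=1)
X_test, y_test = make_rings(200, seed=2)
W, t, beta = fit_random_feature_classifier(X_train, y_train)
accuracy = float(np.mean(predict(X_test, W, t, beta) == y_test))
print(f"held-out accuracy on the toy rings: {accuracy:.3f}")
```

To try it on MNIST, swap `make_rings` for a loader of your choice and flatten each image into a row of `X`; expect to need many more features than the 200 used here.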