BTW, here are some good online resources for machine learning:
* The Elements of Statistical Learning (free pdf book): http://www-stat.stanford.edu/~tibs/ElemStatLearn/
* Information Theory, Inference, and Learning Algorithms (free pdf book): http://www.inference.phy.cam.ac.uk/mackay/itila/
* Videos from Autumn School 2006: Machine Learning over Text and Images: http://videolectures.net/mlas06_pittsburgh/
* Bonus link. An Empirical Comparison of Supervised Learning Algorithms (pdf paper): http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icm... (Note the top 3 are tree ensembles, then SVM, ANN, KNN. Yes, I know there is no 'best' classifier.)
It does not make sense to compare ensemble methods (bagging & boosting) with single-instance classifiers. In practice, you try all the classifiers and then use the best to build an ensemble. The paper leaves me unsatisfied; I suspect that bagging or boosting SVMs would give the best results.
I tried to find a paper comparing an ensemble of SVMs to an ensemble of trees and came up empty (after a quick search). I did find papers showing ensembles of SVMs outperforming a single SVM. I also found a comment on a paper claiming that an ensemble of trees outperformed a "Parallel Mixture of SVM" (see here: http://www.mitpressjournals.org/doi/abs/10.1162/089976604323...). Of course, that's not a great source.
I absolutely agree they should have included ensembles other than trees. I don't necessarily agree that an ensemble of SVMs would have beaten an ensemble of trees. It would have been interesting to see.
I guess bagging is a different story. As far as I know, bagging usually decreases variance with no bias penalty, so it is more of a trade-off between variance and speed.
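For anyone who wants to see the variance claim in action, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy problem (y = x plus Gaussian noise), the 1-nearest-neighbour base learner, and the function names. It is not taken from the paper discussed above. It measures how much the prediction at one query point varies across many independently drawn training sets, with and without bagging.

```python
import random
import statistics


def sample_training_set(rng, n=30):
    """Draw n noisy points from y = x + noise on [0, 1] (made-up toy problem)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, x + rng.gauss(0.0, 0.5)) for x in xs]


def one_nn_predict(train, x0):
    """1-nearest-neighbour regression: return the y of the closest training x."""
    return min(train, key=lambda point: abs(point[0] - x0))[1]


def bagged_predict(train, x0, rng, n_models=50):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    preds = []
    for _ in range(n_models):
        # Bootstrap: sample len(train) points with replacement.
        boot = [rng.choice(train) for _ in train]
        preds.append(one_nn_predict(boot, x0))
    return sum(preds) / len(preds)


def compare_variance(seed=0, n_trials=300, x0=0.5):
    """Return (variance of single 1-NN, variance of bagged 1-NN) at x0."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(n_trials):
        train = sample_training_set(rng)
        single.append(one_nn_predict(train, x0))
        bagged.append(bagged_predict(train, x0, rng))
    return statistics.variance(single), statistics.variance(bagged)


if __name__ == "__main__":
    single_var, bagged_var = compare_variance()
    print(f"single 1-NN variance: {single_var:.3f}")
    print(f"bagged 1-NN variance: {bagged_var:.3f}")
```

The speed side of the trade-off shows up too: the bagged predictor does `n_models` times the work of the single one for each prediction.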
(The publisher asked the book's editor to stop distributing the whole PDF.)
Here's another one from Stanford: Mining of Massive Datasets http://infolab.stanford.edu/~ullman/mmds.html
Tom Mitchell defines machine learning algorithms as those that improve their performance at some task with experience. That is precisely how humans go about learning what's necessary to perform the same tasks, which formerly took thousands or millions of hours.
For high-dimensional problems, such as text classification (e.g., spam detection) or image classification (e.g., face detection), it's almost impossible to hard-code an algorithm that accomplishes the goal without using machine learning. It's much easier to use a binary spam/not spam or face/not face labeling system that, given the attributes of each example, can learn which attributes predict that specific label. In other words, it's much easier to let a learning system determine which variables matter for the final classification than to try to model the "true" function that gives rise to the labeling.
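To make that concrete, here is a rough sketch of a spam/not-spam learner in plain Python. The six training messages are made up, and the choice of a naive Bayes model with add-one smoothing is my own assumption for illustration, not something prescribed above. The point is only that the word weights come from the labeled examples rather than from hand-written rules.

```python
import math
from collections import Counter

# Made-up toy data; a real filter would train on a far larger corpus.
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting notes for tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
    ("project notes and schedule", "ham"),
]


def train_naive_bayes(examples):
    """Count words per label; the counts are the entire learned model."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab


def word_log_prob(word, label, model):
    """log P(word | label) with add-one (Laplace) smoothing."""
    word_counts, _, vocab = model
    total = sum(word_counts[label].values())
    return math.log((word_counts[label][word] + 1) / (total + len(vocab)))


def classify(text, model):
    """Pick the label with the higher log prior plus summed word log-likelihoods."""
    _, label_counts, vocab = model
    n_examples = sum(label_counts.values())

    def score(label):
        prior = math.log(label_counts[label] / n_examples)
        # Words never seen in training carry no evidence, so skip them.
        return prior + sum(
            word_log_prob(w, label, model) for w in text.split() if w in vocab
        )

    return max(label_counts, key=score)


def spam_log_ratio(word, model):
    """Positive means the word is learned evidence for spam, negative for ham."""
    return word_log_prob(word, "spam", model) - word_log_prob(word, "ham", model)


if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    print(classify("free money", model))
    print(classify("notes for tomorrow", model))
    print(spam_log_ratio("free", model), spam_log_ratio("tomorrow", model))
```

Nobody told the program that "free" is suspicious; `spam_log_ratio` just reads that off the counts, which is the "learn which attributes matter" part.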
Probably also worth speculating on why this is happening NOW. Why is this breaking out of CS departments in 2011 and not 2002?
The datasets are new.
Bandwidth? Storage capacity? Computing power? All of the above?
Edit: I know there seems to be a flash player component as well, but it's failing for me and can't get to the .mp4. Which doesn't speak well of the joker who cobbled the site together either.
http://171.64.93.201/ClassX/system/users/web/pg/view_subject...
Maybe now it's considered a "distance learning standard"?
Also, requires Silverlight (which I don't fancy installing)
Edit: This is the Tom Mitchell that Andrew Ng refers to early on in the Stanford ML lectures (when defining Machine Learning)
He is the author of one of the most used texts on machine learning: "Machine Learning, Tom Mitchell, McGraw Hill, 1997."
* Feature selection, Overfitting, Bias-Variance tradeoff
Maybe one of Prof. Mitchell's students can make the missing slides available online?