Online graduate-level machine learning course from CMU's Tom Mitchell

(cs.cmu.edu)

273 points · ilcavero · 14y ago · 31 comments

monk_the_dog · 14y ago

I'm enrolled in the online Applied ML class from Stanford, and I've also been watching this course from CMU (I'm up to the Graphical Model 4 lecture - almost the midterm). If you've taken at least one stats class you'll get much more out of CMU's class.

BTW, here are some good online resources for machine learning:

* The Elements of Statistical Learning (free pdf book): http://www-stat.stanford.edu/~tibs/ElemStatLearn/

* Information Theory, Inference, and Learning Algorithms (free pdf book): http://www.inference.phy.cam.ac.uk/mackay/itila/

* Videos from Autumn School 2006: Machine Learning over Text and Images: http://videolectures.net/mlas06_pittsburgh/

* Bonus link. An Empirical Comparison of Supervised Learning Algorithms (pdf paper): http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icm... (Note the top 3 are tree ensembles, then SVM, ANN, KNN. Yes, I know there is no 'best' classifier.)

zeratul · 14y ago

About the bonus link:

It does not make sense to compare ensemble methods (bagging & boosting) with single-instance classifiers. In practice, you try all the classifiers and then use the best ones to create an ensemble. The paper leaves me unsatisfied, thinking that bagging or boosting SVMs would probably give the best results.

monk_the_dog · 14y ago

You make a good point. Ensemble methods seem to outperform single classifiers. There's no reason you can't have an ensemble of SVMs. The paper should have included something other than an ensemble of trees.

I tried to find a paper comparing an ensemble of SVMs to an ensemble of trees and I came up empty (after a quick search). I did find papers showing ensembles of SVMs outperforming a single SVM. I also found a comment on a paper claiming an ensemble of trees outperformed a "Parallel Mixture of SVM" (see here: http://www.mitpressjournals.org/doi/abs/10.1162/089976604323...). Of course, that's not a great source.

I absolutely agree they should have included ensembles other than trees. I don't necessarily agree an ensemble of SVMs would have beaten an ensemble of trees. It would have been interesting to see.

lliiffee · 14y ago

I don't see why not. Different classifiers have different bias/variance characteristics. If you want to decrease bias at the cost of increased variance, then boost your classifier. (This is why boosting is usually applied to simple classifiers.) But whether that will actually help depends on the characteristics of the problem and the classifier used.
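
A minimal sketch of the boosting point, assuming a toy 1-D problem that is not from the thread: the label is +1 only inside an interval, so a single decision stump (one threshold) is too biased to fit it, while a few boosted stumps can. The data, the function names, and the choice of AdaBoost with three rounds are all illustrative assumptions.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` left of the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) for the lowest-error stump."""
    ordered = sorted(xs)
    # One threshold below all points (constant predictions) plus every midpoint.
    thresholds = [ordered[0] - 1.0] + [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, n_rounds):
    """Fit n_rounds stumps, re-weighting misclassified points after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, t, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, polarity))
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(x, ensemble):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score > 0 else -1

def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)

# Toy data (made up): 20 evenly spaced points, positive only in the middle.
xs = [i / 20 for i in range(20)]
ys = [1 if 6 <= i < 14 else -1 for i in range(20)]

_, t0, p0 = best_stump(xs, ys, [1.0 / len(xs)] * len(xs))
single_acc = accuracy(lambda x: stump_predict(x, t0, p0), xs, ys)
boosted = adaboost(xs, ys, n_rounds=3)
boosted_acc = accuracy(lambda x: ensemble_predict(x, boosted), xs, ys)
print(f"single stump accuracy:   {single_acc:.2f}")
print(f"boosted stumps accuracy: {boosted_acc:.2f}")
```

This only measures training accuracy on a noise-free toy set, so it shows the bias reduction but says nothing about the variance side of the trade-off.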

I guess bagging is a different story. So far as I know bagging usually decreases variance with no bias penalty, so it is more a trade-off between variance and speed.
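
A rough simulation of the bagging claim, under assumptions that are not from the thread: the base learner is 1-nearest-neighbor regression (chosen because it has high variance), the target is y = x^2 plus Gaussian noise, and the prediction is evaluated at the single point x = 0.5. All names and parameter values are hypothetical.

```python
import random
import statistics

def one_nn_predict(train, x0):
    """Predict y at x0 from the single nearest training point (a high-variance learner)."""
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def bagged_predict(train, x0, n_models, rng):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    preds = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) points with replacement.
        sample = [rng.choice(train) for _ in train]
        preds.append(one_nn_predict(sample, x0))
    return sum(preds) / len(preds)

def compare_variance(trials=200, n_points=30, n_models=25, seed=0):
    """Return (variance of single 1-NN, variance of bagged 1-NN) across fresh training sets."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(trials):
        # A fresh noisy training set per trial: y = x^2 + Gaussian noise.
        train = []
        for _ in range(n_points):
            x = rng.uniform(0, 1)
            train.append((x, x * x + rng.gauss(0, 0.5)))
        single.append(one_nn_predict(train, 0.5))
        bagged.append(bagged_predict(train, 0.5, n_models, rng))
    return statistics.variance(single), statistics.variance(bagged)

single_var, bagged_var = compare_variance()
print(f"single 1-NN variance: {single_var:.3f}")
print(f"bagged 1-NN variance: {bagged_var:.3f}")
```

The speed side of the trade-off is visible in `bagged_predict`: each prediction costs `n_models` times as much as a single model.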

law · 14y ago

It's actually fine to compare an ensemble method (using weak base learners) to a single instance strong learner. In this way, you compare the benefits of combining the weak learners with the benefits of using a single classifier. I see where you're going with that, but comparing ensemble methods with a single classifier is often a useful measurement.

bhickey · 14y ago

To your list, I'd like to add Jaynes's 'Probability Theory'. A few chapters are freely available here: www-stat.wharton.upenn.edu/~steele/Publications/PDF/PT.pdf

(The publisher asked the book's editor to stop distributing the whole PDF.)

shriphani · 14y ago

I made the mistake of enrolling in a graduate level ML class without a strong foundation in statistics - my transcript is now going to be defaced permanently. But thanks for the inference text - is there an OCW version of an inference course?

danso · 14y ago

I love it when people link to freely available academic texts, thank you.

Here's another one from Stanford: Mining of Massive Datasets http://infolab.stanford.edu/~ullman/mmds.html

monk_the_dog · 14y ago

I just took a quick look at the chapter on clustering. Looks good! I'll put it on the ever-growing stack. Thanks!

Maven911 · 14y ago

I hope this question doesn't come off as too naive, but given the number of links on the front page about ML: what is so fascinating about ML? Why is there not the same level of interest/links on topics such as cryptology, graphics, circuits, or computer architecture?

law · 14y ago

There's this enormous focus on 'web scale' technologies. This focus necessarily involves visualizing and making sense of terabytes and eventually even petabytes of data; conventional approaches would take thousands or millions of man-hours to accomplish the same level of analysis that computers can perform in hours or days.

Tom Mitchell's definition of machine learning algorithms as those that improve their performance at some task with experience is precisely the way in which humans go about learning what's necessary to perform the same tasks that formerly took thousands or millions of hours.

For high-dimensional problems, such as text classification (e.g., spam detection) or image classification (e.g., face detection), it's almost impossible to hard-code an algorithm to accomplish its goal without using machine learning. It's much easier to use a binary spam/not spam or face/not face labeling system that, given the attributes of the example, can learn which attributes predict that specific label. In other words, it's much easier for a learning system to determine which variables are important in the ultimate classification than to try to model the "true" function that gives rise to the labeling.
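
As a small illustration of learning from spam/not spam labels instead of hard-coding rules, here is a sketch of a naive Bayes text classifier with Laplace smoothing. The six training messages are made up for the example, and naive Bayes is just one convenient choice of learner, not something the comment prescribes.

```python
import math
from collections import Counter

def train_naive_bayes(examples):
    """examples: list of (text, label). Returns (word counts per label, label counts, vocabulary)."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for `text`."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # log prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training set (hypothetical messages, for illustration only).
examples = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting moved to noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
    ("project notes attached", "ham"),
]
model = train_naive_bayes(examples)

print(classify("free money now", model))
print(classify("lunch at noon", model))
```

Nothing in the code says which words signal spam; the per-label word counts are learned entirely from the labeled examples.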

tapertaper · 14y ago

Great comment.

Probably also worth speculating on why this is happening NOW. Why is this breaking out of CS departments in 2011 and not 2002?

The datasets are new.

Bandwidth? Storage capacity? Computing power? All of the above?

zeratul · 14y ago

There was a comment here from SandB0x saying that ML has great potential for startups. ML algorithms have many practical applications and avenues that are commercializable.

gms · 14y ago

Because web companies are starting to use ML nowadays, and most of HN's users are associated with the web-company crowd.

drats · 14y ago

Silverlight? Are these people serious? Whether you are an educational institution or a for-profit media company, you are trying to get to the largest number of people and cause them the fewest problems. Silverlight fails spectacularly at both those objectives.

Edit: I know there seems to be a Flash player component as well, but it's failing for me and I can't get to the .mp4. Which doesn't speak well of the joker who cobbled the site together either.

SkyMarshal · 14y ago

Especially when the target audience for such a class is likely to have an outsized portion of *nix users.

zeratul · 14y ago

Stanford also uses Silverlight and Flash:

http://171.64.93.201/ClassX/system/users/web/pg/view_subject...

Maybe now it's considered a "distance learning standard"?

amirmc · 14y ago

"To view a video you will have to login with your CMU Andrew username and password, ..."

Also, requires Silverlight (which I don't fancy installing)

Edit: This is the Tom Mitchell that Andrew Ng refers to early on in the Stanford ML lectures (when defining Machine Learning)

ya3r · 14y ago

You don't have to log in to watch videos.

He is the author of one of the most used texts on machine learning: "Machine Learning, Tom Mitchell, McGraw Hill, 1997."

zeratul · 14y ago

The three most important issues in ML are missing from this course:

* Feature selection, Overfitting, Bias-Variance tradeoff

Maybe one of Prof. Mitchell's students can make the missing slides available online?

law · 14y ago

If I'm not mistaken, that was just a recitation that replaced the regular Thursday class. It was one of the TAs covering that stuff briefly. All three topics were covered by Tom Mitchell in previous classes.

kky · 14y ago

I love that open source mentality (sharing and collaborating for the love of the work, community, and result) is reaching higher ed. I can't wait for it to reach lower ed! If kids start seeing this model at a young age...

ya3r · 14y ago

As Tom Mitchell says in the first video, this course is recommended for PhD students.

igrekel · 14y ago

Cool. I'm disappointed that there isn't a video for hidden Markov models and other models for time series though, just slides. The schedule says that session is in March, so maybe by then there will be a video online.
