Show HN: Machine learning cheat sheet

(eferm.com)

166 points · Emore · 15y ago · 13 comments

teuobk · 15y ago

In case you see the cheat sheet and think, "Wow, I'd love to understand that," there's an excellent (albeit challenging) complete course on machine learning in Stanford's "engineering everywhere" online repository. http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a...

natfriedman · 15y ago

Another option is "Programming Collective Intelligence," by Toby Segaran. I read through it recently on a long flight to Australia. It's one of the most straightforward AI books out there, presenting most of these algorithms in just a few pages with nice sample Python code and diagrams. A perfect intro/refresher, and it takes a web developer's perspective on these techniques.

Since reading it I've noticed how many friends have it on their bookshelves.

Here's a link: http://oreilly.com/catalog/9780596529321

klochner · 15y ago

I haven't read the COIN book, but if you want to get aggressive you can go for "Elements of Statistical Learning".

Free pdf download, probably not a one-flight book:

http://www-stat.stanford.edu/~tibs/ElemStatLearn/

side note: Nat, did you intern at SGI in the late 90s, as the self-titled "armchair programmer of the apocalypse"?

silentbicycle · 15y ago

While it does a great job of explaining many AI concepts in an unintimidating fashion, the Python code in it is rather buggy. On balance, I'd still recommend it as an intro.

The errata page: http://oreilly.com/catalog/errataunconfirmed.csp?isbn=978059...

iskander · 15y ago

All the algorithms requiring training can be optimized using stochastic gradient descent, which is very effective for large data sets (see http://leon.bottou.org/research/stochastic).
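For anyone who hasn't seen it, here is a minimal sketch of the idea, assuming a toy one-dimensional least-squares problem. The function name, learning rate, epoch count, and data are all made up for illustration and are not taken from the cheat sheet or from the linked page. The point is only that each update uses the gradient from a single example rather than from the whole data set.

```python
import random

def sgd_linear_regression(data, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)              # visit the examples in a fresh random order
        for x, y in data:
            err = (w * x + b) - y      # residual on this single example
            w -= lr * err * x          # gradient of 0.5 * err**2 with respect to w
            b -= lr * err              # gradient of 0.5 * err**2 with respect to b
    return w, b

# Hypothetical noiseless data drawn from y = 2x + 1.
points = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]
w, b = sgd_linear_regression(points)
print(w, b)
```

Because every step touches one example, the same loop works unchanged when the examples arrive as a stream that never fits in memory.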

Also, here are some additions for the online learning column:

* Online SVM: http://www.springerlink.com/index/Y8666K76P6R5L467.pdf

* Online Gaussian mixture estimation: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87....

One more thing: why no random forests? Or decision tree ensembles of any sort?

Emore (OP) · 15y ago

Thanks for the comments!

The course unfortunately couldn't cover all material on all algorithms, so the cheat sheet basically reflects my own knowledge rather than what's possible. I've referenced the Online SVM and Online Mixture model though, thanks for those.

Also, I'll have to look into stochastic gradient descent!

imurray · 15y ago

KNN "no learning involved": one probably wants to cross-validate K at the least, if not learn the metric.
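A minimal sketch of what cross-validating K could look like, assuming 1-D inputs, absolute distance as the metric, and a simple interleaved 5-fold split. All function names, the candidate list, and the synthetic data are hypothetical.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D, absolute distance)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, k, folds=5):
    """Fraction of points classified correctly when each fold is held out in turn."""
    correct = 0
    for f in range(folds):
        held_out = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(data)

def choose_k(data, candidates, folds=5):
    """Return the candidate K with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cv_accuracy(data, k, folds))

# Hypothetical data: two overlapping 1-D Gaussian classes.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(50)]
rng.shuffle(data)

candidates = [1, 3, 5, 7, 9, 15]   # odd values avoid tied votes with two classes
best_k = choose_k(data, candidates)
print(best_k, cv_accuracy(data, best_k))
```

Learning the metric would need more machinery; this only tunes K.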

Some methods say online learning isn't applicable. As pointed out elsewhere, objectives for K-means and mixture models could be fitted with stochastic gradient descent. In general there is always an online option. For example, keep a restricted set of items and chuck out ones that seem less useful as others come in.
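As a rough illustration of the K-means case, here is a sketch of an online update, assuming 1-D data, two centers, and a per-center step size of 1/count. The function name, initial guesses, and data stream are hypothetical. Each incoming point moves only its nearest center, which is a stochastic gradient step on the K-means objective.

```python
import random

def online_kmeans_step(centers, counts, x):
    """Move the center nearest to x toward x; return that center's index."""
    j = min(range(len(centers)), key=lambda i: (centers[i] - x) ** 2)
    counts[j] += 1
    # Gradient step on 0.5 * (x - c_j)**2 with step size 1 / counts[j],
    # which keeps c_j equal to the running mean of the points assigned to it.
    centers[j] += (x - centers[j]) / counts[j]
    return j

# Hypothetical stream: points drawn around 0 and around 10, in random order.
rng = random.Random(0)
stream = [rng.gauss(0.0, 1.0) for _ in range(500)]
stream += [rng.gauss(10.0, 1.0) for _ in range(500)]
rng.shuffle(stream)

centers = [1.0, 9.0]   # made-up initial guesses
counts = [0, 0]
for x in stream:       # one pass; no point is stored after it has been used
    online_kmeans_step(centers, counts, x)
print(centers)
```

A fixed small step size instead of 1/count would let the centers track a distribution that drifts over time, at the cost of never fully settling.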

(Aside: I have a very introductory lecture on machine learning on the web: http://videolectures.net/bootcamp2010_murray_iml/ (not for anyone who already knows the methods on this cheat sheet!))

Emore (OP) · 15y ago

Thanks for the comments!

Good point about using cross-validation to learn K; I forgot about that. I added this to the cheat sheet.

Also, regarding online learning methods, I was probably a bit quick to dismiss certain algorithms as not supporting online learning; in coursework we unfortunately didn't have time to delve into all aspects of all algorithms. I've rewritten the Online column as "To be added." for those online methods I'm not familiar with (yet). Someone else is, of course, free to fork it on GitHub: http://github.com/Emore/mlcheatsheet

cloudkj · 15y ago

Nice summary; I like the format as well. However, the title of the cheat sheet is misleading since (a) many of the algorithms listed can be used for non-linear classification and (b) some of them can be considered supervised learning, such as naive Bayes and the perceptron, since they're trained with sample inputs and expected outputs (supervisory signals).
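To make the "supervisory signal" point concrete, here is a small perceptron sketch, assuming labels in {-1, +1} and the logical AND function as toy data. The function names and the epoch limit are hypothetical. The weights change only when the prediction disagrees with the expected output, so the expected output is what drives learning.

```python
def predict(w, b, x):
    """Return +1 if the weighted sum is positive, else -1."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else -1

def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron rule: update the weights only on misclassified samples."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            if predict(w, b, x) != target:
                # The expected output (target) sets the direction of the update.
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
                mistakes += 1
        if mistakes == 0:   # every sample is classified correctly; stop early
            break
    return w, b

# Toy supervised data: inputs paired with expected outputs for logical AND.
and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
print([predict(w, b, x) for x, _ in and_samples])
```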

Otherwise, this is awesome. Hopefully you will add to it, and make it available in web form.

Emore (OP) · 15y ago

Thanks for the feedback!

I've changed the title to "Algorithms for Supervised- and Unsupervised Learning", which is definitely more appropriate. Initially the cheat sheet only contained linear classifiers, hence the misleading title.

ses · 15y ago

Fantastic work. I have an ML exam coming up and this should really help. If I'm honest, it's one of the subjects I've struggled with the most. It seems experts in the field, while incredibly intelligent, have a hard time breaking the material down into structured and easily digestible pieces of information.

MatthewB · 15y ago

No idea what I'm looking at, but it definitely looks cool.

axxl · 15y ago

I'm taking this class next semester. I downloaded it, so hopefully I'll understand it later and it will come in handy. Thanks!
