Learning About Statistical Learning

(measuringmeasures.blogspot.com)

208 points · liebke · 16y ago · 43 comments

38 comments · 11 top-level

Tichy · 16y ago · 6 in thread

Interesting on the one hand, but is anybody seriously going to go through those books one by one now? Personally I have troubles going through just one book (Pattern Recognition by Bishop atm), and even that might be useless without practical application. I managed to eventually read through MacKay (enjoyable book and available as a free PDF, too) and feel I have already forgotten most of it again :-(

Another way might be to just get going, and pick up knowledge on the way?

plinkplonk · 16y ago

"but is anybody seriously going to go through those books one by one now? Personally I have troubles going through just one book (Pattern Recognition by Bishop atm), and even that might be useless without practical application"

Working through CLRS completely is a very time-consuming task. I think Bradford intended that book as a reference, but yes, you need to work through some of the stuff in order. For example, you need to be fairly conversant in linear algebra, probability, and proof technique before you can tackle Bishop, else you won't make much progress. Once you get some basics under you (especially the underlying math stuff) you'll end up being able to read through an ML book the way you can read through a moderately tough book in programming.

" I managed to eventually read through MacKay (enjoyable book and available as a free PDF, too) and feel I have already forgotten most of it again :-("

The best way to learn this stuff is to have an eventual project in mind. I ended up learning most of this stuff because I was working on a Robotics project for the fine folks in the Indian Defence Depts and was very much "thrown in at the deep end" - nothing like it to accelerate learning, but I wouldn't wish to do it again. For the first few weeks I couldn't (literally) understand a single sentence in an hour-long meeting. Very humbling.

Depending on what exactly you wish to do, you may be able to avoid many of the books. If you think I can help you narrow down to a smaller list, please ask here or send me email (my email id is in my profile).

But yes, in the end Norvig's point applies here too, as Bradford points out. (I have been working in ML for 8 years now, so still 2 years to go :-P)

OTOH I am just a programmer who got bored with enterprise software and have no formal training in math (or CS for that matter), and if I can do it, anyone (certainly anyone on HN) can.

bradfordcross · 16y ago

+1 I've been doing applied research for a decade. I think we need to double Norvig's 10 year rule for this field ... or maybe it is an infinite sequence.

I took a 3 year detour through pure engineering just because that part is so important and was causing me to experience a bottleneck.

I also agree that driving your studies with a real project helps tremendously.

Ultimately, you need to think about whether you really want to commit to this. It is very hard work, but also very fun and rewarding.

Tichy · 16y ago

I wish there was a more hands-on introduction, somehow. Having read through MacKay, I didn't feel as if I could just approach a company and suggest doing data analysis for them.

I suppose I should come up with my own projects, and I have some ideas, but they always have a huge question mark at the beginning.

brent · 16y ago

My advice is not to try to read through these books (like Elements or Bishop's book). If I were to learn the topic from scratch again * I would:

1. Learn basic terminology (basically, skim the chapters and understand roughly what the topics are)

2. Work on a problem in depth. You are probably interested in a certain area or type of problem.

a. Read the relevant chapters in detail.

b. Pick up the necessary math along the way using additional references. This way you are motivated to learn it (whether it be calculus, probability, or linear algebra). E.g., it would be hard to approach McDiarmid's Inequality and be able to imagine its use. However, if you run across it in a book/paper you'll understand the context.

c. Lastly, check out recent NIPS, ICML, and JMLR papers on the topic (nips.cc and jmlr.org; ICML isn't centralized, but each conference can probably be found online).

* - I am a graduate student and have been studying statistical machine learning for the last 3 years.

herdrick · 16y ago

Great, great point. I think that's the way to learn anything.

bradfordcross · 16y ago

Norvig's essay applies even more here than it does to programming. http://norvig.com/21-days.html

justokay · 16y ago · 6 in thread

Mike Jordan at Berkeley sent me his list on what people should learn for ML. The list is definitely on the more rigorous side (i.e., aimed more at researchers than practitioners), but going through these books (along with the requisite programming experience) is a useful, if painful, exercise.

I personally think that everyone in machine learning should be (completely) familiar with essentially all of the material in the following intermediate-level statistics book:

1.) Casella, G. and Berger, R.L. (2001). "Statistical Inference" Duxbury Press.

For a slightly more advanced book that's quite clear on mathematical techniques, the following book is quite good:

2.) Ferguson, T. (1996). "A Course in Large Sample Theory" Chapman & Hall/CRC.

You'll need to learn something about asymptotics at some point, and a good starting place is:

3.) Lehmann, E. (2004). "Elements of Large-Sample Theory" Springer.

Those are all frequentist books. You should also read something Bayesian:

4.) Gelman, A. et al. (2003). "Bayesian Data Analysis" Chapman & Hall/CRC.

and you should start to read about Bayesian computation:

5.) Robert, C. and Casella, G. (2005). "Monte Carlo Statistical Methods" Springer.

On the probability front, a good intermediate text is:

6.) Grimmett, G. and Stirzaker, D. (2001). "Probability and Random Processes" Oxford.

At a more advanced level, a very good text is the following:

7.) Pollard, D. (2001). "A User's Guide to Measure Theoretic Probability" Cambridge.

The standard advanced textbook is Durrett, R. (2005). "Probability: Theory and Examples" Duxbury.

Machine learning research also reposes on optimization theory. A good starting book on linear optimization that will prepare you for convex optimization:

8.) Bertsimas, D. and Tsitsiklis, J. (1997). "Introduction to Linear Optimization" Athena.

And then you can graduate to:

9.) Boyd, S. and Vandenberghe, L. (2004). "Convex Optimization" Cambridge.

Getting a full understanding of algorithmic linear algebra is also important. At some point you should feel familiar with most of the material in

10.) Golub, G., and Van Loan, C. (1996). "Matrix Computations" Johns Hopkins.

It's good to know some information theory. The classic is:

11.) Cover, T. and Thomas, J. "Elements of Information Theory" Wiley.

Finally, if you want to start to learn some more abstract math, you might want to start to learn some functional analysis (if you haven't already). Functional analysis is essentially linear algebra in infinite dimensions, and it's necessary for kernel methods, for nonparametric Bayesian methods, and for various other topics. Here's a book that I find very readable:

12.) Kreyszig, E. (1989). "Introductory Functional Analysis with Applications" Wiley.

ptuzla · 16y ago

Can one fit this study list in a lifetime? Seriously, this has been a problem for me for a long time. Any one of the mentioned books would take me months to study. It'd take me a month to just read a textbook, without any toil. Am I too slow, or are there some study/reading techniques that I'm not aware of?

vdm · 16y ago

Just so you know, you're not the only one who feels this way. There's no way you could carry that load AND a day job/family.

Maybe tools like incanter will bridge the gap and let application software practitioners put this research to work.

fgimenez · 16y ago

Keep in mind that Mike Jordan is a superhuman math machine. I remember his undergraduate research assistants at Cal were telling me that it would take grad students days to understand 5 minute proofs he would do on the fly.

global-variable · 16y ago

It's almost like he's the Michael Jordan of math.

physcab · 16y ago

Woot woot for Casella! (UF prof)

My go-to book for Machine Learning is Christopher Bishop's Pattern Recognition and Machine Learning. I've read that book cover-to-cover and it's got an excellent foundation and covers all those other books in some capacity.

bradfordcross · 16y ago

Awesome, I was hoping to hear from some academics who know far more than I do... this is an elaborate scheme to fill my wishlist pipeline... muahahahhaha

FraaJad · 16y ago · 5 in thread

Definitely a very good list. But also an expensive list!

HNers: any suggestions on where to find these books for cheap (outside university libraries)?

mahmud · 16y ago

Local 4-year universities typically allow the public to access their libraries for a fee; around $100/year.

silentbicycle · 16y ago

University alumni can often get an alumni card for much less. I can access several CS academic journals online with mine.

llimllib · 16y ago

abebooks.com is usually the cheapest place to buy textbooks; you can often get the softcover international editions.

Shamiq · 16y ago

Isn't the buying and selling of those international versions questionably legal in the United States? (Assuming OP is from the US).

yaroslavvb · 16y ago

I found scanned PDF versions of half of the books in the list after about 5 minutes of searching.

kaddar · 16y ago · 4 in thread

This is a good list.

That said,

Why the second edition of Intro to Algs? Why not the third? Also, Intro to Algs is one of the "Books Programmers Claim to Have Read" (http://www.billthelizard.com/2008/12/books-programmers-dont-...), so those planning to read it should note that it is best learned in a classroom setting where you are forced to work through the problems.

Finally, it's interesting that he uses Amazon referral links for all these books; hopefully profit opportunities didn't taint the items on his list!

bradfordcross · 16y ago

Someone pointed out in the blog comments that the 3rd ed was out. I hadn't noticed, so thanks.

I agree that the CLR book is best learned by working through problems. For those who may not have retained as much as they would have liked from classroom work, I think needing to use algorithms and programming them yourself also works very well. In that vein, CLR is also a great reference text.

I don't make any money from Amazon. Believe me, my friend, it is purely the other way around. ;-)

kaddar · 16y ago

Ok awesome, just wanted to check :)

briansmith · 16y ago

The 2nd edition can be had for less than $30 used. The cheapest you can get the 3rd edition for is ~$60. For $30 you can pick up a linear algebra textbook used. If somebody had only $60 to spend, it's a no-brainer to pick the 2nd edition + a linear algebra textbook.

vdm · 16y ago

That's a good point. One really needs to balance the cheaper price against whether the diff with the latest edition is worth it.

hamilton · 16y ago · 3 in thread

Thanks to liebke for posting this, and Bradford for writing it.

I have two points.

First, your point about programming is incredibly important. I've worked with people who had amazing insights about statistical problems, but went cross-eyed upon being asked about SVN and Git. This makes a CS homework assignment unpleasant, and a real world research project impossible.

Second, this really begs another post. It should be called "Learning how to read a textbook on your own." Successful self-learners don't just __read__ a textbook. They toil with it, try proving things on paper themselves, work through exercises, attempt to apply it to some real-world situation, and hunt down someone who's smarter than they are to explain something that seems unclear.

Not all textbooks - certainly not every one on your list - can be read with a great application in mind. A reader must interrogate mercilessly the book on analysis or the rigorous probability book mentioned in the post.

This seems intuitive to someone who can successfully learn on their own, but most people are not taught to do that. The difficulty of replicating a portion of the classroom learning experience is a major barrier to entry. This is why online intro lectures for programming, math, and certain CS topics like algorithms can steer a learner in the right direction. Stanford Engineering Everywhere and of course MIT's OCW, links to which have been posted on HN at least once a month, are great starts.

plinkplonk · 16y ago

" It should be called "Learning how to read a textbook on your own." Successful self-learners don't just __read__ a textbook. They toil with it, try proving things on paper themselves, work through exercises, attempt to apply it to some real-world situation, and hunt down someone who's smarter than they are to explain something that seems unclear."

+n.

I wasted 3 years trying to avoid this bit. On the positive side, once you learn to do this you will never be afraid of any book or paper again.

imp · 16y ago

For anyone doing the self-learning thing, right now there are 24 people who just started learning Stanford CS229 Machine Learning here: http://www.crunchcourse.com/class/stanford-cs229-machine-lea... (disclosure: Crunch Course is my website. I just thought it might be a good resource for people taking hamilton's advice.)

vdm · 16y ago

I didn't know about the idea of 'social learning'. I appreciate the link (and your full disclosure).

tel · 16y ago · 1 in thread

I like his language toolbelt for this kind of work.

At a minimum, I would recommend learning python (numpy/scipy), R, and at least one nice functional language (probably Haskell, Clojure, or OCaml).

This is effectively what I use as well. Python as a general-purpose data munging language that's good for all of your dirty work whenever you need it. R for graphing, graphing, graphing, running statistical tests other people already wrote and foolproofed, database munging, and then more graphing. Haskell for prototyping and reasoning with types, and then that occasional algorithm that screams for functional implementation or the not-so-occasional one that requires more speed than Python can provide.

I also write a few things in C/C++, though I try to avoid it. It's mostly there for standing on the backs of other people and that occasional need to blaze.

billswift · 16y ago

There is a website http://software-carpentry.org/ intended to teach scientists the basics of reliable programming. It's python based and covers version control, debugging, shells, testing, and the basics of databases and other stuff that would be useful for scientists without computer training. Since a lot of practical stuff like this isn't covered in any systematic way in most comp sci programs, it is useful even for people who have programming experience.

Perceval · 16y ago · 1 in thread

I just read yesterday a book on Non-parametric Statistics: http://www.amazon.com/gp/product/047045461X/ref=oss_T15_prod...

I found this useful and interesting, because a great deal of social phenomena are not normally distributed (do not fit a Gaussian bell-curve distribution, regardless of the size of the sample).

I am interested in learning more about non-parametric statistics, and statistics using alternative distributions (e.g. stable distributions, power laws, etc.). Does anyone have good references for this kind of statistics (preferably written in English as opposed to jargon)?

bradfordcross · 16y ago

http://www.amazon.com/All-Nonparametric-Statistics-Springer-...

plinkplonk · 16y ago · 1 in thread

I wrote a supplementary blog post to Brad's (http://pindancing.blogspot.com/2010/01/learning-about-machin...) with the list of books I found useful (no Amazon referral links, if anyone is worried) and brief descriptions of each. I work in somewhat different domains and so have a different list of books.

I hope someone finds this useful.

For those who want a summary,

Proof Technique

(a) Velleman's "How to Prove It" (b) Gries and Schneider's "A Logical Approach to Discrete Math"

Math

(c) Calculus (best "lite" book - Calculus by Strang (free download), best "heavy" books - (d) Calculus by Spivak, (e) Principles of Mathematical Analysis a.k.a "Baby Rudin")

(f) Discrete Math (ALADM above + (g) a good book on Algorithms; Cormen will do, though working through it comprehensively is ... hard!)

(h) Linear Algebra (First work through Strang's book, then (i) Axler's)

(j) Probability (see Bradford's very comprehensive recommendations) and

(k) Statistics (I would recommend Devore and Peck for the total beginner, but it is a damn expensive book. So hit a library or get a bootlegged copy to see if it suits you before buying a copy; see Brad's list for advanced stuff.)

(l) Information Theory (MacKay's book is freely available online)

Basic AI

(m) AIMA 3rd Edition (I prefer this to Mitchell)

Machine Learning

(n) "Pattern Recognition and Machine Learning" by Christopher Bishop,

(o)"Elements of Statistical Learning" (free download).

(p) Neural Network Design by Hagan, Demuth and Beale,

(q) Neural Networks, A Comprehensive Foundation (2nd edition) - By Haykin (there is a newer edition out but I don't know anything about that, this is the one I used)

(r) Neural Networks for Pattern Recognition ( Bishop).

At this point you are in good shape to read any papers in NN. My recommendations - anything by Yann LeCun and Geoffrey Hinton. Both do amazing research.

Reinforcement Learning

(s) Reinforcement Learning - An Introduction by Barto and Sutton (follow up with "Recent Advances in Reinforcement Learning" (PDF), which is an old paper but a GREAT introduction to Hierarchical Reinforcement Learning)

(t) Neuro Dynamic Programming by Bertsekas

Computer Vision

(u) Introductory Techniques for 3-D Computer Vision, by Emanuele Trucco and Alessandro Verri.

(v) An Invitation to 3-D Vision by Y. Ma, S. Soatto, J. Kosecka, S.S. Sastry. (warning TOUGH!!)

Robotics.

(w) Probabilistic Graphical Models: Principles and Techniques (Adaptive Computation and Machine Learning) - not about robotics per se but useful to understand the next book

(x) Probabilistic Robotics (Intelligent Robotics and Autonomous Agents) by Thrun, Burgard and Fox

PS: I own all these books (except AIMA 3 for which I only have pre pub pdfs) and if any HN folks in Bangalore want to browse before they buy anything (friggin expensive when you add amazon's postage) send me email.

PPS: on languages, I think Bradford is on the money with regards to recommending functional languages. I would just say, also know C well. Saved my ass a few times.

bradfordcross · 16y ago

great stuff. this and other feedback is helping to shape the revision to my post only a couple hours after I made it. Second edition iterations are so much faster on the web. :-)

pskomoroch · 16y ago

Nice lists, I often recommend these for people who want an introduction to the field:

"Mathematical Statistics and Data Analysis" by John A. Rice

"All of Statistics: A Concise Course in Statistics" by Larry Wasserman

"Pattern Recognition and Machine Learning" by Christopher M. Bishop

"The Elements of Statistical Learning" by T. Hastie et al http://www-stat.stanford.edu/~tibs/ElemStatLearn/

"Information Theory, Inference, and Learning Algorithms", David MacKay http://www.inference.phy.cam.ac.uk/itprnn/book.html

"Introduction to Information Retrieval" - Manning et al. http://nlp.stanford.edu/IR-book/information-retrieval-book.h...

"The Algorithm Design Manual, 2nd Edition" - Steven Skiena http://www.algorist.com/

jdlong · 16y ago

Great post full of references, and more importantly, brief explanations of why each ref is useful.

melipone · 16y ago

I would argue that learning can be reduced to statistical inference.