Another way might be to just get going, and pick up knowledge on the way?
Working through CLRS completely is a very time-consuming task. I think Bradford intended that book as a reference, but yes, you need to work through some of the stuff in order. For example, you need to be fairly conversant in linear algebra, probability, and proof technique before you can tackle Bishop, else you won't make much progress. Once you get some basics under you (especially the underlying math stuff) you'll end up being able to read through an ML book the way you can read through a moderately tough book on programming.
"I managed to eventually read through MacKay (enjoyable book and available as a free PDF, too) and feel I have already forgotten most of it again :-("
The best way to learn this stuff is to have an eventual project in mind. I ended up learning most of this stuff because I was working on a Robotics project for the fine folks in the Indian Defence Depts and was very much "thrown in at the deep end" - nothing like it to accelerate learning, but I wouldn't wish to do it again. For the first few weeks I couldn't (literally) understand a single sentence in an hour-long meeting. Very humbling.
Depending on what exactly you wish to do, you may be able to avoid many of the books. If you think I can help you narrow down to a smaller list, please ask here or send me email (my email id is in my profile).
But yes, in the end Norvig's point applies here too, as Bradford points out. (I have been working in ML for 8 years now, so still 2 years to go :-P)
OTOH I am just a programmer who got bored with enterprise software and have no formal training in math (or CS for that matter), and if I can do it, anyone (certainly anyone on HN) can.
I took a 3 year detour through pure engineering just because that part is so important and was causing me to experience a bottleneck.
I also agree that driving your studies with a real project helps tremendously.
Ultimately, you need to think about whether you really want to commit to this. It is very hard work, but also very fun and rewarding.
I suppose I should come up with my own projects, and I have some ideas, but they always have a huge question mark at the beginning.
1. Learn basic terminology (basically, skim the chapters and understand roughly what the topics are)
2. Work on a problem in depth. You are probably interested in a certain area or type of problem.
a. Read the relevant chapters in detail.
b. Pick up the necessary math along the way using additional references. This way you are motivated to learn it (whether it be calculus, probability, or linear algebra). E.g., it would be hard to approach McDiarmid's Inequality and be able to imagine its use. However, if you run across it in a book/paper you'll understand the context.
c. Lastly, check out recent NIPS, ICML, and JMLR papers on the topic (nips.cc, jmlr.org, and ICML isn't centralized, but each conference can probably be found online).
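For anyone who hasn't run across McDiarmid's Inequality (mentioned in step b), here is the standard statement as a reference point. The symbols $f$, $c_i$, and $t$ are just the usual textbook notation, not anything specific to this thread:

```latex
Let $X_1, \dots, X_n$ be independent random variables and let $f$ satisfy
the bounded-differences condition
\[
  \sup_{x_1, \dots, x_n,\, x_i'}
  \bigl| f(x_1, \dots, x_i, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \bigr|
  \le c_i
  \qquad \text{for each } i = 1, \dots, n.
\]
Then for every $t > 0$,
\[
  \Pr\bigl( f(X_1, \dots, X_n) - \mathbb{E}\, f(X_1, \dots, X_n) \ge t \bigr)
  \le \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \right).
\]
```

As a quick sanity check on the context point: if $f$ is the sample mean of variables taking values in $[0, 1]$, then $c_i = 1/n$ and the bound becomes $\exp(-2 n t^2)$, which is the familiar Hoeffding-style concentration bound that shows up in generalization proofs.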
* - I am a graduate student and have been studying statistical machine learning for the last 3 years.
I personally think that everyone in machine learning should be (completely) familiar with essentially all of the material in the following intermediate-level statistics book:
1.) Casella, G. and Berger, R.L. (2001). "Statistical Inference" Duxbury Press.
For a slightly more advanced book that's quite clear on mathematical techniques, the following book is quite good:
2.) Ferguson, T. (1996). "A Course in Large Sample Theory" Chapman & Hall/CRC.
You'll need to learn something about asymptotics at some point, and a good starting place is:
3.) Lehmann, E. (2004). "Elements of Large-Sample Theory" Springer.
Those are all frequentist books. You should also read something Bayesian:
4.) Gelman, A. et al. (2003). "Bayesian Data Analysis" Chapman & Hall/CRC.
and you should start to read about Bayesian computation:
5.) Robert, C. and Casella, G. (2005). "Monte Carlo Statistical Methods" Springer.
On the probability front, a good intermediate text is:
6.) Grimmett, G. and Stirzaker, D. (2001). "Probability and Random Processes" Oxford.
At a more advanced level, a very good text is the following:
7.) Pollard, D. (2001). "A User's Guide to Measure Theoretic Probability" Cambridge.
The standard advanced textbook is Durrett, R. (2005). "Probability: Theory and Examples" Duxbury.
Machine learning research also reposes on optimization theory. A good starting book on linear optimization that will prepare you for convex optimization:
8.) Bertsimas, D. and Tsitsiklis, J. (1997). "Introduction to Linear Optimization" Athena.
And then you can graduate to:
9.) Boyd, S. and Vandenberghe, L. (2004). "Convex Optimization" Cambridge.
Getting a full understanding of algorithmic linear algebra is also important. At some point you should feel familiar with most of the material in
10.) Golub, G., and Van Loan, C. (1996). "Matrix Computations" Johns Hopkins.
It's good to know some information theory. The classic is:
11.) Cover, T. and Thomas, J. "Elements of Information Theory" Wiley.
Finally, if you want to start to learn some more abstract math, you might want to start to learn some functional analysis (if you haven't already). Functional analysis is essentially linear algebra in infinite dimensions, and it's necessary for kernel methods, for nonparametric Bayesian methods, and for various other topics. Here's a book that I find very readable:
12.) Kreyszig, E. (1989). "Introductory Functional Analysis with Applications" Wiley.
Maybe tools like incanter will bridge the gap and let application software practitioners put this research to work.
My go-to book for Machine Learning is Christopher Bishop's Pattern Recognition and Machine Learning. I've read that book cover-to-cover, and it's got an excellent foundation and covers all those other books in some capacity.
HNers: any suggestions on where to find these books for cheap (outside university libraries)?
That said,
Why the second edition of Intro to Algs? Why not the third? Also, considering that Intro to Algs is one of the "Books Programmers Claim to Have Read" http://www.billthelizard.com/2008/12/books-programmers-dont-... , those planning to read it should note that it is best learned in a classroom setting where you are forced to work through the problems.
Finally, it's interesting that he uses Amazon reference links for all these books; hopefully profit opportunities didn't taint the items on his list!
I agree that the CLR book is best learned by working through problems. For those who may not have retained as much as they would have liked from classroom work, I think needing to use algorithms and programming them yourself also works very well. In that vein, CLR is also a great reference text.
I don't make any money from Amazon. Believe me, my friend, it is purely the other way around. ;-)
I have two points.
First, your point about programming is incredibly important. I've worked with people who had amazing insights about statistical problems, but went cross-eyed upon being asked about SVN and Git. This makes a CS homework assignment unpleasant, and a real-world research project impossible.
Second, this really begs another post. It should be called "Learning how to read a textbook on your own." Successful self-learners don't just __read__ a textbook. They toil with it, try proving things on paper themselves, work through exercises, attempt to apply it to some real-world situation, and hunt down someone who's smarter than they are to explain something that seems unclear.
Not all textbooks - certainly not every one on your list - can be read with a great application in mind. A reader must mercilessly interrogate the book on analysis or the rigorous probability book mentioned in the post.
This seems intuitive to someone who can successfully learn on their own, but most people are not taught to do that. The difficulty of replicating a portion of the classroom learning experience is a major barrier to entry. This is why online intro lectures for programming, math, and certain CS topics like algorithms can steer a learner in the right direction. Stanford Engineering Everywhere and of course MIT's OCW, links to which have been posted on HN at least once a month, are great starts.
+n.
I wasted 3 years trying to avoid this bit. On the positive side, once you learn to do this you will never be afraid of any book or paper again.
At a minimum, I would recommend learning Python (numpy/scipy), R, and at least one nice functional language (probably Haskell, Clojure, or OCaml).
This is effectively what I use as well. Python as a general-purpose data munging tool that's good for all of your dirty work whenever you need it. R for graphing, graphing, graphing, running statistical tests other people already wrote and foolproofed, database munging, and then more graphing. Haskell for prototyping and reasoning with types, and then that occasional algorithm that screams for functional implementation or the not-so-occasional one that requires more speed than Python can provide.
I also write a few things in C/C++, though I try to avoid it. It's mostly there for standing on the backs of other people and that occasional need to blaze.
I found this useful and interesting, because a great deal of social phenomena are not normally distributed (do not fit a Gaussian bell-curve distribution, regardless of the size of the sample).
I am interested in learning more about non-parametric statistics, and statistics using alternative distributions (e.g. stable distributions, power laws, etc.). Does anyone have good references for this kind of statistics (preferably written in English as opposed to jargon)?
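To make the "not normally distributed" point concrete, here is a minimal sketch using only the Python standard library. It compares how far the largest observation sits from the median in a half-normal sample versus a Pareto (power-law) sample. The function names, the shape parameter `alpha=1.5`, the sample size, and the seed are all assumptions picked for illustration, not anything taken from the linked material.

```python
import random
import statistics


def max_to_median_ratio(samples):
    """Return max(samples) / median(samples) for a list of positive numbers."""
    return max(samples) / statistics.median(samples)


def compare_tails(n=100_000, alpha=1.5, seed=0):
    """Draw n half-normal and n Pareto samples; return both max/median ratios."""
    rng = random.Random(seed)
    # Half-normal: absolute value of a standard Gaussian draw.
    gaussian = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
    # Pareto with shape alpha; a smaller alpha means a heavier tail.
    pareto = [rng.paretovariate(alpha) for _ in range(n)]
    return max_to_median_ratio(gaussian), max_to_median_ratio(pareto)


if __name__ == "__main__":
    g, p = compare_tails()
    print(f"half-normal max/median: {g:.1f}")
    print(f"Pareto max/median:      {p:.1f}")
```

In the Gaussian case the largest value stays within a small multiple of the median however many samples you draw, while in the power-law case a handful of extreme observations dominate. That is roughly why summaries built on means and variances can mislead for this kind of data.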
I hope someone finds this useful.
For those who want a summary,
Proof Technique
(a) Velleman's "How to Prove It" (b) Gries and Schneider's "A Logical Approach to Discrete Math"
Math
(c) Calculus (best "lite" book - Calculus by Strang (free download), best "heavy" books - (d) Calculus by Spivak, (e) Principles of Mathematical Analysis a.k.a "Baby Rudin")
(f) Discrete Math (ALADM above + (g) a good book on Algorithms, Cormen will do - though working through it comprehensively is ... hard!)
(h) Linear Algebra (First work through Strang's book, then (i) Axler's)
(j) Probability (see Bradford's very comprehensive recommendations) and
(k) Statistics (I would recommend Devore and Peck for the total beginner but it is a damn expensive book. So hit a library or get a bootlegged copy to see if it suits you before buying a copy; see Brad's list for advanced stuff.)
(l) Information Theory (MacKay's book is freely available online)
Basic AI
(m) AIMA 3rd Edition (I prefer this to Mitchell)
Machine Learning
(n) "Pattern Recognition and Machine Learning" by Christopher Bishop,
(o) "Elements of Statistical Learning" (free download).
(p) Neural Network Design by Hagan, Demuth and Beale,
(q) Neural Networks, A Comprehensive Foundation (2nd edition) - By Haykin (there is a newer edition out but I don't know anything about that, this is the one I used)
(r) Neural Networks for Pattern Recognition ( Bishop).
At this point you are in good shape to read any papers in NN. My recommendations - anything by Yann LeCun and Geoffrey Hinton. Both do amazing research.
Reinforcement Learning
(s) Reinforcement Learning - An Introduction by Sutton and Barto (follow up with "Recent Advances in Reinforcement Learning" (PDF), which is an old paper but a GREAT introduction to Hierarchical Reinforcement Learning)
(t) Neuro-Dynamic Programming by Bertsekas
Computer Vision
(u) Introductory Techniques for 3-D Computer Vision, by Emanuele Trucco and Alessandro Verri.
(v) An Invitation to 3-D Vision by Y. Ma, S. Soatto, J. Kosecka, S.S. Sastry. (warning TOUGH!!)
Robotics
(w) Probabilistic Graphical Models: Principles and Techniques (Adaptive Computation and Machine Learning) - not about robotics per se but useful to understand the next book
(x) Probabilistic Robotics (Intelligent Robotics and Autonomous Agents) by Thrun, Burgard and Fox
PS: I own all these books (except AIMA 3 for which I only have pre pub pdfs) and if any HN folks in Bangalore want to browse before they buy anything (friggin expensive when you add amazon's postage) send me email.
PPS: on languages, I think Bradford is on the money with regards to recommending functional languages. I would just say, also know C well. Saved my ass a few times.
"Mathematical Statistics and Data Analysis" by John A. Rice
"All of Statistics: A Concise Course in Statistics" by Larry Wasserman
"Pattern Recognition and Machine Learning" by Christopher M. Bishop
"The Elements of Statistical Learning" by T. Hastie et al http://www-stat.stanford.edu/~tibs/ElemStatLearn/
"Information Theory, Inference, and Learning Algorithms", David MacKay http://www.inference.phy.cam.ac.uk/itprnn/book.html
"Introduction to Information Retrieval" - Manning et al. http://nlp.stanford.edu/IR-book/information-retrieval-book.h...
"The Algorithm Design Manual, 2nd Edition" - Steven Skiena http://www.algorist.com/