Learn the basics of data science with these books

(hackernoon.com)

190 points · becewumuy · 9y ago · 14 comments

14 comments · 5 top-level

rcar · 9y ago · 5 in thread

I'd just throw in an extra plug for Python for Data Analysis. Though the title might sound a little bland, it's a good, practical summary of how to use pandas for the sorts of data analysis you often have to do in data science work.

ploika · 9y ago

I'd add the disclaimer that while Python for Data Analysis is a great resource for learning pandas, which itself is invaluable for data science in Python, the book doesn't cover machine learning or statistical inference in any great detail. That's not a criticism, it's just (mostly) beyond the scope of the book.

rcar · 9y ago

A fair point for sure, which is actually one of the reasons why I do tend to recommend the book.

ML and stats are generally the flashier and better-known parts of data science, so I've found that people new to the field rarely have trouble finding resources for learning them or the self-motivation to dive into them. Data cleanup, on the other hand, is often the more important work on a project while also being seen as the less enjoyable part. Learning how to do it well makes it a more interesting process, and pandas and this book lay a good foundation for that.

clumsysmurf · 9y ago

A second edition is in the works: http://shop.oreilly.com/product/0636920050896.do

kyleschiller · 9y ago

Very strong third.

The appendix alone taught me most of what I know about Python, and the book is a welcome departure from the mass of online material that focuses on ML without getting into the tools you'll need for cleaning and managing data.

Plus, it's free online: http://www3.canisius.edu/~yany/python/Python4DataAnalysis.pd...

zvikara · 9y ago

The linked PDF looks like a pirated copy from it-ebooks.info.

jeroenjanssens · 9y ago · 4 in thread

I'm flattered to see Data Science at the Command Line next to these great titles, but I'm not sure if I would recommend it to learn the basics of data science.

DSATCL discusses the ideas behind various cleaning and visualization approaches and several machine learning algorithms, but only briefly. My personal recommendation would be to first gain some experience with these topics using Python and/or R. If you're curious afterwards to find out how the Unix command line can help you do data science, well, then there's only one book I can think of! ;)

dasboth · 9y ago

I agree. I'd start with something like Joel Grus's Data Science From Scratch to get a handle on the basics in Python (or whatever the R equivalent is, I'm not familiar with R books).

I do, however, find myself more and more wishing I knew data science-specific Unix commands, and I think I know which book to get to solve that problem... :)

nthot · 9y ago

R for Data Science by Hadley Wickham is a good R equivalent. It also acts as a high-level overview of the Hadleyverse/tidyverse (ggplot2, tidyr, dplyr, etc.). R4DS is free online [1].

[1] http://r4ds.had.co.nz/

samkone · 9y ago

True. Still, I really love your book. I've read it, and I keep going back to it to speed up many things. Are you planning another edition?

jeroenjanssens · 9y ago

Thanks! I must admit that the thought has crossed my mind. However, these days all my time is spent consulting and giving training so I can't make any promises.

jonathanstrange · 9y ago

IMHO, data science == applied statistics, but you'd better know a lot about the underlying mathematics before you come to any conclusions.

rm_dash_rf · 9y ago

Where can I get #2?

2. Business value in the ocean of data, by Fajszi, Cser & Fehér

blahi · 9y ago

Bah.

Statistics in Plain English.

Data Analysis Using Regression by Gelman

Introduction to/Elements of Statistical Learning by Jerome Friedman et al. I recommend reading the Introduction and using the bigger book as reference material when tackling a problem.

Bayesian Data Analysis, 3rd edition by Gelman.

You need calc 1 & 2 and matrix algebra somewhere along the way.

Then lots of papers, googling, and doing. That's when you've got the basics covered. You start being "operational" after Data Analysis Using Regression.

When you start working on a problem, you need to go through the relevant literature first. Nobody is an expert, or even half-good, in more than 2 or 3 (small) areas of statistics. Read the literature, take notes, and create a plan first.
