This is our second Learning Path, on Data Analysis, built by the awesome Claudia Gold (MIT alum, self-taught data scientist, early at Airbnb). The aim is to list helpful resources in a sequence that a beginner can follow.
Once again, we realize this is a curriculum, not the best curriculum. We'd love your feedback on what we should change or add.
------
Edit: Since we have your attention, here are some other ways in which you can help us:
1. Tell us which new Learning Paths you'd like us to build.
2. Collaborate with us to build a Learning Path on a subject where you're an expert.
3. Request features that will help you take better advantage of Learning Paths.
We’re at founders@mysliderule.com
Anyway, I love the website and learning tracks.
The thing with online courses / video series is that they can give you a great introduction to the topic, but it is often difficult to know where to go next for a deeper understanding!
Also useful for filling in gaps of prerequisite knowledge e.g. "before diving into this stats book, make sure you understand the concepts from this basic algebra book!"
Btw, found a bug, might help you: the site returns a 500 if you try logging in with an email that doesn't exist.
Something that explains how to implement small bits (like: this is how to build an assignment, how to build a function, how to check types, etc.)
I really struggle to find good information on improving in this field.
Some way of keeping track of what I've done. A simple checkbox next to each topic/material would do.
The closest thing we have is a beginners' web development path (not the same thing at all, I realize, but sharing if helpful) http://www.mysliderule.com/courses/learning-paths/web-develo...
It appears that no comprehensive treatment of applied data science exists. For the past few months, I've been searching high and low. I understand collaborative filtering; I've heard about the Netflix recommendation challenge ad nauseam; I grasp machine learning and Bayesian statistics (prior, posterior, conjugate prior distributions, etc.) on a superficial level. Conversationally, I can hold my own with practitioners, albeit on a beginner level.
But what I, and others, want to learn is how to apply these techniques in a scalable way on a real production system. Right now, it's easy to conjecture about what could/should be done, but there's a lack of confidence in how to achieve the goals. I'm experimenting with a collaborative filtering problem using Cassandra as the data store for thumbs up/down ratings on products, and Hadoop for the MR pipeline; it'd be great to have more visible examples available. Is there any place I could find detailed information on real, online machine learning/statistical inference systems?
I agree with the suggestion that you should attend meetups and tech talks (or watch them online if there are none in your area). You'll hear more about real life examples and have a chance to ask questions.
The other main way to learn what you're asking is to get a job doing it! You have more than enough background (assuming you also have knowledge of tools) and you will learn more from others and as you need the information.
With so many "free" learning resources online, we end up "paying" through the mental churn and frustration of trying to separate the wheat from the chaff. This is a great step in truly making free resources more accessible and meaningful.
One question: How important are the credentials of the "expert" to you?
[0]: https://www.coursera.org/specialization/jhudatascience/1?utm...
1. Coursera focuses solely on R for Data Science. SlideRule covers additional tools (e.g. Python, SQL) which a practicing data analyst will find handy. It seems there's a bit of an R vs Python debate in the data world, so we think it's useful for people to know both.
2. SlideRule's path has an (optional) "intro to programming" section for beginners. Coursera assumes some prior programming experience.
3. Most of the courses in the SlideRule path are "self-paced", so in theory someone studying this full-time could cover it in 4-6 weeks. Coursera has fixed start and end dates, so the fastest one could complete the track (accounting for interdependencies of courses) is ~24 weeks.
I didn't see any linear algebra anywhere here, and from my (probably naïve) understanding of data science, it seems to be core to a lot of the main ideas. Do you know of any good resources in the same vein as the rest of the track? I've been watching Coursera and edX, and it seems linear algebra offerings are somewhat sporadic.
There is more discussion on this topic here: http://www.quora.com/Big-Data/What-concepts-of-linear-algebr...
Could we please reinstate the "built by a former Airbnb Data Scientist", though? That's material information, in that this is not just any curriculum, but one that's expert-curated. As people on this thread have indicated [1], the credentials of the person building a Learning Path are important.
Anyone who clicks on the post can easily see the credentials as you've highlighted them.
I recommend the Harvard Stat 110 YouTube videos: https://www.youtube.com/playlist?list=PL2SOU6wwxB0uwwH80KTQ6...
These videos are more focused on probability, but they contain a lot of great intuitions.
I've noticed that there's a growing demand for performance and reliability engineering roles in tech. Can that become a learning path? The courses for that could be:
1. OS
2. Computer Networks
3. Distributed Systems
4. Intro to Algorithms
5. Intro to Statistics
6. <Some course on best practices of general systems-level troubleshooting?>
7. <Some course on best practices of software debugging?>
I know it sounds almost like a full-fledged MS program in CS. But this could be a great opportunity for those who are not enrolled in such programs but love systems in general and would like to make a career out of it. Apologies if this type of "learning path" makes no sense to most industry insiders.