So you want to be a Data Scientist? The good news is that there are tons of great resources out there to learn from. The bad? None is comprehensive, and choosing the best can be completely overwhelming. I created this list to help you stay focused on learning what’s important, in the easiest way possible.
But it won’t be easy…
Data Science combines Statistics, Programming, Machine Learning, and Visualization, amongst other disciplines. Simply put, there is a lot to learn. I took every course and read every book on this list, and it took me approximately 210 hours, over a few months.
Ready to dive in? Great! I would love to hear about your experience learning Data Science, and answer any questions. Tweet this post below and let me know how it’s going.
Finally, good luck, and have a lot of fun. I certainly did.
1. Immerse Yourself
We start with some light reading and listening. You can’t spend all your time reading textbooks and taking courses. Get these books and podcasts, and read or listen to them throughout your studies.
12 hr | $29 | Read The Signal and the Noise by Nate Silver
A fun introduction to Data Science that will teach you how to think like a data scientist.
9 hr | $17 | Read Naked Statistics by Charles Wheelan
An easy introduction to statistics, without getting too deep into the maths.
free | Subscribe to the Data Skeptic podcast
Features conversations with data science experts, as well as great mini episodes which teach the basics.
free | Subscribe to the Partially Derivative podcast
A weekly discussion about Data Science related news.
free | Subscribe to the Data Science Weekly newsletter
Data Science news in your inbox, weekly.
2. Learn Python
Programming is a key part of Data Science. There’s an ongoing debate about whether you should learn R or Python first. It’s better to pick one than to spend your time debating which is best. Choose Python.
6 hr | free | Do the Learning Python mission at Dataquest
You’ll learn Python interactively while playing with real data.
If you’re new to programming you may need a more thorough introduction. In that case:
40 hr | $30 | Read Learn Python the Hard Way
A great introduction to programming using Python.
Otherwise, you’ll pick it up quickly using:
1 hr | free | Read Learn Python in Y minutes
This is a really fast way to learn Python if you’re already a programmer.
3. Learn the Big Picture
There are a lot of aspects to Data Science. In this unit you’ll focus on learning how they all fit together. Get a little breadth in your diet.
10 hr | $32 | Read Data Science from Scratch
This is a fantastic book that introduces you to Data Science using Python.
5 hr | free | Take the Data Analysis and Data Visualization missions at Dataquest
These will teach you about numpy, pandas, and matplotlib, three crucial tools for your toolbelt.
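To give a rough sense of how these three tools divide the work, here is a minimal sketch. The study log is made-up data, and the variable names are only for illustration. It is not taken from the missions themselves.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up study log, for illustration only.
study_log = pd.DataFrame({
    "topic": ["Python", "Statistics", "Python", "Machine Learning", "Statistics"],
    "hours": [6.0, 7.0, 1.0, 20.0, 10.0],
})

# pandas: group the rows by topic and total the hours for each.
hours_by_topic = study_log.groupby("topic")["hours"].sum()

# numpy: vectorized arithmetic on the underlying array of totals.
totals = hours_by_topic.to_numpy()
share_of_total = totals / np.sum(totals)

# matplotlib: draw a bar chart of the totals.
fig, ax = plt.subplots()
ax.bar(list(hours_by_topic.index), totals)
ax.set_ylabel("Hours")
ax.set_title("Study time by topic")
```

Call fig.savefig with a filename of your choice to write the chart to disk, or drop the Agg line and call plt.show() to view it interactively.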
4. Learn Statistics
Statistics is the foundation for much of Data Science. It is the tool we use to rigorously reason about the world using data.
7 hr | free | Take Udacity’s Intro to Descriptive Statistics course
This course seems overly simplistic at times, but it’s a good refresher on descriptive statistics. Tip: Set the playback speed to 1.5x.
10 hr | free | Take Udacity’s Intro to Inferential Statistics course
This course is also a little simple. It’s still worth going through to get a strong grip on hypothesis testing, which is critical in Data Science.
40 hr | $79 | Read All of Statistics
If you really want to master statistics, this is the book for you. Don’t get too bogged down with the details, but take a good read through it and use it as a reference for the rest of your career.
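Since hypothesis testing comes up so often, a small worked example may help make it concrete. The sketch below is a permutation test for a difference in means, written with only the standard library. The technique is one of several ways to test a hypothesis and is chosen here for illustration, not drawn from the courses or the book. The two samples are made-up numbers, and permutation_test is a hypothetical helper name.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Approximate two-sided p-value for a difference in means.

    Null hypothesis: both groups come from the same distribution, so the
    group labels are interchangeable.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values and split them into two relabeled groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Count relabelings at least as extreme as what was observed.
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Made-up measurements for two groups, for illustration only.
group_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
group_b = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]

p_value = permutation_test(group_a, group_b)
print(f"Approximate p-value: {p_value:.4f}")
```

A small p-value means that random relabeling rarely produces a gap as large as the observed one, which is evidence against the null hypothesis that the two groups are the same.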
5. Learn Machine Learning
Machine Learning is a hot topic, and a big driver of the recent flood of interest in Data Science. It’s also a very deep field.
20 hr | free | Take Udacity’s Introduction to Machine Learning course
This is a very practical, hands-on course. You learn how to apply machine learning algorithms using the sklearn Python package.
30 hr | free | Take Coursera’s Machine Learning course
This is a more theoretically rigorous course. It is fantastically done.
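For a taste of what applying a machine learning algorithm with sklearn looks like, here is a minimal sketch. The iris dataset (bundled with scikit-learn) and the decision tree are choices made for illustration, not examples taken from either course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small, built-in dataset of flower measurements and species labels.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a shallow decision tree on the training rows only.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# score() reports the fraction of held-out rows classified correctly.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Most sklearn estimators follow the same fit-then-score (or fit-then-predict) pattern, so swapping in a different algorithm is usually a one-line change.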
Now use your skills and go out and do some actual data science!
8 hr | free | Complete a Kaggle competition
Kaggle provides the data, you provide the science. Try some of their “Knowledge” competitions to get some practice.
12 hr | free | Do your own analysis
Find a real dataset on Data.gov, perform a real analysis, and publish your findings online.