The Data Science Learning Path


Part of why I created this blog is to help myself and others follow our passion for learning and feed our desire for knowledge. With that in mind, I’d love my first post on this blog to be a little road map for beginners who are interested in Data Science, or at least in discovering the field.

I’m not going to set a timetable for this, so move at your own pace. Everything I recommend could be a hit or a miss, so if you have a better alternative, let me know. Most of these courses are here because I’ve tried them and thought you might also find them useful as a start.

Now before we start here’s an important note:

Machine Learning is not Data Science, Data Science is not Machine Learning

To further illustrate this I want you to take a look at this picture.

[Figure: The Data Science Process]

Machine Learning is only the “Model the Data” phase of the process. It is fun, and it usually consumes a lot of time. If we count cleaning and arranging the data as part of it, then it definitely takes the most time.
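To make that concrete, here is a minimal sketch of a toy project in plain Python. Everything in it is an assumption made for illustration: the data, the function names, and the stage names other than “Model the Data” are mine, not taken from the picture. Notice how little of the code is actually the model.

```python
# Toy walk-through: get the data, clean it, model it, communicate the result.
# The data and all function names are made up for illustration.

RAW_ROWS = [
    "hours,score",
    "1,52",
    "2,55",
    " 3 , 61 ",
    "4,",        # missing score
    "five,70",   # unparseable hours
    "6,70",
    "8,79",
]


def get_data():
    """Get the data: here it is just an in-memory CSV-like list of strings."""
    return RAW_ROWS[1:]  # skip the header row


def clean_data(rows):
    """Clean the data: strip whitespace and drop rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        parts = [p.strip() for p in row.split(",")]
        try:
            hours, score = float(parts[0]), float(parts[1])
        except (ValueError, IndexError):
            continue  # skip incomplete or malformed rows
        cleaned.append((hours, score))
    return cleaned


def model_data(points):
    """Model the data: least squares fit of score = intercept + slope * hours."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def communicate(intercept, slope):
    """Communicate the result in one plain sentence."""
    return f"Each extra hour adds about {slope:.1f} points (baseline {intercept:.1f})."


points = clean_data(get_data())
intercept, slope = model_data(points)
print(communicate(intercept, slope))
```

Even in this tiny example, the modeling step is one short function, while getting the rows into a usable shape takes just as much code.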

I’m going to list the main topics I believe you need to learn. The order mostly matters, but I’ll try to note which topics can be studied in parallel.

Statistics and Mathematics

Stat 110: This is a very good and comprehensive Statistics and Probability course. Do your best to finish it, but you can get by with the first 14 lectures.

Khan Academy: Linear Algebra. I think the Khan Academy course on Linear Algebra is quite comprehensive. Ideally you would finish it, but I think the first 33 videos would suffice.

You can do the previous two courses in parallel, and you can skip them if you studied introductory probability and linear algebra in college.

Programming in Python
If you’re familiar with programming, then you should jump to the Codecademy Python tutorial. I really liked it, and it was a quick introduction to Python:
Codecademy Python

If you’re not familiar with programming, then you should follow an introductory programming course first. MIT offers one on edX:
MIT CS

Data Science and Machine Learning
Now you need to understand what Data Science is. A good course that explains this well, is fun, and has some very nice exercises is CS109 from Harvard. I personally enjoyed it, and I guess you will too.
CS109

How can we mention Machine Learning and not mention professor Andrew Ng?
His introductory course on Machine Learning is one of the most successful courses on Coursera and definitely a must-take. The only downside is that the assignments are in Octave rather than Python, but I guess it won’t really matter: you’ll get a sense of how these algorithms work, and that’s what’s most important.
CS229

You can take the two courses in parallel.

Now, in my opinion, you’re ready to take on some bigger challenges. You can apply for an internship or start with Kaggle competitions. Whichever way you choose, keep these goals ahead of you:

  • Hands-on experience with data cleaning and preprocessing
  • scikit-learn projects: discover as many algorithms as possible
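If you’re wondering what cleaning and preprocessing look like in practice, here is a small sketch using only the Python standard library. The records and helper names are hypothetical and made up for this post. In a real project you would more likely reach for pandas and scikit-learn classes such as SimpleImputer, StandardScaler and OneHotEncoder, which cover the same steps.

```python
from statistics import mean, median, pstdev

# Hypothetical raw records; None marks a missing value.
records = [
    {"age": 25, "city": "Paris", "income": 30000},
    {"age": None, "city": "paris ", "income": 42000},
    {"age": 40, "city": "Berlin", "income": None},
    {"age": 31, "city": "BERLIN", "income": 51000},
]


def impute_median(rows, key):
    """Replace missing values in a numeric column with the column median."""
    fill = median(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


def normalize_text(rows, key):
    """Trim whitespace and lowercase a text column so equal labels match."""
    return [{**r, key: r[key].strip().lower()} for r in rows]


def standardize(rows, key):
    """Rescale a numeric column to zero mean and unit variance (z-scores)."""
    values = [r[key] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**r, key: (r[key] - mu) / sigma} for r in rows]


def one_hot(rows, key):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[key] for r in rows})
    encoded_rows = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != key}
        for category in categories:
            encoded[f"{key}_{category}"] = 1 if r[key] == category else 0
        encoded_rows.append(encoded)
    return encoded_rows


clean = impute_median(records, "age")
clean = impute_median(clean, "income")
clean = normalize_text(clean, "city")
clean = standardize(clean, "age")
clean = standardize(clean, "income")
clean = one_hot(clean, "city")

for row in clean:
    print(row)
```

The order matters here: the text column is normalized before one-hot encoding so that "Paris" and "paris " end up in the same category, and missing values are filled before standardizing so the mean and standard deviation can be computed.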

Advanced Topics
You can pick a challenging topic and start pursuing it. These topics are usually harder than average, but they’re a lot of fun.

Natural Language Processing
The folks from Stanford have put out a very recent course on Deep Learning for NLP. You can find it at:
CS224n

Computer Vision
Stanford again: Andrej Karpathy has, in my opinion, the best available course for Computer Vision, and it also includes a very good introduction to Deep Learning.
CS231n

Reinforcement Learning
I haven’t studied any RL courses, so I’m recommending blindly. Still, RL is a hot topic, and DeepMind has shown some really nice progress in games with it, so I guess you should go this way if you’re interested in building an agent that interacts with an environment (games, robotics, finance, …).
CS294

Andrew Ng’s new Deep Learning specialization
Andrew Ng has recently released a Deep Learning specialization that is very up to date and suited to beginners and intermediate learners. It covers the basics along with practical examples.
Deeplearning.ai

Jeremy Howard’s Fast AI
This course is very practical, with a lot of projects and practical tips for building Deep Learning models. It is also highly recommended.
Fast.ai

This is definitely not a comprehensive career path for becoming a Data Scientist, but it will put you on the right track. There’s still more to it. The following are some keywords for you to search for; these are tools and algorithms that you might need to learn sooner or later:

  • Apache Spark
  • Gensim
  • TensorFlow
  • Keras
  • D3.js
  • XGBoost
  • LightGBM
  • Recommender Systems

If you’re looking for more Machine Learning, this specialization from the University of Washington has been receiving quite positive feedback, so you can give it a try:

Machine Learning Specialization

The possibilities are endless, and the field is quite rich and in need of people. Don’t worry if you’re not sure yet: you’ll learn and apply, and then you’ll know for a fact what you prefer and what you don’t. If you can mix your experience with innovation, you can create a creative solution to a problem people are currently facing and develop a product out of it, and that can be a life-changing experience. So if you’re interested, start today.