Python for Machine Learning and Data Engineering

Read this if you are new to machine learning.

In this post, I want to collect useful tutorials for learning machine learning with Python. Python provides a comprehensive set of tools for ML, and the IPython Notebook is one of the best tools I have come across recently.

Tools/Packages:

  • pandas: importing and preprocessing data
  • NumPy: fast, efficient matrix manipulation
  • scikit-learn (sklearn): machine learning toolkit
  • IPython: interactive environment for writing and running code
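
As a rough illustration of how the first three packages fit together, here is a minimal sketch. The toy data, the column names (hours_studied, hours_slept, passed) and the choice of LogisticRegression are my own assumptions for illustration; they do not come from any of the tutorials linked in this post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up toy data with hypothetical column names, for illustration only.
df = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "hours_slept": [8.0, 7.0, None, 6.0, 7.0, 5.0, 8.0, 6.0],
    "passed": [0, 0, 0, 0, 1, 1, 1, 1],
})

# pandas: preprocessing, here filling the missing value with the column mean.
df["hours_slept"] = df["hours_slept"].fillna(df["hours_slept"].mean())

# NumPy: convert to arrays and standardize each feature column.
X = df[["hours_studied", "hours_slept"]].to_numpy()
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = df["passed"].to_numpy()

# scikit-learn: fit a simple classifier and predict on the training data.
model = LogisticRegression()
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Running this cell by cell in an IPython Notebook is a convenient way to inspect df, X and predictions at each step.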

These two videos will help you get started with IPython and scikit-learn:

Exploring Machine Learning with Scikit-learn

Diving Deeper into Machine Learning with Scikit-learn

If you are new to Python, I would recommend the Python course on Udacity.

Kaggle also provides a good set of tutorials for trying out Python:

http://blog.kaggle.com/2013/01/17/getting-started-with-pandas-predicting-sat-scores-for-new-york-city-schools/

http://blog.kaggle.com/2012/07/02/up-and-running-with-python-my-first-kaggle-entry/

If you find any other useful tutorials, please add them in the comments.

Useful Beginner Resources for Machine Learning / Data Science

I am new to machine learning and data science. While searching for resources on ML concepts and tools, I found the following useful.

Complete machine learning overview: Coursera Machine Learning by Andrew Ng. This is the best place to start learning it from scratch.

Complete data science: Data Science from Harvard.

As the names suggest, the first course is oriented more towards the different algorithms for learning from data, while the second is oriented more towards data collection, data pre-processing and understanding the data.

Tools:

A variety of tools and languages can be used. I have tried Octave, R, Weka, Python, MATLAB and Julia. R is one of the most widely used in industry, and each has its own advantages and disadvantages. In the end I chose Python, since it is the language I am most comfortable with.

http://www.experfy.com/blog/python-data-science/

http://programmers.stackexchange.com/questions/181342/r-vs-python-for-data-analysis

https://www.kaggle.com/forums/t/5243/pros-and-cons-of-r-vs-python-sci-kit-learn

Kaggle:

Kaggle is an active community for machine learning. You get a lot of data sets with which you can learn as well as compete with other Kagglers. Start by trying out the competitions posted there.