Pandas Cheat Sheet for Data Science in Python

DataCamp has put out a really cool cheat sheet for Pandas, everybody’s favourite Python data science library.

The fast, flexible, and expressive Pandas data structures are designed to make real-world data analysis significantly easier, but this might not be immediately the case for those who are just getting started, precisely because there is so much functionality built into the package that the options are overwhelming.

That’s where this cheat sheet might come in handy.

It’s a quick guide through the basics of Pandas that you will need to get started on wrangling your data with Python.

Machine Learning 101

#ML #Diary

Google has just released a series of videos to teach machine learning.

The first step is, however, installing and playing with Anaconda — a completely free Python distribution (including for commercial use and redistribution). It includes more than 400 of the most popular Python packages for science, math, engineering, and data analysis.

Choose the command line installer (on OS X); it will save you a LOT of bother.

Installing Anaconda also means getting to know and love Conda — a package manager application that quickly installs, runs, and updates packages and their dependencies. It seems to be like pip, but better?

Conda has a test drive, which I am now trying out. Notes as I go along —

  1. Step one failed. I needed to try reinstalling using the command line installer. Chrome blocks the download as malicious, so I got the file using curl. Now running the installation. I had to edit .bash_profile so that the PATH variable includes the conda directory. Everything seems to be working now.
  2. I ran through the test drive in about half the suggested time. The most useful thing was this conda cheat sheet I downloaded. Key commands:
    Create an environment

    conda create -n snowflakes biopython

    Switch to the environment

    source activate snowflakes

    Remove an environment

    conda remove -n snowflakes --all

    Install a new package to an environment

    conda install -n snowflakes beautiful-soup
  3. Now creating an environment, calling it datalab, and installing the scikit-learn package

    conda create -n datalab scikit-learn
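
A minimal sketch of the PATH edit from step 1, assuming Anaconda was installed to $HOME/anaconda. That location and the variable names CONDA_DIR and PROFILE_LINE are illustrative assumptions, not something taken from the installer, so swap in wherever Anaconda actually landed on your machine.

```shell
# Assumption: Anaconda lives in $HOME/anaconda; change this to the real install directory.
CONDA_DIR="$HOME/anaconda"

# The line to add to ~/.bash_profile so that every new shell can find conda.
# The trailing $PATH is escaped so it is written literally into the profile.
PROFILE_LINE="export PATH=\"$CONDA_DIR/bin:\$PATH\""
echo "$PROFILE_LINE"

# Apply the same change to the current shell session right away.
export PATH="$CONDA_DIR/bin:$PATH"
```

Once the printed line is in .bash_profile, opening a new terminal and running `conda --version` is a quick way to check that the shell can find conda.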