Pandas Cheat Sheet for Data Science in Python

DataCamp has put out a really cool cheat sheet for Pandas, everybody’s favourite Python data science library.

The fast, flexible, and expressive Pandas data structures are designed to make real-world data analysis significantly easier, but this might not be immediately the case for those who are just getting started, precisely because there is so much functionality built into the package that the options are overwhelming.

That’s where this cheat sheet might come in handy.

It’s a quick guide through the basics of Pandas that you will need to get started on wrangling your data with Python.


Machine Learning 101

#ML #Diary

Google has just released a series of videos to teach machine learning.

The first step is, however, installing and playing with Anaconda — a completely free Python distribution (including for commercial use and redistribution). It includes more than 400 of the most popular Python packages for science, math, engineering, and data analysis.

Choose the command line installer (on OSX) — it will save you a LOT of bother.

Installing Anaconda also means getting to know and love Conda — a package manager application that quickly installs, runs, and updates packages and their dependencies. It seems to be like pip, but better?

Conda has a test drive, which I am now trying out. Notes as I go along —

  1. Step one failed. I needed to try reinstalling using the command line installer. Chrome blocks the download as malicious, so I got the file using curl. Now running the installation. I had to edit .bash_profile so that the PATH variable includes the conda directory. Everything seems to be working now.
  2. I ran through the test drive in about half the suggested time. The most useful thing was this conda cheat sheet I downloaded. Key commands:
    Create an environment

    conda create -n snowflakes biopython

    Switch to the environment

    source activate snowflakes

    Remove an environment

    conda remove -n snowflakes --all

    Install a new package to an environment

    conda install -n snowflakes beautiful-soup
  3. Now creating an environment called datalab and installing the scikit-learn package:

    conda create -n datalab scikit-learn
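The PATH edit from step 1 can be sketched in Python. This is only an illustration, not what the Anaconda installer itself does: the helper name `prepend_to_path` and the `~/anaconda/bin` directory are assumptions made for the example, so substitute wherever your installer actually put conda.

```python
import os


def prepend_to_path(path_value, directory):
    """Return path_value with directory moved to the front, without duplicates."""
    parts = [p for p in path_value.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)


# Hypothetical install location; the real one depends on your installer.
conda_bin = os.path.expanduser("~/anaconda/bin")

# What PATH would look like for the current session.
new_path = prepend_to_path(os.environ.get("PATH", ""), conda_bin)

# The line you would add to .bash_profile to make the change permanent.
print('export PATH="{}:$PATH"'.format(conda_bin))
```

Putting the conda directory first means its `python` and `conda` executables are found before any system copies.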

PyFactStream — Analysing news


Just playing around with a very interesting library called Newspaper.

It solves the problem of scraping news content (though it doesn’t necessarily bring in full text on a lot of sites).

I piped this text through the OpenCalais text processing engine.

Let’s see where I get with this 🙂
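For reference, here is a minimal sketch of the Newspaper step, assuming the `newspaper` package (published on PyPI as `newspaper3k`) is installed. The function name `fetch_article` and the `article_cls` hook are my own additions for illustration, and the OpenCalais call is left out because it needs an API key.

```python
def fetch_article(url, article_cls=None):
    """Download one news page and return its title and body text."""
    if article_cls is None:
        # Imported lazily so the sketch loads even without the library.
        from newspaper import Article
        article_cls = Article

    article = article_cls(url)
    article.download()  # fetch the raw HTML
    article.parse()     # extract the title, body text and other fields
    return {"title": article.title, "text": article.text}
```

Calling `fetch_article(url)` with a real article URL returns a dict whose `text` value can then be sent on to a text processing service. On some sites the text may come back incomplete.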

The Perfect Mac Dev Setup

Fire up your terminal.

  1. Install Xcode Command Line tools
    xcode-select --install 
  2. Install Homebrew
    ruby -e "$(curl -fsSL" 
  3. Install Git
    brew install git 
  4. Install OhMyZSH
    sh -c "$(curl -fsSL" 
  5. Download and Install Virtual Box
  6. Download and Install Vagrant
  7. Set up a LAMP environment. We’re using Scotch Box. You will get Ruby, PHP, Node, NPM, Composer along with MySQL, Postgres ... you get the picture. Visit their site for a complete download.
  8. Init the Scotch Box repo in your new project folder
    vagrant init scotch/box
  9. Get into the folder and start the box
    vagrant up
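Before running `vagrant up`, it can help to confirm that the earlier steps actually put their tools on the PATH. The sketch below is one way to do that from Python; the helper name `missing_tools` and the exact list of executables are assumptions for illustration, not part of any of the installers above.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


# Executable names for the steps above; VBoxManage ships with VirtualBox.
REQUIRED = ["git", "brew", "zsh", "VBoxManage", "vagrant"]

for tool in missing_tools(REQUIRED):
    print("not found on PATH:", tool)
```

If nothing is printed, every tool in the list was found.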

Essential Free Software For The Mac

A new Mac buyer recently asked me for the best starter set of software for a new Mac. I am very Apple — which means I have an iPhone, a Mac, an Apple TV and an iPad, so I like things that let me work across all my devices seamlessly. For all office work I highly recommend Apple’s Pages, Numbers and Keynote apps — they are very very good, are free, and work everywhere (except the Apple TV). You won’t find a note-taking app in my list because I’m using Apple’s inbuilt Notes. I also use Mail, Calendar and Address Book — all default apps. And I can’t even begin to talk about how good iMovie is for video editing (I have been editing video since 2005, and while my professional video editing days are behind me, I love how easy iMovie makes life for me).

Anyway, here’s my list (and I would recommend installing in this order):

Essential Software:

System Cleaner
Run it once and then set it up to auto-execute on a weekly/monthly basis. It will clear your system of un-needed files.

Dropbox (iOS + OSX)
Cloud Storage
I used only free storage for around a year, now I pay for a Terabyte of storage. I cannot imagine life without Dropbox.

FTP Client
And not just FTP, supports S3, Azure etc.

Torrent Client
My favorite way to download files — uTorrent is a very good alternative, but for some reason I’ve always preferred Transmission.

The Unarchiver
File Compression
Zip, Rar and almost every other file compression format is supported

Media Player
Plays every media file out there. I use it instead of Quicktime.

Video Format Transcoder
This is what I use to convert video files into formats supported by iTunes so I can copy stuff over to my phone/iPad.

Text Editor
If you code, you’re going to love this. If you like text-editors, you are going to love this.

Photoshop Replacement
As powerful as Photoshop and ridiculous overkill for most people.

Pixelmator (Especially, if Gimp feels like too much)
Image Editing
Powerful image editing — most of you will be happy with this.

Illustrator Replacement
Ridiculously good. Seriously.

Pocket (iOS + OSX)
Article Reader / Saver
My favorite way to save articles I like/love. I have a Pocket button in my browser and the app on all my devices, and I love it. And I love the clean interface it gives me to read articles.

Reeder (iOS + OSX)
RSS Reader
Old school 🙂 I still use an RSS reader and send almost everything I like to Pocket to read later.

Removes unwanted language files (saved 3+ GB of space when I ran it recently)

Note to Self: Two Open Source data projects I have to play with

Data is wealth. And I don’t like the idea of having all my data making other people wealthy. In my continuing quest (FreedomBox, ThinkUP) to track alternatives to popular commercial offerings, here’s my latest list of things, in the order I hope to play with them when I find the time 🙂

Dropbox -> OwnCloud

I can’t live without Dropbox, frankly. It is an awesome piece of software. The folks over at OwnCloud have what looks like a decent offering. Besides files, it also supports syncing bookmarks, contacts and calendars across devices.

YouTube/Flickr/Soundcloud -> MediaGoblin

MediaGoblin lets you host and share videos, music, and images and is a replacement for media-publishing services. I just bought a domain that I thought I could use with this 🙂

WIP: The Perfect Web Dev Mac

— Will keep updating this article —

Two things happened recently: I upgraded to Mac OS X on my MacBook Pro and got a new MacBook Air at work. The second is important because I don’t actually get to code much in my day job. And I can’t play around too much because I don’t have the time to break it and then fix it.

I wanted to set up the perfect web dev environment and started looking around. I intend to code in PHP (think CodeIgniter and Laravel) and learn Node.

As I went about this task, I quickly discovered that there are some hacks needed. I’ve tried to list everything out below.

Before you start, I highly recommend having iTerm installed — it replaces the default terminal in OS X and makes life a lot easier. Also, figure out which text editor you like. I’ve always been a Textmate fan, but am trying out Sublime Text.

Installing Xcode command line tools


/usr/bin/ruby -e "$(/usr/bin/curl -fksSL"



brew install dnsmasq


curl -sS | php

Laravel 4

sudo chown -R www:www app/storage

Using Github with Webfaction (or AWS, or pretty much any other server)

At 2020Social we’ve been building some really cool apps over the last few months. Working with multiple apps means that we need to keep track of what we’re up to, what changes we make etc. And since we iterate rapidly, we need to make sure that we know what bits of code we’re changing. So we use Github as a source code repository.

We use Webfaction as a dev server before moving our apps over to Amazon Web Services. And we’ve figured out how to use Git to tie it all together.

The first thing we did was set up 2020Social as an organization on Github. This has the major advantage of allowing us to set up teams of people working on a particular codebase. It also allows us to set up a team that consists only of the servers we have lying around.

So, the formal (and truly great) way to deploy to servers is using Capistrano. But it can seem like overkill at times, especially on our dev servers. So, we came up with a (hopefully elegant) solution. Of course, this only works if you’re comfortable with the *nix command line and have ssh access to your servers.

  1. Install git on Webfaction in your home directory.
  2. Create an ssh key pair.

    cd ~/.ssh
    ssh-keygen -t ed25519 -C "give your server a name"

  3. Create a user on Github using the free plan and add the ssh key generated above
  4. Create a team on github in your organization — call it servers — and add the user you just created. Give this team only pull rights
  5. On your server, set up an application folder. On webfaction you need to use the web-based control panel.
  6. In the application folder, do the following

    git init
    git remote add [repo_name] [repo_url]
    git pull [repo_name] [branch_name]

That’s it. You can now merrily pull code on to your server from your github repo. And if you avoid uploading files, you’ll know for a fact that all the versions of code floating around are connected to each other, with a history you can track.
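The three git commands from step 6 can also be scripted, which is handy when several servers need the same setup. The sketch below is one possible way to do it in Python, not the setup we actually run: the function names `pull_commands` and `run_in`, the remote name, the repository URL and the branch are all placeholders for illustration.

```python
import subprocess


def pull_commands(repo_name, repo_url, branch_name):
    """Build the three git commands from step 6 as argument lists."""
    return [
        ["git", "init"],
        ["git", "remote", "add", repo_name, repo_url],
        ["git", "pull", repo_name, branch_name],
    ]


def run_in(folder, commands):
    """Run each command inside folder, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=folder, check=True)


# Placeholder values for illustration only; nothing runs until run_in is called.
commands = pull_commands("origin", "git@github.com:example-org/example-app.git", "master")
for command in commands:
    print(" ".join(command))
```

Calling `run_in` with your application folder and `commands` would execute the steps in order. Because `check=True` is set, a failed `git pull` raises an error instead of passing silently.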