Pandas Cheat Sheet for Data Science in Python

Datacamp has put out a really cool cheat sheet for Pandas — everybody’s favourite Python data science library.

The fast, flexible, and expressive Pandas data structures are designed to make real-world data analysis significantly easier, but this might not be immediately the case for those who are just getting started with it, precisely because there is so much functionality built into the package that the options are overwhelming.

That’s where this cheat sheet might come in handy.

It’s a quick guide through the basics of Pandas that you will need to get started on wrangling your data with Python.


The Great Tragedy of the 2016 US Elections

Scott Adams, of Dilbert fame, has emerged as an unapologetic conspiracy theorist and Trump supporter. There is absolutely nothing wrong with him supporting Trump — but, I do take issue with the paper-thin logical arguments he is building to support Trump’s wild assertions about a rigged election. To see what I’m talking about, visit his Periscope account. The daily videos he posts there will take you down a rabbit hole.

To see election rigging at scale, you had to visit parts of India in elections past. Entire voting booths would be taken over by goons who would force people to vote for a particular party — or simply cast the votes themselves. Much of that was eliminated by the use of voter lists, ID cards and machines that make it hard to stuff ballots. It still happens though — in college elections around the country. 

A case can also be made to replace the multitude of electronic voting machines in the US with the kind of hardware used in India — almost un-hackable because most of the work is done as hardware rather than software.

But I think the greatest lesson to be learned from the elections carried out in the largest democracy in the world (at a scale that will send politicians of all stripes in the US scurrying for cover) is that assertions of rigging are always accompanied by statements of the greatest respect for the voter. Which is why I find the general narrative around how the voter doesn’t matter, how the voter’s vote doesn’t matter, truly unfortunate. More harm is done by that than good. And that people like Scott Adams are not only repeating this canard, but building on this argument using innuendo, conspiracy and a little wink at the worst instincts in humanity: that right there is the great tragedy.

Machine Learning 101

#ML #Diary

Google has just released a series of videos to teach machine learning.

The first step is, however, installing and playing with Anaconda — a completely free Python distribution (including for commercial use and redistribution). It includes more than 400 of the most popular Python packages for science, math, engineering, and data analysis.

Choose the command line installer (on OSX) — it will save you a LOT of bother.

Installing Anaconda also means getting to know and love Conda — a package manager application that quickly installs, runs, and updates packages and their dependencies. It seems to be like pip, but better?

Conda has a test drive, which I am now trying out. Notes as I go along —

  1. Step one failed. I needed to try reinstalling using the command line installer. Chrome blocks the download as malicious, so I got the file using curl. Now running the installation. I had to edit .bash_profile so that the PATH variable includes the conda directory. Everything seems to be working now.
  2. I ran through the test drive in about half the suggested time. The most useful thing was this conda cheat sheet I downloaded. Key commands:
    Create an environment

    conda create -n snowflakes biopython

    Switch to the environment

    source activate snowflakes

    Remove an environment

    conda remove -n snowflakes --all

    Install a new package to an environment

    conda install -n snowflakes beautiful-soup
  3. Now creating an environment, calling it datalab, and installing the scikit-learn package

    conda create -n datalab scikit-learn
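
The same commands can be generated from Python when several environments need setting up. This is only a sketch: the helper name `conda_env_commands` is made up for illustration, and it just builds the command strings from the notes above rather than running conda.

```python
import shlex


def conda_env_commands(env_name, packages):
    """Return the shell commands to create, activate and remove an environment.

    Hypothetical helper: it only formats the commands listed in the notes,
    it does not execute anything.
    """
    create = ["conda", "create", "-n", env_name, *packages]
    activate = ["source", "activate", env_name]
    remove = ["conda", "remove", "-n", env_name, "--all"]
    # shlex.join quotes arguments only when the shell would need it
    return [shlex.join(cmd) for cmd in (create, activate, remove)]


for line in conda_env_commands("datalab", ["scikit-learn"]):
    print(line)
```

Printing the commands first makes it easy to check them before pasting them into a terminal.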

PyFactStream — Analysing news


Just playing around with a very interesting library: Newspaper.

It solves the problem of scraping news content (though it doesn’t necessarily bring in full text on a lot of sites).
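
A rough sketch of what that looks like, assuming the third-party `newspaper` package is installed (it is published as `newspaper3k` for Python 3). The `looks_truncated` check and its 150-word threshold are my own assumptions for spotting sites that only return a teaser, not something the library provides.

```python
def fetch_article(url):
    """Download and parse one article, returning (title, text)."""
    # Imported lazily so the helper below can be used without the package.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the raw HTML
    article.parse()     # extract the title and body text
    return article.title, article.text


def looks_truncated(text, min_words=150):
    """Heuristic (an assumption of this sketch): a very short body is probably partial."""
    return len(text.split()) < min_words


# Usage, with a placeholder URL:
# title, text = fetch_article("https://example.com/some-story")
# if looks_truncated(text):
#     print("Only got a teaser for:", title)
```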

I piped this text through the OpenCalais text processing engine.

Let’s see where I get with this 🙂

The Perfect Mac Dev Setup

Fire up your terminal.

  1. Install Xcode Command Line tools
    xcode-select --install 
  2. Install Homebrew
    ruby -e "$(curl -fsSL" 
  3. Install Git
    brew install git 
  4. Install OhMyZSH
    sh -c "$(curl -fsSL" 
  5. Download and Install Virtual Box
  6. Download and Install Vagrant
  7. Set up a LAMP environment. We’re using Scotch Box. You will get Ruby, PHP, Node, NPM and Composer along with MySQL, Postgres .. you get the picture. Visit their site for a complete download.
  8. Init the Scotch Box repo in your new project folder
    vagrant init scotch/box
  9. Get into the folder and start the box
    vagrant up

Predicting The Future


By Kevin Kelly

AI and robots will create new jobs for humans – jobs that are focused on productivity over all else will be done by AI/robots. Jobs that need creativity and the willingness to fail will be done by humans. People with the ability to work with AI will be valued.

Virtual Reality and Mixed/Augmented Reality will lead us to an Internet of experiences. In VR you no longer watch scenes but viscerally experience the environment, and that experience leads to memory. The hard problem is tracking your body and providing tactile feedback, not creating the world. Best example: Void (redirected walking). For AR Mega

VR will become the most social of social media. It is inherently a social experience.

Personalisation and Tracking: anything that can be tracked, will be tracked. In VR your whole self is being tracked. The tracking will become more civilised because of co-veillance: we can track the trackers. There is a correlation between privacy and generic experiences, and between transparency and personalisation.

The great products of the next 20 years haven’t been invented yet.

The State of Blockchain 2016



HyperLedger – Blockchain project supported by the Linux Foundation. The objective is to create a kernel for the tech (like the Linux kernel). Companies will be able to build their offerings on top of a shared kernel. This gets launched at the end of the year and it will be the most significant software launch of the year.

Ethereum – Blockchain for contracts (and everything else). Andrew Keys calls it Facebook vs MySpace where Bitcoin is MySpace to Ethereum’s Facebook. In six months, the startup has hit a billion dollar valuation.

Blockchain is a new network of contracts and agreements. The smart contract is the killer app. The litigators of tomorrow will understand both Blockchain as well as the law to be able to handle disputes in the era of smart contracts.

The legal profession is about to be disrupted massively over the next 5-10 years as dispute resolution is baked into code.

In the next 10-15 years, all payment systems are going to be replaced by Blockchain-based systems. This will require, and drive, massive changes in financial policy at the local, national and global level.

Blockchain is opening up a Pandora’s box of questions around the nature of a transaction, any transaction – from person to person to securities transactions.

This technology solves the trust problem: trust between people traditionally has been provided by a middleman (a bank or a notary for example). The Blockchain now provides that trust technologically – removing the middleman. 
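
To make "trust provided technologically" a bit more concrete, here is a toy hash-linked ledger in Python. It is a sketch under heavy assumptions: every name in it (`block_hash`, `append_block`, `is_valid`) is invented for illustration, and it leaves out everything a real blockchain needs beyond tamper evidence (consensus, signatures, a network of peers). The only point is that each record commits to the one before it, so nobody can quietly rewrite history.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the hash of the previous block."""
    record = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a new block that is linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append(
        {
            "index": index,
            "payload": payload,
            "prev": prev_hash,
            "hash": block_hash(index, payload, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check that every link points at its predecessor."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["payload"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, "Alice pays Bob 5")
append_block(ledger, "Bob pays Carol 2")
print(is_valid(ledger))  # True

ledger[0]["payload"] = "Alice pays Bob 500"  # someone edits an old record
print(is_valid(ledger))  # False
```

Anyone holding a copy of the ledger can rerun the check themselves, which is the sense in which the middleman's job moves into the data structure.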

This clearly has massive regulatory consequences.

As companies retool themselves for this world and a new regulatory environment, it represents a massive business opportunity for IT services companies that can bring themselves up to speed quickly enough to be able to provide transition services for everything from banks to airlines and schools.

Serious challenges exist for governments as they try to impose financial controls on a completely decentralized financial technology layer.

Programming a new life language 


By Aaron Kimball



The problem: DNA is incredibly complex and it presents a data challenge. It is not possible to test every combination in which the base pairs arrange themselves.

While the cost of sequencing has fallen, it is not simple to figure out what any particular sequence of symbols does. Interpreting DNA is hard – the conventional method sees scientists try out different hypotheses, experimenting in wet labs, with potential combinations. The downside – of 10,000 attempts we might see a minor result in one. This makes this whole process hard, expensive, and time consuming.

Zymergen has built out a robotic process to automate this, allowing for many experiments to be run in parallel.

The problem is the amount, and kind, of data being generated. 

Fun fact: 93% of all chemicals in use come from petroleum. Only 6% come from industrial fermentation. However, as oil runs out, microbe-based chemical production processes become super important. But that needs us to be able to manipulate genes in microbes, designing better microbes.

Four phases:  

A new suite of software is allowing for high throughput microbe design and testing:  

Codon is a language that allows scientists to define a design idea – a gene manipulation within a microbe.  

A sequence looks like this – promoter + gene + terminator. The promoter defines how much the gene expresses itself. For example, how blonde will your hair be – platinum or just a dirty yellow.

The language allows scientists to very quickly create experiments that can test multiple permutations.
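
I don't have Codon's actual syntax, so the following is plain Python and not the language itself. It is a sketch of the idea of enumerating designs from the promoter + gene + terminator pattern; all the part names are made up, and strictly speaking it generates every combination of parts (a Cartesian product) and not permutations.

```python
from itertools import product

# Hypothetical part libraries; real ones would come from a lab's catalogue.
promoters = ["P_weak", "P_medium", "P_strong"]
genes = ["geneA", "geneB"]
terminators = ["T1"]


def enumerate_designs(promoters, genes, terminators):
    """Build every promoter + gene + terminator combination as a string."""
    return [" + ".join(parts) for parts in product(promoters, genes, terminators)]


designs = enumerate_designs(promoters, genes, terminators)
print(len(designs))  # 6
print(designs[0])    # P_weak + geneA + T1
```

Three promoters and two genes already give six experiments, which shows how quickly the design space grows and why running them in parallel on robots matters.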

The rest of the process allows for automation, speed and quick analysis on the data using a sophisticated software stack.

There are inbuilt decision trees based off previous non-machine test results.

The expected outcome is better chemicals that can lead to safer pesticides, plastics that break down, even better medicines.

Planning for Moments


By Kiip

The real impact of IoT on marketing
Connected devices allow us to plan for and measure moments

Allows us to model intent on the basis of usage of devices – moments are directly connected to an action/series of actions by a consumer – typically triggered through app activity

Passive moments lead to automation, which leads to detectable intent triggers/moments that are non-intrusive, with permission built in

Moment Types: 
By adding proximity to the equation we start to build connected moments (how does connections planning change when we move from connecting funnel activations to moment activations)

Example: Rewards based on understanding of upcoming moments

Example: Oral B at MWC

Predict intent / upcoming moment using narrow AI that is digesting device data signals.

Understanding the new nature of instant gratification – The “connect” and “instant” generation: postmates/instacart/uber

Messaging gives us moments driven by human phrases – machines understanding and responding to natural language prompts from a human

The Connected Generation: the death of traditional segment (by age etc). Segment by intent in the moment.

We are moving to a Moments based CRM: 
Far more powerful than content based intent modeling. As long as brands add value to the moment, instead of abusing it for push-based messaging.