Tuesday 10 May 2011

Buzz Tracking

At the Tableau European Conference, Chris Stolte slightly stole my thunder with his visualisation of twitter messages with the #EUTCC11 conference, so I thought I'd turn this into a tutorial to allow anyone to produce something similar quickly and easily. This is primarily for the people who saw the The Revolution that is Tableau session, but should make sense to anyone with a basic knowledge of Excel and Tableau.

0. Prerequisites
This tutorial requires Python to be installed on your machine in order to get hold of the tweets related to a particular keywords. If you're using a Mac, then Python comes preinstalled. However, it needs to be installed if you are using Windows and if not already installed, is available here. There are a few different versions of Python available, and I tend to go for 2.6 or 2.7 - for this tutorial it doesn't make any difference.

It's also handy to have the Tableau Excel Reshape Add-In installed for step 2. It's not absolutely essential but helpful to do an extra piece of analysis.

1. Getting the tweets
The Twitter API provides an easy to use set of commands to get hold of a number of different pieces of information, ranging from the followers of a certain user, through to details of users and also tweets themselves.

To cut a long story, there are two ways to get the tweets that contain a certain phrase. If you want to listen for them, then the Twitter Streaming API provides a way to leave a piece of code running and as new tweets which meet a certain criterion arise they are "pushed" to you.

The alternative (and what will be used here) is the Search API, which returns results based on a certain search term. The great thing about this second approach is that there is a piece of code already written which provides this functionality and will output a csv containing their contents. The code was written by Michael Bommarito and is available here.

Once you've downloaded the code, and have Python installed, open it using IDLE (the standard program for editing and running python code) or a text editor and change the quoted string to the search term that you want. Save this and then execute it (a double click will suffice). A csv will be created in the same directory as the code file is saved.

2. Prepare the tweets for Tableau using Excel
Before loading the code into tableau, I want to make one transformation to the data to allow me to do something a bit different with the data.

Open the csv that you created in step 1 in Excel and you'll see that there aren't any column names, so add in an extra row at the top of the file and give the columns appropriate names (with the default code, it should be TweetID, Author, Date, Tweet).

Now select the final column (containing the tweets) and go to the Text to Columns... function within the Data ribbon - more information on this is available here - the aim is to break the text down based on using a space delimiter, so that each word ends up in a separate column.

Once you've done this, the next step is to reshape the data using the Tableau Excel Add-In. Select cell D2 and reshape the data based on this. This will create a second worksheet which has "melted" the twitter data so that there's only one word per row (with the info on tweetid, author and date). Save this as a new excel file so that both worksheets are saved within the same workbook.

3. Connecting to the data in Tableau and creating calculated field.
Now the data is ready for Tableau, it's a standard case of connecting to the Excel file and making data connections to each of the two worksheets.

One key thing which you might have noticed from the excel file is that the date column has a slightly non standard format to it. To enable Tableau to work with the dates correctly, I created a calculated field that extract the date and time from this field (replace the date column reference as necessary): DATEtime(mid([date], 6, 20))

4. Enjoy.
You've now got the basics to put together an analysis of tweets around a particular hashtag, including looking at when the tweets were, what words are being used and which users are tweeting the most.

Edit: for information on reshaping data using R, and the source of "melting data" see this link from the great Hadley Wickham, section 3 in the paper talks about the melting in particular.

No comments:

Post a Comment