Thursday 12 May 2011

Wikipedia World Events Mapped

A couple of months ago, ragtag.info published a fantastic visualisation of world history in 100 seconds (according to English Wikipedia) - the video shows all the historic events on Wikipedia which have a longitude and latitude associated with them. Although it starts slowly, the animation starts to go crazy from the 18th century onwards and gives a fantastic snapshot.

Talking to Pat Hanrahan at the European Tableau Conference, I mentioned the world history viz and then had the (beer-enabled) "aha" moment of using Tableau to take it one step further and allow the user to interact with it to find out more about a particular event or look at a particular year.

I had anticipated that I would have to figure out how the authors of the original viz had got hold of the data, scraped it again myself and then put the data into Tableau. Thankfully, they were kind enough to publish the data behind the viz for others to use.

The result is below. It took me less than an hour to create. During my session at the conference I talked about The Revolution that is Tableau. And for me, the speed of creation is that revolution. I can create fantastic quality, interactive visualisations which are published on the web in literally no time at all.

You can use the slider at the bottom or the drop down box to choose a particular year. Hovering over a particular point will bring up some information on the event (tooltip hat tip to Andy Cotgreave) and clicking on it will bring up a pop up window with the Wikipedia article that it relates to (you might need to enable popup windows to be able to do this).

For Tableau users out there, please download the workbook and enjoy pressing the play button below the slider (you might want to start at about 1500AD). You can also see a video here.

One caveat to note, is that the data behind this is based on English Wikipedia, so there is quite a western focus in terms of the events being shown.

Enjoy.


Tuesday 10 May 2011

Buzz Tracking

At the Tableau European Conference, Chris Stolte slightly stole my thunder with his visualisation of twitter messages with the #EUTCC11 conference, so I thought I'd turn this into a tutorial to allow anyone to produce something similar quickly and easily. This is primarily for the people who saw the The Revolution that is Tableau session, but should make sense to anyone with a basic knowledge of Excel and Tableau.

0. Prerequisites
This tutorial requires Python to be installed on your machine in order to get hold of the tweets related to a particular keywords. If you're using a Mac, then Python comes preinstalled. However, it needs to be installed if you are using Windows and if not already installed, is available here. There are a few different versions of Python available, and I tend to go for 2.6 or 2.7 - for this tutorial it doesn't make any difference.

It's also handy to have the Tableau Excel Reshape Add-In installed for step 2. It's not absolutely essential but helpful to do an extra piece of analysis.

1. Getting the tweets
The Twitter API provides an easy to use set of commands to get hold of a number of different pieces of information, ranging from the followers of a certain user, through to details of users and also tweets themselves.

To cut a long story, there are two ways to get the tweets that contain a certain phrase. If you want to listen for them, then the Twitter Streaming API provides a way to leave a piece of code running and as new tweets which meet a certain criterion arise they are "pushed" to you.

The alternative (and what will be used here) is the Search API, which returns results based on a certain search term. The great thing about this second approach is that there is a piece of code already written which provides this functionality and will output a csv containing their contents. The code was written by Michael Bommarito and is available here.

Once you've downloaded the code, and have Python installed, open it using IDLE (the standard program for editing and running python code) or a text editor and change the quoted string to the search term that you want. Save this and then execute it (a double click will suffice). A csv will be created in the same directory as the code file is saved.

2. Prepare the tweets for Tableau using Excel
Before loading the code into tableau, I want to make one transformation to the data to allow me to do something a bit different with the data.

Open the csv that you created in step 1 in Excel and you'll see that there aren't any column names, so add in an extra row at the top of the file and give the columns appropriate names (with the default code, it should be TweetID, Author, Date, Tweet).

Now select the final column (containing the tweets) and go to the Text to Columns... function within the Data ribbon - more information on this is available here - the aim is to break the text down based on using a space delimiter, so that each word ends up in a separate column.

Once you've done this, the next step is to reshape the data using the Tableau Excel Add-In. Select cell D2 and reshape the data based on this. This will create a second worksheet which has "melted" the twitter data so that there's only one word per row (with the info on tweetid, author and date). Save this as a new excel file so that both worksheets are saved within the same workbook.

3. Connecting to the data in Tableau and creating calculated field.
Now the data is ready for Tableau, it's a standard case of connecting to the Excel file and making data connections to each of the two worksheets.

One key thing which you might have noticed from the excel file is that the date column has a slightly non standard format to it. To enable Tableau to work with the dates correctly, I created a calculated field that extract the date and time from this field (replace the date column reference as necessary): DATEtime(mid([date], 6, 20))

4. Enjoy.
You've now got the basics to put together an analysis of tweets around a particular hashtag, including looking at when the tweets were, what words are being used and which users are tweeting the most.

Edit: for information on reshaping data using R, and the source of "melting data" see this link from the great Hadley Wickham, section 3 in the paper talks about the melting in particular.