I'm a big fan of Pinboard (a bit like Delicious) as a way to bookmark things that I come across on the web and want to save for later, and I recently realised that I'd tagged more than 1,000 pages since I joined the service.
As a quick first step, I thought I'd look at whether my habits have changed over time - am I tagging stuff more on certain days of the week, hours of the day, months of the year?
From this first analysis, there have definitely been a few lulls in my activity, notably around the beginning of 2010 and July and September of this year. I think most of these are likely due to holidays.
The notable uplift in November of this year was due to making use of ifttt to automatically bookmark the tweets that I favourite.
My tagging activity has also become more consistent across the week, moving from a Monday focus in 2010, to Sunday winning so far in 2011. At the same time, the difference between days has become far less pronounced in 2011.
I've added a fair bit of interactivity to the chart, so that it's possible to see various cuts of the data - e.g. by clicking on Monday, it's possible to see how my bookmarks vary by hour of the day on Mondays in each of the three years.
If you're a Pinboard user and want to recreate this, I've put the R code that I used here. It creates a CSV file that you can plug into the Tableau file, which should then update for your usage. Note that you need to put your username and password into the R file so that Pinboard can authorise the requests.
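The R code does the heavy lifting, but the aggregation itself is simple. Here's a rough sketch in Python (rather than R) of the counting step - the function name is mine, and I'm assuming the ISO-8601 `time` stamps that Pinboard's posts/all export uses:

```python
from collections import Counter
from datetime import datetime

def activity_counts(timestamps):
    """Tally bookmarks by day of week and by hour of day, given the
    ISO-8601 'time' stamps from Pinboard's posts/all export
    (e.g. '2011-11-20T21:33:21Z')."""
    parsed = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in timestamps]
    by_day = Counter(d.strftime("%A") for d in parsed)
    by_hour = Counter(d.hour for d in parsed)
    return by_day, by_hour
```

Dump those counters to CSV and Tableau does the rest.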
The Department for Work and Pensions has published a new report predicting the probability of reaching 100 years old from different starting ages. Although the Guardian Data Store focussed more on year of birth, age in 2011 is equally interesting - particularly once you reach your mid-80s.
Even more so, the differences between men and women are quite substantial - both at a given age, and in the age at which a man is as likely as an older woman to reach 100. They say a picture is worth a thousand words, and hopefully this Tableau Public viz sheds some light on the relationship between age, gender and life expectancy.
Alas, now that the Tour de France has duly finished, I need something to do with my days, rather than watching the ITV4 highlights.
With that in mind, and inspired by a Cycling News article, I thought it would be interesting to have a look at how Cadel Evans (or Cuddles to his friends in the media) won the Maillot Jaune after resolutely keeping with the Schlecks in the mountains and then thrashing through the Time Trial.
To do this, I scraped the results of each of the 21 stages of the Tour (code will appear shortly) and then used Tableau Public to visualise the results predominantly using a Bump Chart, which shows how each rider compares to the average at each stage. Statistical Skier has produced some excellent analysis of the results and presented them similarly here.
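The metric behind the bump chart is just each rider's gap to the stage average. A minimal sketch in Python of that calculation (the data shape - a dict of stages to rider times in seconds - is illustrative, not my actual scraped format):

```python
def time_vs_average(stage_times):
    """For each stage, return each rider's gap (in seconds) to the
    stage's average finishing time - the quantity the bump chart plots.
    stage_times: {stage: {rider: seconds}} (assumed shape)."""
    gaps = {}
    for stage, times in stage_times.items():
        avg = sum(times.values()) / len(times)
        gaps[stage] = {rider: t - avg for rider, t in times.items()}
    return gaps
```

A negative gap means the rider was quicker than the field's average on that stage.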
Within the visualisation, it's possible to look at how a rider has done - just right click on their name and choose the highlight option.
It's also interesting to look at how multiple riders have done. One way is to filter the list of riders by ranking or team, select them from the list below, and then highlight or filter on them using a right click - for example, to see the battle between Evans and the Schlecks, or the performance of Thomas Voeckler.
A couple of months ago, ragtag.info published a fantastic visualisation of world history in 100 seconds (according to English Wikipedia) - the video shows every historic event on Wikipedia that has a longitude and latitude associated with it. Although it starts slowly, the animation goes crazy from the 18th century onwards and gives a fantastic snapshot.
Talking to Pat Hanrahan at the European Tableau Conference, I mentioned the world history viz and then had the (beer-enabled) "aha" moment of using Tableau to take it one step further and allow the user to interact with it to find out more about a particular event or look at a particular year.
I had anticipated that I would have to figure out how the authors of the original viz had got hold of the data, scraped it again myself and then put the data into Tableau. Thankfully, they were kind enough to publish the data behind the viz for others to use.
The result is below. It took me less than an hour to create. During my session at the conference I talked about The Revolution that is Tableau. And for me, the speed of creation is that revolution. I can create fantastic quality, interactive visualisations which are published on the web in literally no time at all.
You can use the slider at the bottom or the drop down box to choose a particular year. Hovering over a particular point will bring up some information on the event (tooltip hat tip to Andy Cotgreave) and clicking on it will bring up a pop up window with the Wikipedia article that it relates to (you might need to enable popup windows to be able to do this).
For Tableau users out there, please download the workbook and enjoy pressing the play button below the slider (you might want to start at about 1500AD). You can also see a video here.
One caveat to note is that the data behind this comes from English Wikipedia, so there is quite a western focus in the events being shown.
At the Tableau European Conference, Chris Stolte slightly stole my thunder with his visualisation of twitter messages with the #EUTCC11 conference hashtag, so I thought I'd turn this into a tutorial to allow anyone to produce something similar quickly and easily. This is primarily for the people who saw The Revolution that is Tableau session, but should make sense to anyone with a basic knowledge of Excel and Tableau.
This tutorial requires Python to be installed on your machine in order to get hold of the tweets related to a particular keyword. If you're using a Mac, Python comes preinstalled. On Windows it needs to be installed, and is available here. There are a few different versions of Python available, and I tend to go for 2.6 or 2.7 - for this tutorial it doesn't make any difference.
It's also handy to have the Tableau Excel Reshape Add-In installed for step 2. It's not absolutely essential but helpful to do an extra piece of analysis.
1. Getting the tweets
The Twitter API provides an easy to use set of commands to get hold of a number of different pieces of information, ranging from the followers of a certain user, through to details of users and also tweets themselves.
To cut a long story short, there are two ways to get the tweets that contain a certain phrase. If you want to listen for them, the Twitter Streaming API provides a way to leave a piece of code running so that new tweets which meet a certain criterion are "pushed" to you as they arise.
The alternative (and what will be used here) is the Search API, which returns results based on a certain search term. The great thing about this second approach is that there is already a piece of code which provides this functionality and outputs a CSV containing the tweets. The code was written by Michael Bommarito and is available here.
Once you've downloaded the code and have Python installed, open it using IDLE (the standard program for editing and running Python code) or a text editor and change the quoted string to the search term that you want. Save it and then execute it (a double click will suffice). A CSV will be created in the same directory as the code file.
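The CSV-writing end of that script boils down to something like this - a sketch, not Bommarito's actual code, and the tuple shape is just what the column names in step 2 imply:

```python
import csv
import io

def tweets_to_csv(tweets):
    """Render search results as CSV text in the shape the tutorial
    expects (no header row - that gets added by hand in step 2).
    tweets: list of (tweet_id, author, date, text) tuples (assumed shape)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(tweets)
    return buf.getvalue()

def save_tweets(tweets, path):
    # Write the CSV next to the script, ready to open in Excel.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(tweets_to_csv(tweets))
```

Note that the csv module handles the quoting for you - important, since tweet text and Twitter's date strings both contain commas.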
2. Prepare the tweets for Tableau using Excel
Before loading the data into Tableau, I want to make one transformation that will let me do something a bit different with it.
Open the CSV that you created in step 1 in Excel and you'll see that there aren't any column names, so add an extra row at the top of the file and give the columns appropriate names (with the default code, these should be TweetID, Author, Date and Tweet).
Now select the final column (containing the tweets) and go to the Text to Columns... function within the Data ribbon - more information on this is available here - the aim is to break the text up using a space delimiter, so that each word ends up in a separate column.
Once you've done this, the next step is to reshape the data using the Tableau Excel Add-In. Select cell D2 and reshape the data based on this. This will create a second worksheet which has "melted" the twitter data so that there's only one word per row (along with the tweet ID, author and date). Save this as a new Excel file so that both worksheets are in the same workbook.
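If you'd rather do the melt in code than in Excel, the same reshape is a few lines of Python (the function name is mine; the row shape matches the column names from earlier in this step):

```python
def melt_tweets(rows):
    """Reshape tweets so there is one word per output row, keeping the
    tweet's id, author and date on every row - the same "melt" that the
    Tableau Excel Add-In performs.
    rows: list of (tweet_id, author, date, text) tuples."""
    melted = []
    for tweet_id, author, date, text in rows:
        for word in text.split():
            melted.append((tweet_id, author, date, word))
    return melted
```

With one word per row, a word-frequency view in Tableau is just a count against the word column.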
3. Connecting to the data in Tableau and creating a calculated field
Now that the data is ready for Tableau, it's a standard case of connecting to the Excel file and making data connections to each of the two worksheets.
One key thing you might have noticed in the Excel file is that the date column has a slightly non-standard format. To enable Tableau to work with the dates correctly, I created a calculated field that extracts the date and time from this field (replace the date column reference as necessary): DATETIME(MID([date], 6, 20))
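To see why that works, the same extraction in Python (assuming the Search API's date format, e.g. "Sat, 27 Feb 2010 07:27:15 +0000"): MID is 1-based, so characters 6-25 skip the leading day name and comma, leaving a 20-character string that parses cleanly.

```python
from datetime import datetime

def parse_twitter_date(s):
    """Mirror the Tableau calc DATETIME(MID([date], 6, 20)): drop the
    leading 'Sat, ' and parse the next 20 characters as a date-time.
    Assumes the format 'Sat, 27 Feb 2010 07:27:15 +0000'."""
    return datetime.strptime(s[5:25], "%d %b %Y %H:%M:%S")
```

(The timezone offset gets discarded, which is fine here since all the conference tweets share one.)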
You've now got the basics to put together an analysis of tweets around a particular hashtag, including looking at when the tweets were, what words are being used and which users are tweeting the most.
Edit: for information on reshaping data using R, and the source of "melting" data, see this link from the great Hadley Wickham - section 3 of the paper talks about melting in particular.
The Guardian posted today on how the Bank of England's GDP forecasts have changed as new vintages have been released through the inflation reports. I found it really difficult to see how the visualisation they chose showed the point that they were trying to make, so here's my attempt at producing something a bit clearer. I would prefer to keep to the same fan chart style and not have to use a set of point estimates, but the data provided don't lend themselves to doing this very easily or nicely.
Transport for London (TfL) have opened up a lot of their data over the last 18 months, with the first 1.4 million Barclays Bike journeys being one highlight (which allowed for visualisations like this, this and this) and a week's sample of Oyster card journeys being another.
So here's my contribution to the transport network visualisation community - a bus route map of London, powered by Tableau Public. Enjoy. All copyright, etc. is to TfL. To see information about a particular route, just choose the route from the drop down and then click the update button in darker grey. Clicking that button again will show all the routes.
The geeky bit... TfL are kind enough to provide the information for all their bus routes in an easy-to-read JSON format (as a by-product of plotting a route on a Google Map). Combined with a bit of code and a list of all the routes in London (from londonbusroutes.net), this made it relatively easy to scrape all the stop coordinates (you'll see that where a stop is around the corner from the previous one, there's a nasty diagonal line through a building).
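The parsing side of that scrape is about this much work. A sketch in Python - the key names here ("markers", "name", "lat", "lng") are illustrative guesses at the payload's shape, not TfL's actual schema:

```python
import json

def stops_from_route_json(raw):
    """Pull stop names and coordinates out of one route's JSON payload.
    The keys 'markers', 'name', 'lat' and 'lng' are assumptions about
    the shape of the data, not the documented TfL format."""
    data = json.loads(raw)
    return [(m["name"], float(m["lat"]), float(m["lng"]))
            for m in data.get("markers", [])]
```

Run that over every route and you have the point data; Tableau joins the dots (corners and all).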
The Oyster card data was much easier to get hold of, as TfL make it available to developers as a CSV. I would normally put this straight into Tableau and give myself full flexibility to do whatever analysis I wanted, but Tableau Public is limited to 100k rows (the dataset has more than 1.5 million rows in total and more than 700k for bus journeys). So I used R to create various summary statistics at route level - the day and time charts and the payment method split. I have to say that the data.table package is amazingly quick at aggregating data and proved a lifesaver in getting the data to the right level.
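The aggregation itself is nothing exotic - sketched here in Python rather than data.table, with an assumed (route, day, payment) row shape standing in for the real Oyster columns:

```python
import csv
from collections import Counter

def summarise_journeys(rows):
    """Collapse journey-level rows down to route-level counts, so the
    output stays well under Tableau Public's 100k-row limit.
    rows: iterable of (route, day, payment) tuples (assumed shape)."""
    return Counter(rows)

def write_summary(counts, path):
    # One row per route/day/payment combination, ready for Tableau.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["route", "day", "payment", "journeys"])
        for (route, day, payment), n in sorted(counts.items()):
            writer.writerow([route, day, payment, n])
```

The point of data.table over a sketch like this is speed at 1.5 million rows, but the shape of the operation is the same: group, count, write out.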
I'm doing one of the two amateur stages of the Tour de France in July (known as L'Etape) which is a one-day, 205km cycle over part of the Massif Central from Issoire to Saint-Flour.
Training for that to date has been a fair amount of spinning at Virgin Active and cycling at the weekend around Richmond Park. This weekend was the first venture into something longer and more challenging, at one of Evans Cycles' Ride It! sportives, which started and finished in Esher, Surrey.
To get an idea of how this compared to L'Etape, I used the excellent gpsvisualizer.com which, amongst other things, can convert KML files (this is what Google uses to plot tracks, etc. on maps) to GPX files (which can be read by GPS systems), but can also plot the elevation profile of a route.
The chart below shows three routes: L'Etape in green, the Evans route in red and my typical Richmond Park jaunt in blue. One word: yikes!
Alas, the best part of December and most of January turned into a hiatus from blogging due to a combination of work commitments, Christmas and worrying about doing the Etape (a stage of the Tour de France open to amateurs). The latter has led to many spinning classes, a new appreciation for dance music and a frustration with Virgin Active's timetable system, which makes it very difficult to find out where a class is on. For example, it's relatively easy to find out which classes are on at the Strand on a Thursday, but what if I want to know where R.P.M. is on across all branches?
With this frustration in mind, I set about scraping all the timetable information for all days for all branches (the code to do this is here). I then chucked this into Tableau Public, with the aim of creating a small dashboard that would allow anyone to cut the data in a few ways (ideally on an iPhone...).
Below is the current prototype, which has all branches' timetables as of 9pm last night. Version 2, coming shortly, will include a postcode function, allowing a user to enter a postcode, find out what classes are nearby (as the crow flies...) and get directions to the club (based on Google Maps).