Wednesday 26 August 2015

Shooting for the Stars (But Missing Quite Often)

You may have noticed that this has become a bit of a football stats blog of late, but with the new season just three games old, a whole plethora of data available and some new toys to play with, I think it'd be rude not to!

The latest dataset comes courtesy of www.football-data.co.uk. It's a relatively straightforward set of standard, high-level data on matches played so far this season. The format it's in doesn't really lend itself to much analysis though, so my favourite new toy - Alteryx - helps me prep the data in ridiculously quick time.

The raw data gives one row per match, dividing the team data into 'Home' and 'Away'. So if I want to aggregate all data relating to Chelsea, it becomes a bit difficult, as they essentially have two fields for each statistic, as well as two fields for their actual name. Using Alteryx I split out the 'Home' and 'Away' team data, cleaned it up, renamed the headers to match and used the Union tool to append the 'Away' data to the end of the 'Home' data. The finished workflow is remarkably simple:




I'm bringing the data in and using the Record ID tool to assign each match a unique ID. One Select tool picks up the 'Home' fields only, and another picks up the 'Away' fields, renaming them to match; the Union tool then puts the two sets back together again. Finally, two Formula tools add new fields before the data is output: the first adds a field called 'Season', and the second sets up a field called 'Result' and populates it. Easy.
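
For anyone without Alteryx, the same reshape can be sketched in pandas. This is only a rough equivalent of the workflow, not a copy of it. The column names (HomeTeam, AwayTeam, FTHG, FTAG, HS, AS) are assumed from the football-data.co.uk layout, the sample rows are made-up numbers, and the 'Season' value and the W/D/L coding of 'Result' are my guesses at what the two Formula tools produce.

```python
import pandas as pd

# Made-up sample in the one-row-per-match layout (column names are assumed).
matches = pd.DataFrame({
    "HomeTeam": ["Chelsea", "Man City"],
    "AwayTeam": ["Swansea", "Chelsea"],
    "FTHG": [2, 3],   # full-time home goals
    "FTAG": [2, 0],   # full-time away goals
    "HS": [11, 18],   # home shots
    "AS": [18, 10],   # away shots
})


def to_team_rows(matches, season):
    """Reshape one row per match into one row per team per match."""
    df = matches.reset_index(drop=True).copy()
    df["MatchID"] = df.index + 1  # Record ID step: unique ID per match

    # Select step 1: keep the 'Home' fields and give them neutral names.
    home = df[["MatchID", "HomeTeam", "FTHG", "FTAG", "HS"]].rename(columns={
        "HomeTeam": "Team", "FTHG": "GoalsFor",
        "FTAG": "GoalsAgainst", "HS": "Shots",
    })
    home["Venue"] = "Home"

    # Select step 2: keep the 'Away' fields, renamed to match the home set.
    away = df[["MatchID", "AwayTeam", "FTAG", "FTHG", "AS"]].rename(columns={
        "AwayTeam": "Team", "FTAG": "GoalsFor",
        "FTHG": "GoalsAgainst", "AS": "Shots",
    })
    away["Venue"] = "Away"

    # Union step: append the away rows to the end of the home rows.
    team_rows = pd.concat([home, away], ignore_index=True)

    # Formula steps: a constant Season field, then Result from the team's view.
    team_rows["Season"] = season
    goal_diff = team_rows["GoalsFor"] - team_rows["GoalsAgainst"]
    team_rows["Result"] = goal_diff.map(
        lambda d: "W" if d > 0 else "L" if d < 0 else "D"
    )
    return team_rows


team_rows = to_team_rows(matches, season="2015/16")
```

With the data in this shape, everything relating to one team sits in a single set of columns, so something like `team_rows.groupby("Team")["Shots"].sum()` gives total shots per team regardless of venue.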

All I needed to do now was import it into Tableau Public and see what I could find:


