The latest dataset comes courtesy of www.football-data.co.uk. It's a relatively straightforward set of data containing standard high-level stats on matches played so far this season. The format it's in doesn't really lend itself to much analysis though, so my favourite new toy - Alteryx - helps me prep the data in ridiculously quick time.
The raw data gives one row per match, dividing the team data into 'Home' and 'Away'. So if I want to aggregate all data relating to Chelsea, it becomes a bit difficult, as they essentially have two fields for each statistic, as well as two fields for their actual name. Using Alteryx I split out the 'Home' and 'Away' team data, cleaned it up, renamed the headers to match and used the Union tool to append the 'Away' data to the end of the 'Home' data. The finished workflow is remarkably simple:
I'm bringing the data in and using the Record ID tool to assign each match a unique ID. One Select tool picks up the 'Home' fields only, and another picks up the 'Away' fields, renaming them to match. The Union tool then puts the two back together again. Finally, two Formula tools add new fields: the first adds a field called 'Season', and the second sets up a field called 'Result' and populates it before the data is output. Easy.
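For anyone without Alteryx, the same reshaping can be roughly sketched in plain Python. This is not the workflow itself, just an illustration of the idea. The column names (`HomeTeam`, `AwayTeam`, `FTHG`, `FTAG`), the output field names, the Win/Draw/Loss logic for 'Result' and the season label are all my assumptions, and the two sample matches are invented purely for illustration.

```python
# Rough sketch of the Home/Away split-and-union described above.
# Assumptions: input column names HomeTeam, AwayTeam, FTHG, FTAG;
# 'Result' is Win/Draw/Loss from each team's point of view.

def reshape_matches(matches, season):
    """Turn one row per match into one row per team per match."""
    home_rows, away_rows = [], []
    # Record ID step: give each match a unique ID, starting at 1.
    for match_id, match in enumerate(matches, start=1):
        home_goals = int(match["FTHG"])
        away_goals = int(match["FTAG"])
        # Select step 1: keep the 'Home' fields under generic names.
        home_rows.append({
            "MatchID": match_id, "Team": match["HomeTeam"],
            "Opponent": match["AwayTeam"], "Venue": "Home",
            "GoalsFor": home_goals, "GoalsAgainst": away_goals,
        })
        # Select step 2: keep the 'Away' fields, renamed to match.
        away_rows.append({
            "MatchID": match_id, "Team": match["AwayTeam"],
            "Opponent": match["HomeTeam"], "Venue": "Away",
            "GoalsFor": away_goals, "GoalsAgainst": home_goals,
        })
    # Union step: append the 'Away' rows to the end of the 'Home' rows.
    rows = home_rows + away_rows
    # Formula steps: add 'Season', then work out 'Result'.
    for row in rows:
        row["Season"] = season
        if row["GoalsFor"] > row["GoalsAgainst"]:
            row["Result"] = "Win"
        elif row["GoalsFor"] < row["GoalsAgainst"]:
            row["Result"] = "Loss"
        else:
            row["Result"] = "Draw"
    return rows


# Invented sample data, not real results.
sample = [
    {"HomeTeam": "Chelsea", "AwayTeam": "Arsenal", "FTHG": "2", "FTAG": "0"},
    {"HomeTeam": "Everton", "AwayTeam": "Chelsea", "FTHG": "1", "FTAG": "1"},
]

rows = reshape_matches(sample, season="2014/15")  # season label is a placeholder
chelsea = [row for row in rows if row["Team"] == "Chelsea"]
for row in chelsea:
    print(row["MatchID"], row["Venue"], row["Result"])
```

With the data in this shape, everything about Chelsea sits in a single 'Team' field, so filtering or aggregating by team no longer means juggling separate 'Home' and 'Away' columns.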
All I needed to do now was import it into Tableau Public and see what I could find: