I have been playing with the #MCFCAnalytics data set for the past 4-5 days, and I have been having a lot of fun with it.
One of the key reasons is “ease of use”.
The data is provided in the very simple Comma Separated Values (CSV) format. CSV is one of the simplest data formats: each row is a line of text, and the values in each column are separated by commas. You can open the file in Notepad, Excel or any text editor. I have worked a lot with data, football or otherwise, and I usually end up spending the majority of my time getting the data into the correct format. I was pleasantly surprised to find the MCFCAnalytics data in CSV format. I opened the file in Excel, created a pivot table and I was on my way.
If you have ever used Excel before, using this data set is very easy. Unzip the download and open the file in Excel. There you can start playing with the data using a pivot table, which helps you slice and dice the data the way you want.
For example, if you want to see the number of goals conceded by Manchester City in each of its 19 away games, you can do that with 3-4 clicks in the pivot table.
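If you would rather script it than click, the same pivot can be sketched in a few lines of Python using only the standard library. This is a rough sketch, not the layout of the real file: the column names (`team`, `opposition`, `venue`, `goals_conceded`) and the sample rows are made up for illustration, so you would swap in the actual headers from the CSV.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; the real file's
# column names and values will differ.
SAMPLE_CSV = """team,opposition,venue,goals_conceded
Team A,Team B,Away,2
Team A,Team C,Away,0
Team A,Team D,Home,1
Team B,Team A,Home,3
"""


def goals_conceded_by_away_game(csv_file, team):
    """Sum goals conceded per opponent for one team's away games."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Acts like the pivot table's filter: one team, away games only.
        if row["team"] == team and row["venue"] == "Away":
            # Summing means several rows for the same game (for example
            # one row per player) are rolled up, as a pivot table would do.
            totals[row["opposition"]] += int(row["goals_conceded"])
    return dict(totals)


away = goals_conceded_by_away_game(io.StringIO(SAMPLE_CSV), "Team A")
for opposition, conceded in away.items():
    print(f"{opposition}: {conceded}")
```

To run it against the real file, you would pass `open(path, newline="")` in place of the `io.StringIO` sample.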
If you are comfortable with Excel, you can use its charts for visualizations or try Tableau Public. Tableau Public is free and supports the CSV format. Tableau provides much slicker visualizations than Excel charts, but you might need a few days to get ramped up on it.
While the Lite data set doesn’t capture every event in every game, it covers almost every stat about teams and players, aggregated at a game level for all 380 games of the 2011-12 EPL season. Playing with the Lite data set helps you get an idea of the metrics and KPIs available for analyzing performance. It leads you to more questions and in a way prepares you for working with a more extensive and complex data set.
For example, I always wondered if there was a relation between final third passing and goals scored. I did some analysis and found a strong correlation between passing in the final third and goals scored. Now I want to dig further and find out if there is a particular zone within the final third that has a stronger correlation to goals scored. To figure that out, I need the (X,Y) data associated with each pass.
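One common way to put a number on a relationship like this is Pearson's correlation coefficient, which ranges from -1 to 1. The sketch below is only an illustration of that measure, not necessarily the analysis described above; the two lists are made-up per-team totals, not values from the data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration: one entry per team.
final_third_passes = [3100, 4200, 2800, 5000, 3600]
goals_scored = [40, 55, 38, 70, 45]

r = pearson(final_third_passes, goals_scored)
print(f"r = {r:.2f}")
```

A value close to 1 would suggest that teams with more final third passes also tend to score more, though a correlation on its own does not say which one drives the other.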
That brings us to the most important aspect of data analysis. Before taking a deep-dive into the data, always ask yourself:
“What is the question I want to answer using the data?”
Without a question in mind you are bound to get lost or lose interest in the data very quickly. The question could be anything, from “Who took the most shots in the EPL in 2011-12?” to “Is there a correlation between wins and shots taken?” In my example above, if I had never tried to answer my first question, I would never have gotten to the second.
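A simple question like “who took the most shots” maps almost directly onto a few lines of code. Here is a minimal sketch, assuming one row per player per game; the `player` and `shots` column names and the sample rows are hypothetical stand-ins for whatever the real file uses.

```python
import csv
import io
from collections import Counter

# Made-up sample rows: one row per player per game.
SAMPLE_CSV = """player,shots
Player X,3
Player Y,5
Player X,4
Player Z,1
"""


def shots_by_player(csv_file):
    """Total the shots column for each player across all rows."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row["player"]] += int(row["shots"])
    return totals


totals = shots_by_player(io.StringIO(SAMPLE_CSV))
# most_common(1) returns a one-item list holding the top (player, shots) pair.
top_player, top_shots = totals.most_common(1)[0]
print(f"{top_player} took the most shots: {top_shots}")
```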
I thank Gavin Fleig and Manchester City Football Club, Simon Farrant and Opta Sports Pro for starting this great initiative.
I have seen some interesting work produced by a number of people already. I hope to see a lot more in the coming days and weeks.