MCFC Analytics blogposts Summary #9


In the past week, I found the following posts written using the #MCFCAnalytics data:

  1. Some interesting work by @PedroAfonso85, building on previous work, to break down the importance of ball possession, plus some discussion of the oft-discussed yet hard-to-quantify momentum.
  2. @MarkTaylor0 analyzed blocked shots to find out whether blocking shots is a talent.
  3. @hpstats visualized the points difference “with/without” a player in the starting lineup. Also from the same blog is a post profiling players based on their shooting.
  4. @SportsViz has a video with examples of 3D visualization of passes using the data from the Bolton vs. City game.

Previous Summaries

Summary #8

Summary #7

Summary #6

Summary #5

Summary #4

Summary #3

Summary #2

Summary #1

MCFC Analytics – blogposts summary #7


I did not see too many new posts in the past week. I didn’t publish any as I was busy with a different project.

  1. An interactive viz of the Bolton – Manchester City match data by @JBurnMurdoch on the @GuardianData blog
  2. @HPStats is attempting to define metrics for clustering players based on their style. Here is a good first step on passing.
  3. @shots_on_target made a summary of vital stats regarding goals, shooting accuracy, penalties, etc.
  4. Scouting report on Tim Howard by @footballfactman
  5. An interactive visualization of the full dataset by @PhilyB1976. I posted this in one of the first few summary posts, but there is additional information on the site. Worth revisiting!
  6. An

Previous Summaries

Summary #6

Summary #5

Summary #4

Summary #3

Summary #2

Summary #1

If I missed any, please post them in the comments section or tweet them to me!

MCFC Analytics-Summary of blogposts #6


This week I saw a few more new bloggers getting into the act with the data.

First up, there was this article by @RWhittall of TheScore.com, in which Richard talked about “soccer data abuse by some bloggers using the MCFC data”. The gist of the article is that some bloggers are extrapolating too much, drawing conclusions from one year’s worth of data from one league. The other point made in the article is that the majority of the work in soccer analytics isn’t groundbreaking and just adds a data context to what we already knew.

While I see where Richard is coming from, I don’t quite agree either with his assessment of the state of soccer analytics or the “data abuse” bit.

Unquestionably, we haven’t even scratched the surface of what we can do with data in soccer. The majority of the research work in soccer analytics is carried out in the private domain. That is because soccer data is not a commodity as it is in other sports such as baseball. The MCFC & Opta project could be a significant step toward making soccer data more accessible to a wider audience, if it can get enough passionate people interested in the project. However, as with any type of writing in the public domain, there is the good and the not so good. One of the things we discussed with Gavin Fleig, Head of Performance Analysis at Manchester City, Simon Farrant, Marketing Coordinator at Opta, and others is building a community that fosters communication, collaboration and open feedback among the members and the readers. This should help everyone get better over time.

Without further ado, here are links to some interesting work I found in this past week.

@MarkTaylor0 has a comprehensive piece, “The case for data analysis in football”, on the state of soccer analytics and where it stands vis-à-vis other sports like the NFL and baseball. This is a must read.

Analytics posts

  1. @PedroAfonso85 has a couple of posts using the advanced data set
  2. @ChrisJLilley continues his positional analysis series with strikers and central attacking midfielders
  3. @FootballFactman’s piece talks about what to look for in Premier League goalkeepers
  4. @shots_on_target talks about the correlation between points in fantasy football and attacking stats
  5. In my weekly opposition analysis series, I analyzed Sunderland using last season’s data.

Visualization posts

  1. Earlier today I saw Voetstat, a neat blog by @Voetstat_craig which has some visualizations of pass completion + heatmaps. There are multiple posts. I haven’t had a chance to read all of them yet.
  2. @TomBerthon has this visualization of how goals were scored in the Bolton – City game from last season

If I missed any links, post them in the comments section or tweet them with the hashtag #MCFCAnalytics. I will retweet them.

Previous Summaries

Summary #5

Summary #4

Summary #3

Summary #2

Summary #1

MCFC Analytics – Summary of blog posts #5


We had a great meeting this weekend to discuss how to move our community forward. We discussed some great ideas. As @MCFCGavinFleig pointed out on twitter, the next big announcements and steps forward will be public in late November/early December when the “CityAnalyticsCommunity” will be launched. Until then, keep blogging away with the data.

Here is a summary of the blog posts based on #MCFCAnalytics data.

Analytics posts

  1. @MarkTaylor0 – How passing sequences create chances. The title is self-explanatory. Great post.
  2. @JDewitt – Long passing in the Premiership. John looks at long passing and its correlation with finishing position in the league table. Interesting post. A question that came up when I read it: how does long passing correlate with points or goals scored instead of position in the league table?
  3. @TheWestStandO digs deeper into Fernando Torres’ struggles in front of goal last season
  4. @ChrisJLilley defines metrics and rates the attacking midfielders, central midfielders and defensive midfielders of last season.
  5. @We_R_PL – Comparison of top scorers in EPL
  6. @Hpstats – A better passing statistic (this was posted in the comments of Summary #4)
  7. Fulham – opposition analysis by me

Visualization posts

  1. @AlexThamks – a neat viz of Assists, chances created & key passes per formation + the best 11 for each formation using Tableau Public
  2. @Tomberthon  – a visualization of the advanced data set and how to make sense out of it

Other

  1. @MarchiMax has a refined version of the #Rstats code for parsing the F-24 XML
  2. @DannyPage has implemented Ruby on Rails code for importing the F-24 XML

Past summaries

Summary #4

Summary #3

Summary #2

Summary #1

MCFC Analytics – Summary of blog posts #4


It has been about a month since the basic MCFC data set was released, and it is great to see lots of people churning out work using both the basic and advanced data sets.

Based on the tweets with the #MCFCAnalytics tag, quite a few people’s projects are in progress. Good luck to all of you. Make sure you share your project/blog links with the hashtag.

Some people are looking for partners and contributors for the projects they are working on. If you are interested, please keep tabs on the #MCFCAnalytics tag and get in touch with folks directly.

Analysis posts

  1. @MarkTaylor0 – Analyzing passes by comparing them to their expected pass completion rates, using James Milner’s passes in Bolton vs. Manchester City from the 2011-12 season.
  2. Mark also has a post on how Man City and Bolton passed the ball
  3. @Jdewitt – How goals are scored in EPL
  4. @ChrisJLilley – Analyzing center-backs of the Premier League
  5. @analysefooty (this blog!) – Opposition analysis of Arsenal

Visualization posts

  1. @DanJHarrington – a very interesting visualization of passes using vector diagrams in Tableau Public
  2. @MarchiMax – a visualization of where the ball is a few seconds before a shot is taken
  3. @Ongoalsscored – Visualization of the goalscorer’s body parts. Very neat!

If I missed any, please post your links in the comments section.

Links to previous summaries

Summary #1

Summary #2

Summary #3

Feel free to tweet me or email me if you want to chat with me on something specific!

MCFC Analytics – Summary of blog posts #3


Thanks for the amazing response to Summary of blog posts #1 & Summary of blog posts #2

I also want to thank people who have reached out to me via twitter with links to their blogs & posts.

Goalscorer ‘footedness’ by @DavidAHopkins measures the footedness or the foot favoured by Premier League goalscorers.

How do the more successful clubs keep the ball in EPL by @JDewitt talks about how the top teams in EPL keep possession. Also by John is Successful Passing and Winning.

A sneak peek of a very interesting carto by @Kennethfield: Charlie Adam’s “passing wheel”

Football Philosophy – Long passes by @Poolq1984 explores the importance of the long ball in football.

@We_R_PL has a nice post on how to use the MCFC dataset more efficiently. He also has a spreadsheet with the own goals calculated per team.

@footballfactman has a post on Darron Gibson using a mix of data from the MCFC dataset, whoscored and statszone.

The always excellent @MarkTaylor0 has a detailed post analysing the quality of shots in the Bolton – Manchester City game using the advanced dataset.

@ChrisJLilley has 3 posts on his blog using MCFC data

GK positional analysis

Premier league game changers Part I & Part II

@DanJHarrington has cranked up a lot of things using the advanced dataset

1. An interactive Tableau viz showing each player’s touches on the pitch in Bolton – City.

2. Passing visualization using D3.js

3. Dan also has some interesting visualization work in progress. There is a cool video in the link showing ball movement.

Network passing diagrams by @DevinPleuler

Bolton – http://t.co/mcRQ0oHU

Man City – http://t.co/6mtGgJQS

Extracting data from XML

There have been some questions regarding this and some folks have come up with solutions

1. If you have MS Excel 2007 or a later version, you can open the XML file directly in Excel. The only issue is that the XML is nested and Excel converts it into a very flat format, so you will see multiple rows for the same event. For example, a successful pass has multiple rows indicating the direction and the x,y coordinates of where it is passed to. Read the data spec thoroughly to understand how the data is formatted in the XML. It will help you understand the data much better.

2. Code for R users to extract the F-24 XML by @MarchiMax

3. Code snippets from @JBrisson to extract events from the F-24 XML

4. If you are into programming, most languages have XML parsers. A simple search will get you code snippets to start with.
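For Python users, point 4 could look roughly like the sketch below. It uses only the standard library and a tiny inline sample instead of the real file. The element and attribute names (Game, Event, Q, type_id, qualifier_id and so on) are assumptions about the F-24 layout made for illustration, so check them against the data spec before relying on them.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for an F-24 file. Element and attribute names are
# assumptions for illustration; confirm them against the data spec.
SAMPLE = """
<Games>
  <Game id="1">
    <Event id="10" type_id="1" min="3" sec="12" team_id="30" outcome="1" x="45.2" y="60.1">
      <Q id="100" qualifier_id="140" value="62.0"/>
      <Q id="101" qualifier_id="141" value="48.5"/>
    </Event>
    <Event id="11" type_id="1" min="3" sec="15" team_id="30" outcome="0" x="62.0" y="48.5"/>
  </Game>
</Games>
"""


def parse_events(xml_text):
    """Return one dict per event, with its qualifiers kept in a nested dict."""
    root = ET.fromstring(xml_text.strip())
    events = []
    for ev in root.iter("Event"):
        row = dict(ev.attrib)  # event-level attributes, all as strings
        # Keep qualifiers nested instead of one flat row per qualifier
        row["qualifiers"] = {
            q.get("qualifier_id"): q.get("value") for q in ev.findall("Q")
        }
        events.append(row)
    return events


events = parse_events(SAMPLE)
```

Keeping the qualifiers in a nested dictionary gives one record per event, which avoids the repeated rows that the flat Excel import produces.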
If I missed any links, please let me know via Twitter or comment on the blog post. Always use the #MCFCAnalytics tag on Twitter so I can pick them up easily!

Visualizing momentum shifts in Bolton vs. Man City


This is an attempt to visualize the momentum shifts in Bolton vs. City with goals scored and substitutions using the #MCFC analytics advanced data tier – I.

I used possession as a proxy for momentum. The game is divided into 5-minute buckets. If a goal is scored partway through a bucket, that bucket is cut short and a new bucket starts at the minute of the goal.
E.g.: 0-4 is bucket #1. If a goal is scored in minute 7, then 5-6 is bucket #2, 7-11 is bucket #3, and so on.
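The bucketing rule can be sketched in a few lines of Python. This is a minimal illustration that follows the worked example above, not the code used for the plots; the function name and its parameters are hypothetical.

```python
def make_buckets(match_length, goal_minutes, width=5):
    """Split minutes 0..match_length-1 into buckets of `width` minutes.

    A goal minute always starts a new bucket, so the bucket in progress
    is cut short at the minute before the goal.
    """
    goals = set(goal_minutes)
    buckets = []
    start = 0
    for minute in range(1, match_length):
        # Close the current bucket when it is full or when a goal arrives
        if minute in goals or minute - start == width:
            buckets.append((start, minute - 1))
            start = minute
    buckets.append((start, match_length - 1))  # last, possibly short, bucket
    return buckets


# The example from the text: a goal in minute 7 of a 15-minute stretch
example = make_buckets(15, [7])
# -> [(0, 4), (5, 6), (7, 11), (12, 14)]
```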

Plots

Figure 1 – Overall cumulative possession difference vs. game time in minutes.

CumulativeAll

Figure 2 – Cumulative possession difference up until a goal is scored.
E.g.: Say 1-0 is scored in minute 27 and 2-0 is scored in minute 38. The cumulative possession difference is calculated from minute 1 through 27. After the 1-0, the cumulative possession difference is calculated from minute 28 onwards (the data from minutes 1 through 27 is excluded).
This helps to see whether there is any noticeable shift in the momentum of the game after a goal.
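The two plots differ only in whether the running total restarts after a goal. A minimal Python sketch of that idea follows. The per-minute possession differences are made-up numbers, and how the difference is measured per minute is left open, since the post does not specify it.

```python
def cumulative_diff(diffs, goal_minutes=()):
    """Running sum of per-minute possession differences.

    diffs[i] is the difference for minute i + 1. The sum restarts at the
    minute after each goal, so the goal minute itself still counts toward
    the spell that led to the goal.
    """
    goals = set(goal_minutes)
    out, total = [], 0.0
    for minute, d in enumerate(diffs, start=1):
        total += d
        out.append(total)
        if minute in goals:
            total = 0.0  # the next minute starts a fresh spell
    return out


# Made-up differences for five minutes, with a goal in minute 3
diffs = [1, -2, 3, -1, 2]
overall = cumulative_diff(diffs)            # Figure 1 style, never reset
after_goals = cumulative_diff(diffs, [3])   # Figure 2 style, reset after goals
```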

CumulativeMomentumShifts

Findings

  • Overall, City dominated possession. The cumulative possession delta was always negative (= in favor of City) in Figure 1.
  • When the cumulative possession difference was reset after each goal scored (Figure 2), we see that Bolton tried to take the initiative after City took the lead in Min 26. They really pushed hard after City scored the 0-2, pulling one back within 2 minutes of City’s 2nd goal.
  • Bolton continued to push for an equalizer until City scored the 1-3 right after half-time.
  • Bolton enjoyed more possession as they searched for a goal and made a substitution at min 60 (an attacking midfielder for a holding midfielder). It seems the move paid off, as they scored the 2-3 in min 62.
  • Bolton continued to push for an equalizer. City subbed out Aguero for Tevez. Both are attacking players, but Tevez is better at playing a deeper role and holding up the ball.
  • As the game progressed, Bolton switched D. Pratley for Chris Eagles (probably a shift in attacking style) and City responded with a defensive move by subbing out Dzeko for Adam Johnson.
  • Those two moves by City helped them restore control. Towards the end, City subbed out attacking midfielder David Silva for fullback Zabaleta to secure the result in the dying minutes.

This is a very quick and simple interpretation & visualization of the momentum shifts. All feedback is welcome.

Passing in the final third and goals – EPL 2011-12 #MCFCAnalytics


Question:

Is there a correlation between passing in the final third and the goals scored?

I used the #MCFCAnalytics data set to find the answer.

Analysis

Plot of total # of completed passes in the final third vs. goals scored for all 20 teams in the 2011-12 season of the Barclays Premier League

 Findings:

  • Linear regression had an R² of 0.671, indicating a strong correlation between passes completed in the final third and goals scored.
    Excluding the outlier of Liverpool from the dataset, the R² jumped to 0.827.
  • Liverpool is ranked 3rd in the # of passes completed in the final third. However, they are only ranked 15th in goals scored.
  • 75.73 – Liverpool’s expected goals scored based on the above regression. However, they managed to score only 42 goals.
    • What is the reason for the huge negative difference?
  • Swansea’s case is interesting. You may remember the term “Swansealona” was one of the favorites with EPL analysts and reporters last season due to their reputation for passing style and high amounts of possession. However, they are below the league average on passes completed in the final third.
  • Newcastle is ranked 18th in passes completed in the final third. However, Newcastle is ranked 7th in goals scored. Expected goals scored for Newcastle is 29.6. They managed to score 51!
  • Blackburn is ranked last in passes completed in the final third. However, Blackburn scored a lot more goals (44) than their expected goals scored (24.2).
  • Stoke is at the bottom – lowest # of goals scored and 2nd lowest # of passes completed in the final third. Not surprising based on their style of play.
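For readers who want to reproduce this kind of analysis, here is a minimal sketch of a least-squares fit, its R² and an “expected goals” figure read off the fitted line. The pass and goal numbers are made up for illustration and are not the league data; the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up data: final-third completions and goals for four teams
passes = [1000, 2000, 3000, 4000]
goals = [30, 50, 70, 90]
slope, intercept, r2 = fit_line(passes, goals)

# "Expected goals" for a team is the fitted line evaluated at its pass count
expected = intercept + slope * 3500

# Adding one team that passes a lot but scores little lowers R^2
_, _, r2_with_outlier = fit_line(passes + [3500], goals + [42])
```

A team’s residual (actual goals minus expected goals) is what marks it as an over- or under-performer relative to the line.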

Liverpool

I hypothesized that

  1. Liverpool might be crossing a lot,
  2. most crosses occur in the final third (I would love to look at (X,Y) data to establish this fact), and
  3. their shot quality was poor (which might or might not be related to their propensity to cross).

Findings:

  • 1103 – Liverpool attempted the highest # of crosses + corners of all teams in 2011-12
  • 840 – Liverpool attempted the highest # of open play crosses in 2011-12
  • 19th in overall crossing efficiency (# of successful crosses + corners / (# of successful + # of unsuccessful crosses + corners))
  • 14th in open play crossing efficiency (# of successful open play crosses / (# of successful + # of unsuccessful open play crosses))
  • 18th in overall shooting efficiency (shots on target / (shots on target + shots off target + blocked shots))
  • 15th in shooting efficiency not including blocked shots (shots on target / (shots on target + shots off target))
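All four efficiency figures above share one shape: successes divided by total attempts. A small Python sketch with made-up counts (the function names are hypothetical, and none of these numbers come from the data set):

```python
def efficiency(successes, failures):
    """Share of attempts that succeeded: successes / (successes + failures)."""
    attempts = successes + failures
    return successes / attempts if attempts else 0.0


def shooting_efficiency(on_target, off_target, blocked=0):
    """Shots on target over all shots; pass blocked=0 to leave blocked shots out."""
    return efficiency(on_target, off_target + blocked)


# Made-up counts for illustration only
crossing = efficiency(20, 80)                       # 20 of 100 crosses found a teammate
shooting_all = shooting_efficiency(50, 70, 30)      # blocked shots in the denominator
shooting_unblocked = shooting_efficiency(50, 70)    # blocked shots excluded
```

Including blocked shots can only lower the ratio, because they add to the denominator and never to the numerator.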

    A glance at the top 10 open play crossers of Liverpool in 2011-12:

Player              Attempts   Efficiency
Downing             148        0.209
José Enrique        138        0.210
Henderson           72         0.125
Adam                70         0.157
Gerrard             69         0.203
Bellamy             67         0.194
Johnson             65         0.185
Kuyt                57         0.246
Suárez              47         0.149
Kelly               38         0.105
Liverpool Average              0.192
League Average                 0.202

  • 2 – According to this article on EPLIndex, Liverpool scored just 2 goals from 840 open play crosses all season. That is 1 goal for every 420 open play crosses.
  • 79 – The average # of open play crosses per goal scored in the 2011-12 season. Liverpool are almost 10 times worse than Man United (44.5) and Norwich (45.1) in the open play crosses per goal category. If there ever was a stat that would (or should) regress to the mean, this is it.
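The arithmetic behind those comparisons is easy to check. The short Python sketch below uses only the figures quoted in the bullets above; the helper name is hypothetical.

```python
def crosses_per_goal(crosses, goals):
    """Open play crosses needed per goal scored from a cross."""
    return crosses / goals


liverpool = crosses_per_goal(840, 2)  # 420.0

# Crosses-per-goal figures quoted in the text
benchmarks = {"Man United": 44.5, "Norwich": 45.1, "League average": 79}

# How many times more crosses Liverpool needed per goal than each benchmark
ratios = {name: round(liverpool / rate, 1) for name, rate in benchmarks.items()}
# -> {'Man United': 9.4, 'Norwich': 9.3, 'League average': 5.3}
```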

Liverpool had a very talented team in 2011-12. This manifested itself in their high # of completions in the final third, where the defensive pressure is highest. Once they were in possession in the final third, they seem to have relied heavily on “crossing the ball” to enable their center-forward Andy Carroll to take a shot (or header) or knock it down for their attacking midfielders and wide forwards to take a shot. One big problem was that delivering crosses is not a very efficient way of passing the ball. Another problem was that they did not seem to have a plan B. It is quite possible that opponents figured out Liverpool’s crossing strategy and their lack of a plan B. The combination of these three factors contributed significantly to Liverpool’s poor offensive display last season.

Newcastle United

  • 4th – Newcastle is 4th best in shooting efficiency (goals scored/(shots on target + shots off target)). They stayed 4th even when I included blocked shots in the denominator.
    • This could be the reason why they are an outlier in the final-third completions vs. goals scored plot.
    • Manchester City, Arsenal and Manchester United are the top 3 in shooting efficiency.

Newcastle had two great strikers in Demba Ba and Papiss Cisse, who accounted for 29 goals between them. These two were the focus of Newcastle’s attack and were very efficient with their shots. They did not need a high # of completed passes in the final third to score their goals, as they were able to convert a higher % of their shots into goals.

Blackburn Rovers

  • 7th – Blackburn are 7th best in shooting efficiency inside the box (goals scored from inside the box/(shots on target inside the box + shots off target inside the box)).
  • Yakubu scored 17 goals for Blackburn and has the 2nd best goals-to-shots ratio among all the forwards who scored more than 10 goals.
    • This could be one of the reasons for their big positive differential between actual goals scored (44) and the expected goals scored (24.2).

Summary

# of successful passes in the final third has a strong correlation with goals scored.

The final third is a “high-value” area for scoring goals. More completions in the final third means a team is spending more time in the high-value area. This translates into more opportunities to take a shot or to draw errors from defenders and win set pieces from close range, which further increase scoring opportunities.

A high number of completions in the final third alone might not guarantee goals. Liverpool and Newcastle, two examples from the two extremes of the outlier spectrum, are cases in point. However, it is one of the key contributing factors to scoring goals. The fact that R² jumped from 0.671 to 0.827 when Liverpool’s data was excluded from the data set strengthens this point.