MIT Sloan Sports Analytics Conference 2013: Soccer Analytics Panel


Panelists: Chris Anderson, Albert Larcada, Blake Wooster, Jeff Agoos

The makeup of this year’s panel was very different from that of the Soccer Panel at SSAC 2012. Last year’s panel was dominated by “club insiders”; this year’s had a mix of “outsiders”: Chris Anderson; Blake Wooster from Prozone, representing the data companies; Albert Larcada from ESPN, representing the media; and Jeff Agoos, a former player and Technical Director of MLS.

The ballroom was almost full, and it was a bigger ballroom than the one used at last year’s conference. (My totally unscientific method of measuring crowd sizes puts “almost full” > “75% full of last year”.)

Marc Stein moderated the panel as was the case last year.


This year’s panel took a completely different track from last year’s. Last year the discussion revolved around how analytics is used, where it is useful, and the importance of context and trust. The biggest challenge cited in 2012 was the availability of good data to work with. This year it centered more on how much analytics is used by managers, metrics, visualizations and the use of analytics in the media, and trust (a repeat theme from last year).

The panel started off with a historical perspective on soccer analytics from Chris. I thought it was a very good beginning that gave the audience an idea of how far we have come. Albert’s examples of using heatmaps of Messi and Ronaldo on SportsCenter in the build-up to the “Clasico” showed how data is being used to tell a story. The challenge in a setting like SportsCenter is that the announcers need to be able to explain the graphic and tell the story in less than 10 seconds, so choosing the right type of visualization is key. I also liked Albert’s line that “visualization is analytics.” All visualizations are an approximation of the raw data. If done right, they can tell a story not just in the media but also inside a soccer club.
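As a rough illustration of “visualization as approximation”, here is a minimal Python sketch of how a touch heatmap could be built: raw (x, y) touch locations are binned into a coarse grid, which throws away exact positions and keeps only counts per cell. The function name, the grid size and the 0-100 coordinate scale are assumptions for illustration, not ESPN’s actual method.

```python
def touch_heatmap(touches, cols=6, rows=4, length=100.0, width=100.0):
    """Bin (x, y) touch locations into a rows x cols grid of counts.

    Hypothetical helper: assumes x runs 0..length along the pitch and
    y runs 0..width across it (the 0-100 scale is an assumption).
    """
    grid = [[0] * cols for _ in range(rows)]
    for x, y in touches:
        # min() keeps points on the far edge (e.g. x == length) in the last cell
        col = min(int(x / length * cols), cols - 1)
        row = min(int(y / width * rows), rows - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Made-up touches, purely for illustration
    sample = [(10, 10), (90, 90), (100, 100), (50, 50)]
    for row in touch_heatmap(sample, cols=2, rows=2):
        print(row)
```

A coarser grid tells a simpler story (useful in a 10-second TV slot); a finer grid keeps more of the raw detail.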

The big questions

  1. How much analytics is being used by the club managers?
  2. What are the challenges in making the coaches use analytics?
  3. Metrics – what do we have today?

How much analytics is being used by the club managers?

Chris tried to answer this based on his experience working with clubs as a consultant, but the viewpoint of an “insider” (like an analyst at a club) citing examples where a manager has used analytics successfully would have been a good counterbalance. It is always a tough balancing act to have a frank discussion without revealing confidential details, but I felt that the club angle was addressed better in last year’s panel. West Ham United manager Sam Allardyce talking about how he uses analytics is a good example.

What are the challenges in making the coaches use analytics?

There was some great discussion on this one. Coaches have a lot at stake, and they may not be willing to use something new unless they know (with a high degree of confidence) or trust that it will help them. But they won’t know for sure whether it is useful unless they use it: the classic chicken-and-egg problem. A great example was the importance of survival in a promotion-relegation environment and how that pushes managers and front offices towards short-term thinking and immediate results.

Daryl Morey, GM of the Houston Rockets and a co-chair of SSAC, brought up a great point in the “Revenge of the Nerds” panel about how the front office needs to support and persist with analytics. Managers may not trust it right away, but if they find it useful over a period of time they will eventually come around.

Blake had a point when he stressed that the onus is on the analyst to communicate the value of analytics to the coach. If the coach doesn’t see value in it, it is probably because the analyst didn’t do a good enough job of conveying the message.

In “Why We Don’t Understand Luck” by Michael Mauboussin, one of my favorite talks of the conference, Michael stressed the importance of evaluating things based on the process (which is very hard to do) rather than the outcomes (which is easy, and what we normally do).

I believe that analytics is being used differently at different clubs (hardly surprising). It plays a complementary role as “another tool” in the overall toolkit for success. The two key themes that resonated across the two-day conference, “communicating the message” and “winning the trust of coaches/decision-makers,” are very important.

Metrics – what do we have today?

This is probably one of the most debated aspects of soccer analytics today. What metrics do we have? What is the soccer equivalent of baseball’s WAR? There is a lot to gain from knowing how analytics and metrics work in other sports, but every sport is unique and presents its own challenges. The fact is that we haven’t been able to model soccer matches as well as we would like to. Albert made a great point about how the paucity of scoring in soccer makes it much harder to model than almost every other team sport. We haven’t yet come up with a formula that tells us how to win a game or score a goal, because there are many ways to win a soccer game. Different coaches employ different systems, and a metric/KPI valued highly in one system might not matter at all in another. For example, speed on the ball and accurate long balls might be very important in a counter-attacking system but much less so in a short-passing system.
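To see why sparse scoring makes soccer hard to model, here is a minimal Python sketch that swaps in a simple, commonly used assumption that was not stated on the panel: each team’s goals follow an independent Poisson distribution. The scoring rates `lam_a` and `lam_b` are hypothetical. Keeping the stronger team’s rate 1.5 times the weaker team’s, at soccer-like rates the weaker side avoids defeat roughly half the time, while at ten times the scoring the stronger side wins far more often.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * exp(-lam) / factorial(k)


def outcome_probabilities(lam_a, lam_b, max_goals=60):
    """Return (P(A wins), P(draw), P(B wins)), assuming the two scores are
    independent Poisson counts with means lam_a and lam_b (an assumption)."""
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, lam_a)
        for b in range(max_goals + 1):
            p = p_a * poisson_pmf(b, lam_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


if __name__ == "__main__":
    # Same 1.5:1 strength ratio at a soccer-like and a much higher scoring rate
    for lam_a, lam_b in [(1.5, 1.0), (15.0, 10.0)]:
        w, d, l = outcome_probabilities(lam_a, lam_b)
        print(f"{lam_a} vs {lam_b}: win {w:.2f}, draw {d:.2f}, loss {l:.2f}")
```

The more goals there are, the more the result reflects the underlying difference in strength; with so few goals, a single moment swings the outcome.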

 

Charles Reep, probably the first soccer analyst, found that most goals were scored from moves of fewer than three passes; he therefore concluded that it was important to get the ball forward as quickly as possible.

While his statement might still hold true, the key question is “How?”, and the answer is not straightforward. The quickest way to move the ball is for the goalkeeper to hoof it upfield, and we know that doesn’t work most of the time.
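Reep’s headline finding is, at heart, a tally of goals by the number of passes in the move that produced them. A minimal Python sketch, assuming a hypothetical input of one pass count per goal; the numbers are made up and are not Reep’s data.

```python
from collections import Counter


def share_of_goals_from_short_moves(passes_per_goal, fewer_than=3):
    """Fraction of goals whose build-up move had fewer than `fewer_than` passes.

    passes_per_goal: one entry per goal, the number of passes in the move
    that led to it (a hypothetical input format).
    """
    if not passes_per_goal:
        raise ValueError("need at least one goal")
    short = sum(1 for n in passes_per_goal if n < fewer_than)
    return short / len(passes_per_goal)


if __name__ == "__main__":
    goals = [0, 1, 2, 5, 7]  # made-up pass counts
    print(share_of_goals_from_short_moves(goals))  # 3 of 5 goals
    print(sorted(Counter(goals).items()))  # distribution of move lengths
```

The tally says which kinds of moves goals came from; it does not say how to create those moves, which is the “How?” question above.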

I believe the availability of spatial XY data and data from camera tracking systems (installed in most stadiums around the world today), in conjunction with video, has helped us answer the “How” better. But we still have a long way to go.

As Chris pointed out at the very beginning of the panel, this is not a revolution but an evolution. We are constantly evolving.

I shared a few more of my thoughts in a podcast with Richard Whittall, editor of The Score Media’s Counter Attack blog.

Suggestions for next year’s panel

  1. I felt that the compositions of the 2012 and 2013 panels were at opposite extremes in terms of “club insiders” vs. “outsiders”. A panel with a mix of the two groups would probably be more useful.
  2. Panel formats don’t lend themselves well to one-on-one dialogue and interaction. An online Q&A chat session with the panelists during the conference would be worth trying. I got similar feedback from a few others who attended the conference.
  3. Have Big Sam on the panel! Seriously, having a manager who has used analytics and is willing to talk about it would take this panel to a whole other level. I am aware that the EPL will be in season, but technology makes it possible for someone to participate remotely.

I have another post coming up in a few days summarizing my views on the other panels I attended.

Links

Sloan Sports Analytics Conference official website

Richard Whittall’s thoughts on the Soccer panel

Zach Slaton’s Summary of the SSAC 13

Mitch Lasky’s impressions of the conference

Sports Hack Day Project


I have been busy ever since I started working for the Seattle Sounders about a month ago. It has been great so far. We are less than a month away from the season kick-off. I am very excited, to say the least.

Coming up in a few weeks is the Sloan Sports Analytics Conference in Boston. Last year’s conference had a profound impact on me. More about that in another post.

This weekend I participated in the first-ever Sports Hackday in Seattle. The chance to learn something new, meet like-minded people and avoid the endless Super Bowl pregame show was motivation enough to sign up. The Hackday was very well organized; kudos to the organizers and sponsors. We started off Friday night with introductions and team formation. Our team, “Submarino”, consisted of Sarah, Adam, Matt and me. We had a few ideas going into the Hackday, and after a brief brainstorm we decided to look at the impact of injuries on soccer clubs.

Sunday morning during the integration phase a few hours before the demo


One of the coolest things about Sports Hackday was that data providers like Sports Data LLC and platform companies like Google, Cloudant and Twilio provided tools that let us spend most of our time implementing our idea rather than worrying about basic infrastructure and plumbing.

We used Sports Data LLC’s API to extract injury information for the English Premier League and broke it down by team, type of injury and number of games missed. We built a fully working model of our idea using real data. It helped that we had an awesome team and that we did a very good job of decoupling the frontend UI pieces from the backend database work, which let us work almost in parallel. We had our hairy moments during the integration phase, with the clock winding down to noon on Sunday (the deadline for code-complete), but we were able to get most of what we wanted done.
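The breakdown step can be sketched as a simple group-and-sum. This is a minimal Python illustration, not our hackday code: it assumes injury records have already been pulled into plain dicts, and the field names (`team`, `player`, `type`, `games_missed`) are hypothetical rather than the Sports Data LLC schema.

```python
from collections import defaultdict


def games_missed_breakdown(injuries, *keys):
    """Sum games missed, grouped by the given record fields.

    injuries: list of dicts with a "games_missed" field plus whatever
    grouping fields you need (field names here are hypothetical).
    """
    totals = defaultdict(int)
    for record in injuries:
        group = tuple(record[k] for k in keys)
        totals[group] += record["games_missed"]
    # Largest totals first, as in a "most games missed" chart
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    injuries = [  # made-up records for illustration
        {"team": "Team A", "player": "Player 1", "type": "muscle", "games_missed": 6},
        {"team": "Team A", "player": "Player 2", "type": "muscle", "games_missed": 3},
        {"team": "Team A", "player": "Player 2", "type": "knee", "games_missed": 4},
        {"team": "Team B", "player": "Player 3", "type": "ankle", "games_missed": 2},
    ]
    print(games_missed_breakdown(injuries, "team"))
    print(games_missed_breakdown(injuries, "team", "type"))
```

Passing different grouping keys gives the per-team, per-category and per-player views from the same records.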

We built a cool interactive visualization illustrating the breakdown of a team’s injuries by category and by player. The thickness of each arc depicts the number of games a player missed due to a particular injury.

We had 3 minutes to demo, and it went well, although all of us were a bit nervous and very tired. We won two prizes: “Best data visualization” and “Best overall data hack of the Hackday”.

Here is a piece on the Sports Hackday on GeekWire.
Local TV station KING 5’s coverage of the event

Frankly, I did not expect to win the overall prize. We ended the evening very happy and very very tired.

Visuals

Manchester United had the highest number of player-games missed due to injuries so far this season. The second visual highlights that muscle injuries are a team-wide issue, not just a problem for Nani, who missed the most time due to muscle injuries.

This raises a new question: is there something in Manchester United’s training regimen that is causing this?

Manchester United injury breakdown

PS: I couldn’t get the interactive part working on the blog due to JavaScript issues. If I ever figure it out, I will update this post.

Modeling 101


“Essentially, all models are wrong, but some are useful.” – George E.P. Box

from Empirical Model-Building and Response Surfaces (1987) co-authored with Norman R. Draper, p. 424.

Mr. Box’s quote is quite popular and used a lot these days. However, I am not sure if all those who quote it fully understand it. And that includes me.

What I took away from the quote.

A model is an approximation of a process that we don’t fully understand. We model processes to get a deeper understanding of the process itself. To build a model, we use what we understand about a process and try to approximate (or sometimes ignore) what we don’t know.

All models are wrong because of this inherent approximation. But some can be useful, because what we have “approximated (or ignored)” is not essential to understanding or replicating the process.

“Relative simplicity” is an important virtue of a good model. By “relative simplicity” I mean the simplicity of a model compared to the actual process. For example, a “simple model” of Relativity theory for a physicist could still take me two lifetimes to understand.

Math lends itself very well to representing these models. We test the usefulness of a model against the real process by giving both identical inputs and comparing the outputs. This is easy for some models and not so easy for others (like verifying the existence of the Higgs boson using the Large Hadron Collider, which cost an estimated $9B to build).

A model is useful if, for a range of inputs, its output closely matches the real output of the process. (How close? That is a matter of the threshold: the +/- error range the user of the model is comfortable with.) To summarize, a useful model has some predictive power within that error range.
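The “identical inputs, compare outputs” test can be sketched in a few lines of Python. This is a minimal illustration, assuming the model and the real process each produce one number per input (for example, goals scored); `evaluate_model` is a hypothetical helper, and mean absolute error is just one reasonable choice of error measure.

```python
def evaluate_model(predicted, observed, tolerance):
    """Compare model outputs with the real process outputs for the same inputs.

    Returns (mean absolute error, share of cases within +/- tolerance).
    Choosing `tolerance` is up to the user of the model.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need matching, non-empty sequences")
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    mae = sum(errors) / len(errors)
    within = sum(1 for e in errors if e <= tolerance) / len(errors)
    return mae, within


if __name__ == "__main__":
    # Made-up predictions vs. made-up observations
    mae, within = evaluate_model([1.0, 2.0, 3.0], [1.2, 2.0, 4.0], tolerance=0.5)
    print(f"MAE {mae:.2f}, {within:.0%} of cases within tolerance")
```

Whether two out of three cases within tolerance is “useful” is exactly the threshold judgment the text describes.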

The game of football is a process. Over the years, a lot of people have tried to improve our understanding of the game by building models based on what they understood and the data available to them. We have collected more and more information as we have progressed. More information has led to a better understanding of the game and dispelled some of the myths, but it has also helped create “new myths” (or truths until they are proven wrong in due course).

There are a lot of models out there for football: some based on past results, some based on what happens in a game (events like shots, goals, final 3rd passes, etc.) and so forth. I don’t agree with all of them; I agree with some more than others.

What do I look for in a model?

  1. A model should reflect an understanding of the process: an awareness of what the process is and how it works. It doesn’t have to be comprehensive (it is a model, after all), but it should capture the essence of it. Example: if I want to build a model to predict the winner of a football game, I want my model to take into consideration how a game of football is won. By scoring more goals than the opponent. How do you score more goals? By taking a lot of (hopefully good) shots and not letting the opponent take good shots. How have the two teams been doing in the run-up to the game? And so forth. I can list 20-30 items and I am sure I would still have missed many things. The goal is not to account for every factor but to identify the handful of factors that capture the essence, or the “signal” as Nate Silver calls it in his popular new book.

    I am skeptical of any model that is not grounded in an understanding of the process it is trying to model.

  2. A model should factor in the nature of the underlying data from a process. For example, based on Chris Anderson’s analysis, the number of goals scored in a game is not normally distributed. Something like that needs to be factored in if the model uses goals scored as an input. This is probably more important for models making long-term predictions, because the inherent characteristics of data tend to manifest over longer periods of time rather than shorter ones. Example: the probability of getting 4 heads if you toss a coin 4 times is 6.25%. But if you repeat the experiment 5 times, you might get 4 heads once, twice, three times, four times, five times or never. You might have to repeat the experiment thousands of times to see the frequency of 4 heads converge to 6.25%.
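The coin-toss example is easy to check with a short simulation. A minimal Python sketch; the fixed random seed is only there so runs are repeatable. With 5 repetitions the observed share can be anything from 0% to 100% in steps of 20%, and it only settles near the exact 6.25% (0.5 to the power 4) as the number of repetitions grows.

```python
import random


def four_heads_frequency(repeats, seed=None):
    """Repeat '4 fair coin tosses' `repeats` times and return the observed
    share of experiments in which all 4 tosses came up heads."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        if all(rng.random() < 0.5 for _ in range(4)):
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    print("exact:", 0.5 ** 4)  # 0.0625, i.e. 6.25%
    for n in (5, 100, 10_000, 100_000):
        print(n, four_heads_frequency(n, seed=42))
```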

Building a good model is not trivial and is an iterative process. But if the first version of a model doesn’t address the above, it might be time for a rethink.

MCFC Analytics blogposts Summary #9


In the past week, I found the following posts written using the #MCFCAnalytics data:

  1. Some interesting work by @PedroAfonso85, building on some previous work to break down the importance of ball possession, plus some discussion of momentum, which is often discussed yet hard to quantify.
  2. @MarkTaylor0 analyzed blocked shots to find out whether blocking shots is a talent.
  3. @hpstats visualized the points difference “with/without” a player in the starting lineup. Also from the same blog: profiling players based on their shooting.
  4. @SportsViz has a video with examples of 3D-visualization of passes using the data from Bolton vs. City game

Previous Summaries

Summary #8

Summary #7

Summary #6

Summary #5

Summary #4

Summary #3

Summary #2

Summary #1

MCFC Analytics blogposts – Summary #8


Here is the list of interesting posts I found in the past week:

  1. An interesting post by @FbPerspectives on home advantage and how it manifests itself in football stats. The post also links to a detailed paper from 2009 on home advantage.
  2. Guardian Data blog has an interactive visualization of the Bolton – City game by @jburnmurdoch. The viz has a pitch map plus a radial diagram that captures pass direction and length.
  3. The man in the yellow shirt – an analysis of the refs by @PedroAfonso85
  4. An interactive visualization of the direction of a player’s passes by @alekseynp. Some of the outliers are very interesting.
  5. Momentum in the Bolton – City game by @SoccerStatistic. This is a different approach from previous attempts at visualizing momentum using this data set.

I did not publish anything last week, although I did start writing. Hopefully I will publish something later this week.

Previous Summaries

Summary #7

Summary #6

Summary #5

Summary #4

Summary #3

Summary #2

Summary #1

If I missed any, please post them in the comments section or tweet them to me!

MCFC Analytics – blogposts summary #7


I did not see too many new posts in the past week. I didn’t publish any as I was busy with a different project.

  1. An interactive viz of Bolton – Manchester City  match data by @JBurnMurdoch on @GuardianData blog
  2. @HPStats attempts to define metrics that can be used to cluster players based on their style. Here is a good first step, on passing.
  3. @shots_on_target made a summary of vital stats regarding goals, shooting accuracy, penalties, etc.
  4. Scouting report on Tim Howard by @footballfactman
  5. An interactive visualization of the full dataset by @PhilyB1976. I posted this in one of the first few summary posts, but there is additional information on the site. Worth revisiting!
  6. An

Previous Summaries

Summary #6

Summary #5

Summary #4

Summary #3

Summary #2

Summary #1

If I missed any, please post them in the comments section or tweet them to me!

MCFC Analytics-Summary of blogposts #6


This week I saw a few more new bloggers getting into the act with the data.

First up, there was this article by @RWhittall of TheScore.com, in which Richard talked about “soccer data abuse by some bloggers using the MCFC data”. The gist of the article is that some bloggers are extrapolating too much in their conclusions from one year’s worth of data from one league. The other point made in the article is that the output of the majority of the work in soccer analytics isn’t groundbreaking; it just adds a data context to what we already knew.

While I see where Richard is coming from, I don’t quite agree with either his assessment of the state of soccer analytics or the “data abuse” bit.

Unquestionably, we haven’t even scratched the surface of what we can do with data in soccer. The majority of the research work in soccer analytics is carried out in the private domain, because soccer data is not a commodity the way it is in sports like baseball. The MCFC & Opta project could be a significant step towards making soccer data accessible to a wider audience, if it can get enough passionate people interested in the project. However, as with any type of writing in the public domain, there is the good and the not so good. One of the things we discussed with Gavin Fleig, Head of Performance Analysis at Manchester City, Simon Farrant, Marketing Coordinator at Opta, and others is building a community that fosters communication, collaboration and open feedback among its members and readers. This should help everyone get better over time.

Without further ado, here are links to some interesting work I found in this past week.

@MarkTaylor0 has a comprehensive piece on the state of soccer analytics and where it stands vis-à-vis other sports like American football (NFL) and baseball: The case for data analysis in football. This is a must-read.

Analytics posts

  1. @PedroAfonso85 has a couple of posts using the advanced data set
  2. @ChrisJLilley continues his positional analysis series with strikers and central attacking midfielders
  3. @FootballFactman’s piece talks about what to look for in Premier League goalkeepers
  4. @shots_on_target talks about the correlation between fantasy football points and attacking stats
  5. In my weekly opposition analysis series, I analyzed Sunderland using last season’s data.

Visualization posts

  1. Earlier today I saw Voetstat, a neat blog by @Voetstat_craig which has some visualizations of pass completion + heatmaps. There are multiple posts. I haven’t had a chance to read all of them yet.
  2. @TomBerthon has this visualization of how goals were scored in the Bolton – City game from last season

If I missed any links, post them in the comments section and tweet them with the hashtag #MCFCAnalytics. I will retweet them.

Previous Summaries

Summary #5

Summary #4

Summary #3

Summary #2

Summary #1

Arsenal – Opposition Analysis


This is an “opposition analysis” of Arsenal, City’s opponent on Sunday, 23rd September at the Etihad Stadium. I used the #MCFCAnalytics Lite data set for this analysis.

Arsenal – Offense

Open play goals – bread & butter

  • Goals scored: 3rd in aggregate, from inside the box and from open play
  • % of open play goals: 1st
  • Shots on target: 3rd
  • Shots on target inside the box: 1st

Shot efficiency (goals / shots on + off target)

  • Overall: 3rd
  • Outside the box: 7th
  • Inside the box: 5th

Assists per goal scored: 5th

Strong from inside the box: 1st in # of shots on target

Weak from outside the box: 16th in % of goals from outside the box

Passing (rank in completions / rank in completion %)

  • Final 3rd: 3rd / 4th
  • Short passes: 1st / 6th
  • Long passes: 16th / 7th
  • Long balls: 20th / 19th

Other

  • 2nd in open play touches in the opposition’s 18-yard box
  • 18th in open play crossing efficiency: successful open play crosses / (successful + unsuccessful open play crosses)
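The efficiency figures and league positions in these tables are simple ratios and rankings. A minimal Python sketch using the crossing-efficiency definition above; the helper names, team names and numbers are made up for illustration, and ties are not handled the way an official table might handle them.

```python
def efficiency(successful, unsuccessful):
    """successful / (successful + unsuccessful); 0.0 when there were no attempts."""
    attempts = successful + unsuccessful
    return successful / attempts if attempts else 0.0


def league_rank(values, team, higher_is_better=True):
    """1-based rank of `team` among all teams in `values`.

    Ties are not treated specially: tied teams keep their input order.
    """
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return ordered.index(team) + 1


if __name__ == "__main__":
    # (successful, unsuccessful) open play crosses; made-up numbers
    crosses = {"Team A": (60, 240), "Team B": (80, 220), "Team C": (50, 150)}
    rates = {team: efficiency(s, u) for team, (s, u) in crosses.items()}
    for team in crosses:
        print(team, f"{rates[team]:.1%}", "rank", league_rank(rates, team))
```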

Importance of 1st goal

Scored the first goal 23 times – 3rd in EPL

Record when scoring first: 16 W – 3 D – 4 L

Record when not scoring first: 5 W – 4 D – 6 L
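The two records translate into points as follows. A short Python sketch using the standard 3-1-0 points system and Arsenal’s 2011-12 numbers from this section: scoring first was worth about 2.22 points per game (51 points from 23 matches), against about 1.27 (19 points from 15) otherwise. `points_and_ppg` is just an illustrative helper name.

```python
def points_and_ppg(wins, draws, losses):
    """League points (3 for a win, 1 for a draw) and points per game."""
    games = wins + draws + losses
    points = 3 * wins + draws
    return points, points / games


if __name__ == "__main__":
    scored_first = points_and_ppg(16, 3, 4)
    not_first = points_and_ppg(5, 4, 6)
    print("Scoring first:     %d pts, %.2f per game" % scored_first)
    print("Not scoring first: %d pts, %.2f per game" % not_first)
```

The same helper works for City’s 25W 2D 1L record when scoring first, quoted later in this post.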

Arsenal – Key attacking players

  • Goals: Van Persie – 30, Walcott – 8, Arteta & Vermaelen – 6 each
  • Shots on target: Van Persie – 82, Walcott – 34, Ramsey – 18, Gervinho – 17
  • Efficiency: Van Persie – 21.2%, Arteta & Vermaelen – 23%, Walcott – 13.7%
  • Assists: Song – 11, Van Persie – 9, Walcott – 8, Gervinho – 6
  • Final 3rd passing completions: Arteta – 617, Ramsey – 502, Rosicky – 501
  • Final 3rd passing completion %: Arteta – 85.7%, Gervinho – 80.9%, Sagna – 80.1%

Other interesting aspects: the immediate impact of Santi Cazorla, Lucas Podolski and Olivier Giroud

Arsenal – Offensive summary

Personnel changes

RVP was colossal for Arsenal last season with 30 goals; the second-highest goalscorer for Arsenal was Theo Walcott with 8. The Dutchman is no longer with the club. He has been replaced by a three-headed monster: Podolski, Giroud and Cazorla.

At first sight it might seem like an RVP-less Arsenal would be a lot easier to defend. It might even be true for the first handful of games of the season. However, once Giroud, Podolski and Santi Cazorla are in sync with each other and with Arsene Wenger’s scheme, they will be a much harder team to defend.

As Arsene Wenger pointed out after the 6-1 win over Southampton, when you have someone like RVP who scored 30 goals, the opposition knows who will get the ball. Arsenal have added variety to their attack with Giroud, Podolski and Cazorla upfront. All three can shoot, score, assist and work to create space for the others.

While Giroud has not scored yet, his movement has been intelligent and he has been unlucky on occasion. Santi Cazorla has slotted in seamlessly at Arsenal (and in the EPL), and the same goes for Lucas Podolski. Cazorla leads the EPL in completions in the final third and already has a goal and 2 assists. Podolski has 2 goals and an assist.

What the 2011-12 numbers say

Based on last year’s numbers, Arsenal’s attack is primarily based on short passing and taking high-percentage shots from close range. They are 1st in short passes completed and 1st in shots on target from inside the box. Arsenal also get the majority of their goals from open play and are 2nd in touches inside the opponents’ 18-yard box. They also have a high assist-to-goal ratio. Arsenal are bottom of the table in long balls and 16th in long pass completions, and they do not cross particularly well.

All this put together: Arsenal pass, pass and pass some more until they get inside the area. Once inside the area they try to pass again before taking a high percentage shot (or miss the shooting opportunity).

They were average to mediocre at converting corners and set pieces, although that might change with the arrival of Steve Bould as Wenger’s deputy. Bould is known for his preparation and tactical work on set pieces. We have already seen some of it this season, with Cazorla signalling and holding up the ball before taking corners. Both Cazorla and Lucas Podolski are very good free-kick takers, and Cazorla has a powerful shot from outside the box. He led La Liga last season with 5 goals from outside the box (including direct free kicks).

Santi Cazorla, genius (photo courtesy: Guardian)

I wrote a piece about Santi Cazorla’s impact on a football team a few weeks ago. He has already had a big impact at Arsenal. Not only does he add bite to the attack, his arrival also allows Arteta to play much deeper in central midfield, which seems to suit him better. This also allows Arsenal to transition quickly to their defensive shape when not in possession. Cazorla and Podolski both track back to defend when they lose the ball, something RVP was not very good at.

To slow the Arsenal offense, City needs to find a way to minimize the impact of Cazorla and Podolski. Arsenal are a bit weak at fullback due to the absence of right back Bacary Sagna. Carl Jenkinson is playing in his place and has looked suspect. They do not attack much down the right, as Jenkinson stays conservative for the most part, while Gibbs on the left has been much more adventurous. If you plotted a heat map of Arsenal’s attacks so far this season, I would not be surprised if it were skewed to the left.
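The heat-map hunch above is easy to test once event coordinates are available. A minimal Python sketch that splits attacking touches into right, centre and left channels; the input format, the 0-100 scale and which side low y values represent are all assumptions, not taken from the #MCFCAnalytics spec, so check the orientation against the data documentation first.

```python
def flank_shares(touches, width=100.0):
    """Share of attacking touches in each third of the pitch's width.

    touches: (x, y) pairs. Assumes y runs 0..width across the pitch with
    low values on the attacking team's right (an assumption to verify).
    """
    labels = ("right", "centre", "left")
    counts = dict.fromkeys(labels, 0)
    for _, y in touches:
        third = min(int(y / width * 3), 2)  # y == width goes in the last third
        counts[labels[third]] += 1
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Made-up touches in the attacking half
    print(flank_shares([(60, 10), (70, 80), (80, 90), (50, 50)]))
```

A left share well above the right share would support the skew suggested above.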

Slowing down Cazorla will not be easy. During his time at Villarreal, teams like Barça would push their fullbacks up and force Cazorla to track them, pushing him deep and further away from the high-value areas.

Arsenal – Defence

  • Goals conceded: 49 – 8th lowest
  • Touches in final 3rd allowed: lowest in EPL
  • Shots conceded: 3rd lowest overall; 3rd lowest from inside the box; 2nd lowest from outside the box
  • Tackles: 1st in last man tackles
  • Clearances: 2nd lowest in all clearances & headed clearances
  • Blocks: lowest

Arsenal – Defensive summary

After their early-season funk and the 8-2 loss at Old Trafford, Arsenal defended really well last season. They allowed the lowest number of touches in the final 3rd and the 3rd lowest number of shots in the league.

Arsenal are also 1st in last man tackles with 25 (12 more than the 2nd best). This suggests that they most likely defended with a high backline and tried to recover possession as quickly as possible. Since they keep the ball a lot, the opposition gets fewer touches in Arsenal’s defensive third. The last man tackles were made by center-backs cutting out through balls (Koscielny – 9, Vermaelen – 5 & Mertesacker – 3). With such a defensive scheme, it is not surprising that Arsenal forced the highest number of offsides and let in the 4th highest number of through balls. Arsenal’s defence also has the lowest number of blocks and the 2nd lowest number of clearances: they defend far away from their own area, so there is less need for clearances.

So far this season it has been a slightly different story. Arsenal are defending deeper (an opinion based on watching games) and more compactly (two lines of four very close to each other). There is more emphasis on defending set pieces and corners. This could be due to Steve Bould, to the absence of Bacary Sagna, or probably a bit of both. They have conceded just once so far (on what seemed like a gaffe by Szczesny).

By defending deeper, Arsenal might concede a lot more corners, crosses and throw-ins close to their area, but they also give up fewer breakaway attacks and through-ball opportunities.

Arsenal – Goalkeeping – Wojciech Szczesny

  • Goals conceded overall: 49 – tied for 11th lowest
  • Saves: lowest in EPL
  • Clean sheets: 13 – tied for 5th most
  • GK distribution efficiency (successful GK distribution / total GK distribution): 2nd best
  • Long pass completion: 39%
  • Short pass completion rate: 95.5% – 3rd best
  • Proportion of long to short passes: 51-49

Arsenal – Goalkeeping Summary

Szczesny is one of the best young goalkeepers in the league, though prone to the occasional error (like last week vs. Southampton). He is one of the best short passers and has the 2nd best distribution. He also has one of the most balanced long-to-short pass ratios, at 51:49. This stat further underlines the Arsenal philosophy of short passing.

He did concede a lot of goals (49), but much of that is down to Arsenal’s defensive scheme. They used a high backline, which means that when the opposition forwards beat the high line, they were more likely to have a favourable match-up in terms of numbers and a clear sight of goal. Szczesny’s league-lowest number of total saves could very well be a side effect of the overall defensive scheme.

City vs. Arsenal head-to-head 2011-12

  • City won at home 1 – 0 and Arsenal won at home 1 – 0
  • City missed Yaya Toure in the game at Emirates and failed to register a shot on target for the only time all season.
  • City also had a season-low 53 successful passes in the final third in the game at the Emirates (season average: 135).
  • Even in the game at the Etihad, City managed only 105 successful passes in the final third.
  • Importance of 1st goal – Both teams have impressive records when scoring first, especially City
    • City’s record when scoring first is 25 Wins 2 Draws and 1 Loss
    • Arsenal’s record when scoring first is 16 Wins 3 Draws and 4 Losses

Final word

Last season Arsenal gave Manchester City two of its toughest games. They did not allow City to enjoy the possession dominance in the final 3rd that City is used to against the rest of the EPL. The games were very close, and small details and moments of individual brilliance (or error) determined the results.

To win, City needs:

  • Limit the influence of Cazorla and Podolski.
  • Take advantage of one of Arsenal’s few weaknesses, the fullbacks, especially on the right side.
  • Minimize Arsenal’s touches in the final 3rd. Arsenal will enjoy a lot of possession due to the nature of their game; however, limiting their possession in the high-value areas will be key to City’s success.
  • Score first. City has an impeccable record of 25W 2D 1L when scoring first.
  • David Silva, Yaya Toure, Balotelli and Tevez need to have great games for City. The injury Samir Nasri picked up at the Bernabeu could be a big blow if it forces him to miss Sunday’s clash.

MCFC Analytics – Summary of blog posts # 3


Thanks for the amazing response to Summary of blog posts #1 & Summary of blog posts #2

I also want to thank people who have reached out to me via twitter with links to their blogs & posts.

Goalscorer ‘footedness’ by @DavidAHopkins measures the footedness or the foot favoured by Premier League goalscorers.

How do the more successful clubs keep the ball in EPL by @JDewitt talks about how the top teams in EPL keep possession. Also by John is Successful Passing and Winning

A sneak peek of a very interesting carto by @Kennethfield  Charlie Adam’s “passing wheel”

Football Philosophy – Long passes by @Poolq1984 explores the importance of long ball in football.

@We_R_PL has a nice post on how to use the MCFC dataset more efficiently. He also has spreadsheet which has the own goals calculated per team.

@footballfactman has a post on Darron Gibson using a mix of data from MCFC dataset, whoscored and statszone

The always excellent @MarkTaylor0 has a detailed post analysing the quality of shots in the Bolton – Manchester City game using the advanced dataset.

@ChrisJLilley has 3 posts on his blog using MCFC data:

GK positional analysis

Premier League game changers Part I & Part II

@DanJHarrington has produced several pieces using the advanced dataset:

1. An interactive Tableau viz showing each player’s touches on the pitch in Bolton – City.

2. Passing visualization using D3.js

3. Dan also has some interesting visualization work in progress. There is a cool video in the link showing ball movement.

Network passing diagrams by @DevinPleuler

Bolton – http://t.co/mcRQ0oHU

Man City – http://t.co/6mtGgJQS

Extracting data from XML

There have been some questions about this, and several people have come up with solutions:

1. If you have MS Excel 2007 or later, you can open the XML file directly in Excel. The only issue is that XML files are nested and Excel converts them into a very flat format, so you will see multiple rows for the same event. For example, a successful pass has multiple rows indicating its direction and the x,y coordinates of where it was passed to. Read the data spec thoroughly to understand how the data is formatted in the XML; it will help you understand the data much better.

2. Code for R users to extract the F-24 XML by @MarchiMax

3. Code snippets from @JBrisson to extract events from the F-24 XML

4. If you are into programming, most languages have XML parsers. A simple search will get you code snippets to start with.
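For option 4, here is a minimal Python sketch that uses only the standard library's `xml.etree.ElementTree`. It assumes an F-24 layout where each `<Event>` element carries its details as attributes (such as `type_id`, `min`, `x`, `y`) and holds `<Q>` qualifier children with `qualifier_id` and `value` attributes. Those names, the `q<id>` column naming and the tiny `SAMPLE` document are assumptions for illustration, so check them against the data spec. Unlike opening the file in Excel, this approach produces one row per event, with each qualifier in its own column.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed F-24-style document (attribute names are assumed).
SAMPLE = """<Games><Game id="1">
<Event id="10" type_id="1" min="0" sec="3" team_id="A" outcome="1" x="50.0" y="50.0">
<Q id="100" qualifier_id="140" value="62.3"/><Q id="101" qualifier_id="141" value="44.1"/>
</Event></Game></Games>"""


def flatten_events(xml_text):
    """Return one dict per <Event>: its attributes plus one 'q<id>' key per qualifier."""
    root = ET.fromstring(xml_text)
    rows = []
    for event in root.iter("Event"):
        row = dict(event.attrib)  # event-level attributes, in document order
        for q in event.findall("Q"):
            qid = q.get("qualifier_id")
            if qid is None:
                continue  # skip malformed qualifiers rather than crash
            row["q" + qid] = q.get("value")  # value may be None for flag-only qualifiers
        rows.append(row)
    return rows


def to_csv_text(rows):
    """Write the rows as CSV text; columns are the union of keys in first-seen order."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)  # missing keys become ""
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv_text(flatten_events(SAMPLE)))
```

To use it on a real file, read the file into a string (or call `ET.parse(path).getroot()` instead of `fromstring`) and save the CSV text to a file that Excel or R can open.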

If I missed any links, please let me know via Twitter or in the comments on the blog post. Please always use the #MCFCAnalytics tag on Twitter so I can pick them up easily!

Visualizing momentum shifts in Bolton vs. Man City


This is an attempt to visualize the momentum shifts in Bolton vs. City, along with the goals scored and substitutions, using the #MCFCAnalytics advanced data (Tier I).

I used possession as a proxy for momentum. The game is divided into 5-minute buckets. If a goal is scored within a bucket, that bucket is cut short and a new bucket starts in the minute of the goal.
E.g.: 0-4 is bucket #1. If a goal is scored in minute 7, then 5-6 is bucket #2, 7-11 is bucket #3, and so on.
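The bucketing rule can be sketched in Python as follows. `make_buckets` is a hypothetical helper and was not part of the original analysis. It assumes minutes are whole numbers counted from 0 and that a goal in the first minute of a bucket does not split that bucket.

```python
def make_buckets(last_minute, goal_minutes, size=5):
    """Split minutes 0..last_minute into inclusive (start, end) buckets of `size`,
    cutting a bucket short so that a new one starts in each goal minute."""
    goals = sorted(set(goal_minutes))
    buckets = []
    start = 0
    while start <= last_minute:
        end = min(start + size - 1, last_minute)
        # First goal that falls after this bucket's first minute but inside it.
        cut = next((g for g in goals if start < g <= end), None)
        if cut is not None:
            end = cut - 1  # close the bucket just before the goal minute
        buckets.append((start, end))
        start = end + 1
    return buckets


print(make_buckets(20, [7]))  # the example from the text, truncated at minute 20
```

The call above reproduces the example in the text: `[(0, 4), (5, 6), (7, 11), (12, 16), (17, 20)]`.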

Plots

Figure 1 – Overall cumulative possession difference vs. game time in minutes.


Figure 2 – Cumulative possession difference, reset each time a goal is scored.
E.g.: Say 1-0 is scored in minute 27 and 2-0 is scored in minute 38. The cumulative possession difference is calculated from minute 1 through 27. After 1-0, it is calculated from minute 28 onwards, so the data from minutes 1 through 27 is excluded.
This helps to show whether there is any noticeable shift in the momentum of the game after a goal.
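Here is a minimal Python sketch of the calculations behind both figures. It assumes that a possession difference per bucket has already been computed as Bolton's share minus City's share, so negative values favor City. The function names and the numbers in the example are hypothetical. Because the 5-minute buckets start a new bucket in each goal minute, this sketch resets the running total at the bucket that begins in a goal minute.

```python
from itertools import accumulate


def cumulative_difference(diffs):
    """Figure 1 style: running total of per-bucket possession differences."""
    return list(accumulate(diffs))


def cumulative_difference_with_resets(buckets, diffs, goal_minutes):
    """Figure 2 style: running total that restarts from zero at each goal.

    buckets: inclusive (start, end) minute ranges; diffs: one value per bucket.
    """
    goals = set(goal_minutes)
    totals, running = [], 0
    for (start, _end), diff in zip(buckets, diffs):
        if start in goals:
            running = 0  # discard everything before this goal
        running += diff
        totals.append(running)
    return totals


buckets = [(0, 4), (5, 6), (7, 11), (12, 16)]  # goal in minute 7
diffs = [-10, -6, 8, 4]                        # hypothetical Bolton-minus-City values
print(cumulative_difference(diffs))                           # [-10, -16, -8, -4]
print(cumulative_difference_with_resets(buckets, diffs, [7]))  # [-10, -16, 8, 12]
```

In the made-up numbers, the overall curve stays negative (City ahead). The reset curve, however, turns positive after the goal, which is the kind of shift Figure 2 is designed to reveal.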


Findings

  • Overall, City dominated possession: the cumulative possession difference in Figure 1 was always negative (i.e. in favor of City).
  • When the cumulative possession difference is reset after each goal (Figure 2), we see that Bolton tried to take the initiative after City took the lead in minute 26. They pushed hard after City made it 0-2, pulling one back within 2 minutes of City’s 2nd goal.
  • Bolton continued to push for an equalizer until City made it 1-3 right after half-time.
  • Bolton enjoyed more possession as they searched for a goal and made a substitution in minute 60 (an attacking midfielder for a holding midfielder). The move seems to have paid off, as they scored to make it 2-3 in minute 62.
  • Bolton continued to push for an equalizer. City replaced Aguero with Tevez. Both are attacking players, but Tevez is better at playing a deeper role and holding up the ball.
  • As the game progressed, Bolton replaced D. Pratley with Chris Eagles (probably a shift in attacking style), and City responded with a defensive move, replacing Dzeko with Adam Johnson.
  • Those two moves helped City restore control. Towards the end, City replaced attacking midfielder David Silva with fullback Zabaleta to secure the result in the dying minutes.

This is a very quick and simple interpretation and visualization of the momentum shifts. All feedback is welcome.