Follow-up analysis: Final third passing and Goals scored per game


This is a follow-up to my post on the strong correlation between passes completed in the final third and goals scored.

Question

Is there a correlation between the final third completions & goals scored at the game level?

Analysis

I investigated whether this correlation exists at the game level using the #MCFCAnalytics data set. I plotted the completions in the final third vs. goals scored for Manchester City in all 38 of their English Premier League games.
Blue = Away; Orange = Home

Manchester City Goals vs. Pass completions in the final 3rd on a per game basis

Findings:

  • Linear regression had an R2 of 0.04, implying that there is virtually no linear relationship between passes completed in the final third and goals scored at the game level.
    I did the plot for a few other teams and got similar results.
     
  • Arsenal – Away and Liverpool – Home. In both cases, Manchester City had very little success completing passes in the final 3rd. However, they lost 1-0 at the Emirates and won 3-0 at home vs. Liverpool.
    Against Liverpool, City had 6 shots on target and 2 off target.
    Against Arsenal, City had 0 shots on target and 3 off target.
  • QPR – Home and QPR – Away. City scored 3 goals each against QPR home and away. However, they had a season high 326 completed passes in the final 3rd at home vs. just 74 in the away fixture.
    Shots vs. QPR Away – 5 on target & 10 off target.
    Shots vs. QPR Home – 15 on target and 10 off target.
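
For anyone who wants to repeat the game-level check, here is a minimal sketch of how the R2 of a simple linear regression can be computed. The (completions, goals) pairs below are made up for illustration and are not taken from the #MCFCAnalytics data set; the function name is also my own. For a regression with a single predictor, R2 equals the squared Pearson correlation, which is what the sketch relies on.

```python
# Illustrative only: these (completions, goals) pairs are invented,
# they are NOT rows from the #MCFCAnalytics data set.
games = [(74, 3), (120, 1), (150, 0), (180, 2), (210, 1), (326, 3)]


def r_squared(pairs):
    """R^2 of a one-predictor linear regression (squared Pearson correlation)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return (sxy * sxy) / (sxx * syy)


print(round(r_squared(games), 3))
```

A value near 0 means the straight line explains almost none of the game-to-game variation in goals; a value near 1 means it explains almost all of it.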

The City – QPR fixture was that crazy season finale. City fell behind and threw everyone forward to go for the win and the Premier League title. QPR were a man down from the 55th minute and defended at the edge of their 18-yard box for most of the 2nd half. This explains the unusually high number of completed passes in the final third.

The above examples underline the rarity of the “goal” event. In any given game, there could be factors like bad shooting, luck, the opponent’s goalkeeper having a great game etc., which could influence the # of goals scored. However, over a season those things seem to even out.

In the next step of analysis I will add a 2nd variable to the model and analyze.


MCFC Analytics – summary of blog posts # 1


Here are some of the analysis pieces based on the #MCFCAnalytics data published in the past couple of weeks. I plan to do a weekly post linking to all the articles I come across on Twitter.

My objectives for the weekly post are:

  1. Capture all the work done using the MCFC data set
  2. Provide a forum of discussion – you are welcome to use the comments section of this blog to discuss these posts.
  3. Get to know more people working on the data set and learn from the knowledge and ideas of others.

@MarkTaylor0 goes in-depth into the short passing ability of the keeper. Data shows that keepers had the best average short pass completion rate in the 2011-12 EPL season. However, that stat does not tell the full story.

Mark also has another interesting post on how Stoke City commit more fouls than Arsenal but end the season with the same # of yellow cards.

Looks like Mark is working on more posts. I will check back next week to see what is new on Mark’s blog.

@MarchiMax has a couple of posts.

The Passing in EPL post looks at the delta between the average pass attempts of a team and the average pass attempts they allow their opponents. Some of the outliers like Fulham, Newcastle and Swansea are interesting.

The Passing efficiency post is more interesting. It looks at the passing completion % of all the teams and attempts to identify factors that affect it, such as the opponent, the stadium and the zone of the pitch in which the pass is completed.

@Philby1976 has an interactive dashboard of the whole data set. This is a great tool to visualize different metrics and data points. The site’s response time might vary based on the speed of your internet connection and the browser you are using.

Data viz and Tableau expert @acotgreave has a couple of early examples of what can be done with the data set using Tableau Public. There are 3 different examples: A) # of players used by the teams, B) how the players of two teams fared against each other in previous meetings and C) comparing two strikers. The post highlights not only the data set but also the different visualizations you can do using Tableau Public.

@DanJHarrington has another great post on visualization using Tableau – How does a team pass the ball. The post visualizes how each player in a team passes the ball (# of forward, backward & sideways passes). There is another post from the same website, using the MCFC data, on who should have been England’s # 1 GK at the Euros.

@JimmyCoverdale has a post on how the data gives enough evidence on Why Walcott should be moved to a central midfield role.

Apart from these, I have a couple of posts on this blog so far: Correlation between goals scored and pass completions in the final third, and an opposition analysis of QPR.

If I missed any, please add them in the comments section. I will cover them in my next weekly post.

MCFC Analytics data – The story so far


I have been playing with the #MCFCAnalytics data set for the past 4-5 days. I have been having a lot of fun with the data.

One of the key reasons is “ease of use”.

The data is provided in the very simple Comma Separated Values (CSV) format. CSV is one of the simplest data formats: each column of data is separated by a comma. You may open this file in Notepad, Excel or any text editor. I have worked a lot with data, football or otherwise, and I end up spending the majority of my time getting the data into the correct format. I was pleasantly surprised to find the MCFCAnalytics data in CSV format. I opened the file in Excel, created a pivot table and I was on my way.

If you have ever used Excel before, using this data set is very easy. Unzip and open the file in Excel. In Excel you can start playing with the data using a pivot table. A pivot table helps you slice and dice the data the way you want.
For example, if you want to see the # of goals conceded by Manchester City in each of its 19 away games, you could do that with 3-4 clicks in the pivot table.
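
If you prefer a script to clicks, the same kind of slice can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions: the column names ("Team", "Opposition", "Venue", "Goals Conceded"), the sample rows and the helper name pivot_sum are all hypothetical and are not the real layout of the Lite data set.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the column names and numbers are assumptions,
# not the actual layout or contents of the MCFC Analytics Lite file.
SAMPLE = """Team,Opposition,Venue,Goals Conceded
Manchester City,Team A,Away,1
Manchester City,Team B,Home,2
Manchester City,Team C,Away,0
Other Team,Manchester City,Home,3
"""


def pivot_sum(rows, team, venue, value_column):
    """Sum value_column per opposition for one team and venue,
    which mirrors a pivot table with a filter and a Sum aggregation."""
    totals = defaultdict(int)
    for row in rows:
        if row["Team"] == team and row["Venue"] == venue:
            totals[row["Opposition"]] += int(row[value_column])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
away_conceded = pivot_sum(rows, "Manchester City", "Away", "Goals Conceded")
print(away_conceded)
```

With the real file you would replace io.StringIO(SAMPLE) with an open file handle and adjust the column names to whatever the CSV header actually contains.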

If you are comfortable with Excel, for visualizations you may use charts in Excel or try Tableau Public. Tableau Public is free and supports the CSV format. Tableau provides much slicker visualizations than Excel charts, but you might need a few days to get ramped up on it.

While the Lite data set doesn’t capture every event in every game, it is an exhaustive list of almost every stat about teams and players aggregated at a game level for all the 380 games of the 2011-12 EPL season. Playing with the Lite data set helps you get an idea of the metrics and KPIs available for analyzing performance. It leads you to more questions and in a way prepares you for working with a more extensive and complex data set.

For example, I always wondered if there was a relation between final third passing and goals scored. I did some analysis and found that there is a strong correlation between passing in the final third and goals scored. Now I want to dig further and find out if there is a particular zone within the final third that has a stronger correlation to goals scored. I need the (X,Y) data associated with each pass to figure it out.

That brings us to the most important aspect of data analysis. Before taking a deep-dive into the data, always ask yourself:
What is the question you want to answer using the data?

Without a question in mind you are bound to get lost or lose interest in the data very quickly. The question could be anything, from “Who took the most shots in EPL in 2012” to “Is there a correlation between wins and shots taken”. In my example above, if I had never ventured to answer my first question, I would never have gotten to the second question.

I thank Gavin Fleig and Manchester City Football Club, Simon Farrant and Opta Sports Pro for starting this great initiative.

I have seen some interesting work produced by a number of people already.  I hope to see a lot more in the coming days and weeks.

Manchester City vs. QPR : Opposition analysis #CityOppostion #MCFCAnalytics


This is an “Opposition analysis” of QPR, City’s opponent on Saturday 1st September at the Etihad Stadium. I used the #MCFCAnalytics Lite data set to do this analysis.

Picture courtesy : @srands_analyst on twitter

QPR – Offense

Goals scored: 16th
Headed goals: 10 – 4th in the League; 24.4% of their goals are from headers
Poor shooting efficiency from outside the box: 3rd in # of shots taken from outside the box but 15th in shooting efficiency (goals scored / (shots on target + shots off target from outside the box))
Long pass efficiency: 7th
Final 3rd passing: 13th in final third completions

 

QPR – Key attacking players

Goals: Jamie Mackie (8 goals at 26.7% shooting efficiency) and Djibril Cissé (6 goals at 31.6% shooting efficiency) were the most dangerous goal-scoring threats.
Shots: 56 – Adel Taarabt took the highest # of shots for QPR. 50 of the 56 shots were from outside the box. Taarabt also had 32 of his shots blocked, 27 of them from outside the box.
Assists: Wright-Phillips, Traore, Taarabt and Barton were the top assist providers with 3 each.
Final third passing: Joey Barton (435) had the most completions in the final 3rd. They have a great replacement for him in Esteban Granero, who is much better than Barton technically, but he might need a few games to find his gears in the Premier League. Taarabt (322), Faurlin (288) and Wright-Phillips (215) are the next 3 in this category, all with a passing completion rate of over 70%.
Other interesting aspects: Taarabt (90 – 42%), Wright-Phillips (96 – 37.5%) and Mackie (91 – 27.5%) are the top dribblers of the team.

 

QPR – Offensive summary

QPR seem to be very direct in their attack. They tend to defend deep and hit on the counter. They scored 10 goals from headers. Adel Taarabt is a very dynamic player but his decision-making is questionable. He takes too many shots from outside the box, many of them either off-target or blocked. Their average of less than 1 goal per away game highlights their trouble scoring away from home.

Joey Barton was one of the key cogs of their attack last season. He will be replaced by the excellent Esteban Granero, a product of the Real Madrid youth system.

The key players for QPR on the attack are Mackie, Wright-Phillips and Taarabt.
Granero will be a part of this list as he gets used to the Premier League.

Esteban Granero is technically much better and has none of the disciplinary issues of Barton. Granero is very adept at running the game from midfield and has great technique and touch. His best seasons were at Getafe (on loan from Real Madrid), when he played a key role in taking the small club from the suburbs of Madrid to within inches of the semi-finals of the UEFA Cup 2007-08. He moved back to Real Madrid in 2009 and has not had a lot of playing opportunities since then. He must be eager to have a go at QPR and I expect him to have a similar impact at QPR as the other Spanish midfielders are having at their respective Premier League teams. However, I doubt he will have a big impact in the game at the Etihad Stadium.

QPR – Defence

Goals conceded: 3rd highest in the league
Shots conceded: 4th
Corners conceded: 2nd
Clearances: 2nd. Also 2nd highest headed clearances and highest proportion of headed clearances among total clearances
Ground duels winning %: 2nd
Aerial duels winning %: 16th
Tackles winning %: 17th
Red cards: 9 – 1st in the league

 

QPR – Defensive summary

“A train-wreck waiting to happen” is how I would describe the QPR defense of last season in 5 words. They seem to defend deep and it is likely that their back four is slow. On average, opponents complete about 10% more passes in QPR’s defensive third than their league average. The # of corners conceded and headed clearances tell me that the QPR defence is in a “hurried” mode when the opposition is in QPR’s defensive third. This means they are a fraction too slow to be in the right place at the right time. They are forced to make clearances with no time to think about placement. They are ranked 17th in tackles won. Some of it is probably due to them being a fraction late on the tackles.

QPR – Goalkeeping

Goalkeeper metric: Standing among peers
Goals conceded overall: 17th in the league
GK distribution efficiency – Kenny (successful GK distribution / total GK distribution): 60% – 13th out of 18 GKs with 29 or more starts
Short passes completion – Kenny: 80% – 15th out of 18 (league average 90%)
Long passes completion: 39% – 11th out of 18 (league average 39%)
Proportion of long to short passes – Kenny: 90% – 3rd out of 18 (league average 76%)

 

QPR – Goalkeeping summary

Patrick Kenny is not with QPR anymore. Robert Green was not any better in the first two games. They signed the veteran Brazilian keeper Júlio César a few days ago. He is an upgrade over Green. However, I am not too sure if their GK distribution strategy will change much. I think that is the key problem: too much emphasis on long balls and very poor completion rates even with the short passes.

City should enjoy a lot of success if they try to pressure and hurry the QPR keeper.

City vs. QPR head-to-head 2011-12

  • How did QPR score vs. City?

  1. 2 of the 4 goals were headers – a strength of QPR
  2. All 4 goals from inside the box, 1 from a set-play and  3 from open play
  3. One of the goals was a quick counterattack
  4. Scorers : Cissé, Mackie, Bothroyd, Helguson
  • How did City score vs. QPR?

 

  1. All 6 from inside the box
  2. 2 were headers
  3. 5 from open play and 1 from a corner
  4. Scorers : Aguero, Dzeko x 2, Yaya Touré, Zabaleta, Silva

Final word

City should win this game. QPR’s defence had too many issues last season and, based on the first two games of the Premiership, I am not convinced that they have addressed them. On the other hand, City have a potent offence despite the absence of Aguero. However, QPR did score twice at the Etihad in that crazy season finale. If City’s defence can keep tabs on Mackie, Wright-Phillips and Taarabt, QPR’s chances of scoring would go down dramatically.

Passing in the final third and goals – EPL 2011-12 #MCFCAnalytics


Question:

Is there a correlation between passing in the final third and the goals scored?

I used the #MCFCAnalytics data set to find the answer.

Analysis

Plot of total # of completed passes in the final third vs. goals scored for all the 20 teams in the 2011-12 season of the Barclays Premier League

 Findings:

  • Linear regression had an R2 of 0.671, indicating a strong correlation between passes completed in the final third and goals scored.
    Excluding the outlier of Liverpool from the data set, the R2 jumped to 0.827.
  • Liverpool is ranked 3rd in the # of passes completed in the final third. However, they are only ranked 15th in goals scored.
  • 75.73 – Liverpool’s expected goals scored based on the above regression. However, they managed to score only 42 goals.
    • What is the reason for the huge negative difference?
  • Swansea’s case is interesting. You may remember the term “Swansealona” was one of the favorites with EPL analysts and reporters last season due to their reputation for passing style and high amounts of possession. However, they are below the league average on passes completed in the final third.
  • Newcastle is ranked 18th in passes completed in the final third. However, Newcastle is ranked 7th in goals scored. Expected goals scored for Newcastle is 29.6. They managed to score 51!
  • Blackburn is ranked last in passes completed in the final third. However, Blackburn scored a lot more goals (44) than their expected goals scored (24.2).
  • Stoke is at the bottom – Lowest # of goals scored and 2nd lowest # of passes completed in the final third.  Not surprising based on their style of play.
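
The "expected goals scored" figures above are simply the regression line evaluated at each team's completions, and the gap between actual and expected is the residual. Here is a rough sketch of that workflow, including the refit without the largest outlier. The team labels and totals are hypothetical placeholders, not the 2011-12 figures, and the function names are my own.

```python
def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(points, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


# Hypothetical season totals: team -> (final-third completions, goals scored)
teams = {
    "Team A": (2000, 35),
    "Team B": (2500, 45),
    "Team C": (3000, 52),
    "Team D": (3500, 66),
    "Team E": (4000, 75),
    "Team F": (4200, 42),  # many completions, few goals
}

intercept, slope = fit_line(list(teams.values()))

# Residual = actual goals minus the goals the line "expects"
residuals = {
    name: goals - (intercept + slope * passes)
    for name, (passes, goals) in teams.items()
}
outlier = max(residuals, key=lambda name: abs(residuals[name]))

# Refit without the team that sits furthest from the line
without = [v for name, v in teams.items() if name != outlier]
r2_all = r_squared(list(teams.values()), intercept, slope)
r2_without = r_squared(without, *fit_line(without))
print(outlier, round(r2_all, 3), round(r2_without, 3))
```

A large negative residual is the pattern described for Liverpool (far fewer goals than the line predicts), while a large positive one is the pattern described for Newcastle and Blackburn.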

Liverpool

I hypothesized that

  1. Liverpool might be crossing a lot and
  2. Most crosses occur in the final third. (I would love to look at (X,Y) data to establish this fact.)
  3. Poor shot quality (which might or might not be related to their propensity to cross)

Findings:

  • 1103 – Liverpool attempted the highest # of crosses + corners of all teams in 2011-12
  • 840 – Liverpool attempted the highest # of open play crosses in 2011-12
  • 19th in overall crossing efficiency (# of successful crosses + corners / (# of successful + # of unsuccessful crosses + corners))
  • 14th in open play crossing efficiency (# of successful open play crosses / (# of successful + # of unsuccessful open play crosses))
  • 18th in overall shooting efficiency (shots on target / (shots on target + shots off target + blocked shots))
  • 15th in shooting efficiency not including blocked shots (shots on target / (shots on target + shots off target))

    A glance at the top 10 open play crossers of Liverpool in 2011-12.

Player               Attempts   Efficiency
Downing              148        0.209
José Enrique         138        0.210
Henderson            72         0.125
Adam                 70         0.157
Gerrard              69         0.203
Bellamy              67         0.194
Johnson              65         0.185
Kuyt                 57         0.246
Suárez               47         0.149
Kelly                38         0.105
Liverpool Average               0.192
League Average                  0.202

  • 2 – According to this article on EPLIndex, Liverpool scored just 2 goals from 840 open play crosses all season. That is 1 goal per every 420 open play crosses.
  • 79 – The average # of open play crosses per goal scored in the 2011-12 season. Liverpool are almost 10 times worse than Man United (44.5) and Norwich (45.1) in the open play crosses/goals category. If there ever was a stat that would (or should) regress to the mean, this is it.
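
The crossing numbers above are easy to sanity-check with a few lines of Python. The sketch below uses only figures quoted in this post (the top 10 crossers table, the 840 open play crosses, the 2 goals and Man United's 44.5). The variable names are my own, and the attempt-weighted efficiency covers only the ten listed players, so it will not match the 0.192 team average exactly.

```python
# Open play crossing figures from the table above: (player, attempts, efficiency)
top_crossers = [
    ("Downing", 148, 0.209),
    ("José Enrique", 138, 0.210),
    ("Henderson", 72, 0.125),
    ("Adam", 70, 0.157),
    ("Gerrard", 69, 0.203),
    ("Bellamy", 67, 0.194),
    ("Johnson", 65, 0.185),
    ("Kuyt", 57, 0.246),
    ("Suárez", 47, 0.149),
    ("Kelly", 38, 0.105),
]

attempts = sum(a for _, a, _ in top_crossers)
# Efficiency is successful / attempted, so attempts * efficiency recovers
# (approximately, because the efficiencies are rounded) the successful crosses.
successful = sum(a * e for _, a, e in top_crossers)
weighted_efficiency = successful / attempts

# Crosses per goal, using the EPLIndex figure quoted above
liverpool_crosses, liverpool_goals = 840, 2
crosses_per_goal = liverpool_crosses / liverpool_goals

# How many times more crosses Liverpool needed per goal than Man United (44.5)
ratio_vs_man_united = crosses_per_goal / 44.5

print(attempts, round(weighted_efficiency, 3))
print(crosses_per_goal, round(ratio_vs_man_united, 1))
```

The ten listed players account for 771 of the 840 open play crosses, and the ratio against Man United comes out a little under 10, which matches the "almost 10 times worse" wording.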

Liverpool had a very talented team in 2011-12. This manifested itself in their high # of completions in the final third, where the defensive pressure is highest. Once they were in possession in the final third, they seem to have relied heavily on “crossing the ball” to enable their center-forward Andy Carroll to take a shot (or head) OR knock it down for their attacking midfielders and wide forwards to take a shot. One big problem was that delivering crosses is not a very efficient way of passing the ball. Another problem was that they did not seem to have a plan B. It is quite possible that opponents had figured out Liverpool’s crossing strategy and their lack of a plan B. The combination of these three factors contributed significantly to the poor offensive display of Liverpool last season.

Newcastle United

  • 4th – Newcastle is 4th best in shooting efficiency (goals scored/(shots on target + shots off target)). They stayed 4th even when I included blocked shots in the denominator.
    • This could be the reason why they are an outlier in the final-third completions vs goals scored plot.
    • Manchester City, Arsenal and Manchester United are the top 3 in shooting efficiency.
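
The shooting efficiency used here, with and without blocked shots in the denominator, can be written as a tiny helper. This is only a sketch: the function name and the season totals passed to it are hypothetical, chosen to make the arithmetic easy to follow.

```python
def shooting_efficiency(goals, on_target, off_target, blocked=0):
    """Goals divided by shots taken; pass `blocked` to include blocked shots."""
    shots = on_target + off_target + blocked
    if shots == 0:
        raise ValueError("no shots recorded")
    return goals / shots


# Hypothetical season totals, for illustration only
print(round(shooting_efficiency(50, 180, 220), 3))               # blocked shots excluded
print(round(shooting_efficiency(50, 180, 220, blocked=100), 3))  # blocked shots included
```

Including blocked shots can only lower the ratio, so a team that keeps its rank under both definitions, as Newcastle did, is not being flattered by the choice of denominator.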

Newcastle had two great strikers in Demba Ba and Papiss Cissé, who accounted for 29 goals between them. These two were the focus of Newcastle’s attack and were very efficient with their shots. They did not need a high # of completed passes in the final third to score their goals as they were able to convert a higher % of their shots into goals.

Blackburn Rovers

  • 7th – Blackburn are 7th best in shooting efficiency inside the box (goals scored from inside the box/(shots on target inside the box + shots off target inside the box)).
  • Yakubu scored 17 goals for Blackburn and has the 2nd best goals to shots ratio among all the forwards who have scored more than 10 goals.
    • This could be one of the reasons for their big positive differential between actual goals scored (44) and the expected goals scored (24.2).

Summary

# of successful passes in the final third has a strong correlation to goals scored.

Final third is a “high-value” area for scoring goals. More completions in the final third means a team is spending more time in the high-value area. This translates into more opportunities to take a shot or draw errors from defenders to win set pieces from close range, which further increase scoring opportunities.

A high number of completions in the final third alone might not guarantee goals. Liverpool and Newcastle, two examples from the two extremes of the outlier spectrum, are cases in point. However, it is one of the key contributing factors to scoring goals. The fact that R2 jumped from 0.671 to 0.827 when Liverpool’s data was excluded from the data set strengthens this conclusion.

All future posts on Onfooty.com


Some of you might know this already. All my future posts will be published on On Football.

The objectives remain unchanged. A visual and a data-driven view of all things football.

Here are my first two posts on Onfooty.com

Agents in Football – Focus on EPL

Putting Manchester City’s spending into perspective

 

Follow us on Twitter at @AnalyseFooty and Sarah on @Onfooty

 

Messi and Ronaldo – Clasico Special


Lionel Messi and Cristiano Ronaldo are arguably the two best players of this generation. They go head-to-head for the 6th time this season. The intent of this post is to celebrate these two players on the eve of a Clasico.

While both Messi and Ronaldo have scored 41 goals apiece so far, the following charts show that they have each done it in their own distinct ways.

41 goals in the season so far. 5 games still to be played. Can they both reach the 50 goal mark? Definitely on the cards.

All numbers are La Liga numbers.

Goals

Assists 

Key passes

Key passes are defined as passes that would have been potential assists with good finishing.

Goals to shots

Shots to shots on target

Data Credits

– @JavierJotah

– Some data from Marca.com and some data from Whoscored.com

“Manchester United do ‘it’ to teams every year” Really?


It was a few weeks ago. The Sunday EPL games had just ended and Manchester United had opened up a 5-point lead over local rivals Manchester City.

Fans, journalists, some TV announcers and even some stats geeks on my Twitter timeline seemed to be saying the same thing.

“Manchester United do it every year to their rivals around this time of the year.”

“It” means a surge to win the title coming from behind. I got curious. Even after assuming that “doing it every year” is probably an exaggeration for “majority of the time” I couldn’t quite believe it. I looked at some data.

Hypothesis:

We tend to remember events better than numbers. Some events are more memorable than the others.
I hypothesized that this might be a case of selective memory due to the dramatic nature of a few events, like the comeback of Man United against Bayern München in 1999.

I analyzed the Premier League tables from the inaugural season in 92-93 through 2011-12.

As I had expected, the data painted a different picture.

Methodology:

1. Look at the top 4 of the standings for every season at the end of the months January, February, March, April and May (end of the season).
2. Plot the points differential between the leader and the rest.
3. Look at the # of times the lead changed hands in the seasons that Manchester United won the title (from the end of January to May)
4. Look at seasons where Manchester United led early on but did not go on to win the title.
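
The steps above can be sketched in Python as follows. The team labels and month-end points are invented purely to show the mechanics and do not come from any real Premier League table; the function names are my own. Ties for first place are ignored in this sketch (the first team listed wins a tie).

```python
# Hypothetical month-end points for the top 4 in one season (made-up numbers)
month_end_points = {
    "Jan": {"Team A": 50, "Team B": 48, "Team C": 45, "Team D": 40},
    "Feb": {"Team A": 56, "Team B": 57, "Team C": 50, "Team D": 44},
    "Mar": {"Team A": 65, "Team B": 63, "Team C": 58, "Team D": 51},
    "Apr": {"Team A": 74, "Team B": 70, "Team C": 66, "Team D": 60},
    "May": {"Team A": 80, "Team B": 77, "Team C": 71, "Team D": 66},
}


def leader(points):
    """Team with the most points at a month end."""
    return max(points, key=points.get)


def gaps_to_leader(points):
    """Points differential between the leader and every team (step 2)."""
    top = max(points.values())
    return {team: top - pts for team, pts in points.items()}


def lead_changes(table_by_month):
    """Number of times the month-end leader differs from the previous one (step 3)."""
    leaders = [leader(points) for points in table_by_month.values()]
    return sum(1 for prev, cur in zip(leaders, leaders[1:]) if prev != cur)


for month, points in month_end_points.items():
    print(month, leader(points), gaps_to_leader(points))
print("Lead changes after January:", lead_changes(month_end_points))
```

Because only month-end snapshots are compared, a lead that changes hands and back within a single month would not be counted, which is the same limitation the month-end charts have.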

Assumptions:

1. Ignore teams below 4th place, to reduce noise. For the same reason I have also ignored 4th place in 2003-04, when 4 different teams held 4th place across the month-ends.
2. Plotted only point totals at the end of the last 5 months, to reduce noise – Deeper analysis (on a week-to-week basis) in seasons with close title run-ins  will be done as a follow up.

Observations:

1. Manchester United is a great champion and won 12 of 19 titles, but in quite a few cases they had comfortable leads from end of January through May (see images below)

Season     Lead changes after January
1993-94    0 – Led from week #4
1996-97    0 – Led from week #23
1999-00    0 – Led from week #21
2000-01    0 – Led from week #10
2006-07    0 – Led from week #7
2008-09    0 – Led from week #20
2010-11    0 – Led from week #15

2. There were 5 seasons where they made a title push coming from behind to win the title.

  • 92-93: In the inaugural Premier League season they took the lead in mid-March and led the rest of the way to the title
  • 95-96: Newcastle led the table into mid-March but United overtook them and went on to win the title

  • 98-99: Closest run-in of all. They led from January through mid-March, gave up the lead briefly to Arsenal but pipped them at the end by a point. If you want to talk about late comebacks, this has got to be the poster child, although they did wobble a bit towards the end.
  • 02-03: Came back from 3rd at the end of January to overtake Arsenal in mid-March
  • 07-08: Interesting chart. Were level on points with Arsenal at the end of January. Took over the lead from Arsenal in February. Chelsea chased them down in April but Man United prevailed by 2 points in the end. Not a major comeback in my book, just playing it cool with the lead.
  • 11-12: Jury is still out on the current season

3. They lost 4 titles after leading the table post January

  • 97-98: United led from January through mid-April, when they lost the lead to Arsenal. Arsenal’s rise is slightly exaggerated in the chart as they had 3 games in hand at the end of February.
  • 01-02: The bottom fell out for United in late March
  • 03-04: After leading at the end of January, United were never in it. The “Invincibles” season of Arsenal. I don’t have a 4th place team in this chart because there were 4 different teams in 4th and the graph was getting busy.

  • 09-10: Great title race with Chelsea, but Man United came up short by a point in this one.

Other seasons

  • 04-05: They were never in contention

  • 05-06: They were never in contention
  • 94-95: Blackburn prevailed despite United running them close

Conclusion:

It is clear that they don’t come from behind always, not even close.

  • There were 5 instances where they came from behind between end of January & May to go on to win the title. (Not counting the current season)
  • There were 4 instances where they lost a lead between end of January & May to go on to lose the title.
  • There were 7 instances where they led end-to-end between end of January & May

In all there were 12 seasons in which they fell behind at some point between end of January and May.

Titles won after trailing 5/12 = 41.67%

Titles lost after leading 4/12 = 33.33%

Never led 3/12 = 25%
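
For completeness, the percentages above can be reproduced with a short tally. The counts are the ones stated in this post; the variable names are my own.

```python
# Season counts taken from the conclusion above
outcomes = {
    "Titles won after trailing": 5,
    "Titles lost after leading": 4,
    "Never led": 3,
}
led_end_to_end = 7

# Seasons in which United were behind at some point between end of January and May
seasons_behind = sum(outcomes.values())
shares = {
    label: round(100 * count / seasons_behind, 2)
    for label, count in outcomes.items()
}
total_seasons = seasons_behind + led_end_to_end

for label, pct in shares.items():
    print(f"{label}: {outcomes[label]}/{seasons_behind} = {pct}%")
print("Seasons analysed:", total_seasons)
```

The three shares add up to 100%, and the 12 seasons plus the 7 end-to-end seasons account for all 19 completed seasons in the analysis.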

All data from http://www.premierleague.com
You may follow me on twitter @AnalyseFooty & @aupasubmarino

Charts for seasons where they led from end of January through May

  • 93-94

  • 96-97

  • 99-00

  • 00-01

  • 06-07

  • 08-09

  • 10-11

First Post : Yet another blog crunching the football numbers


The world needed just one more football datanik and I have decided to answer the call.

Seriously though, I love data.
I love designing visual interfaces to represent data.
I deal with a bit of both in my day job.

And above all, I love football.

This blog is an attempt at mixing all three and seeing what shakes out. It could turn out to be an #EPICFAIL or a “meh” … or something useful and substantial. We (me and the 2 bots/crawlers that might stumble upon this site) will find out in a few months.

My whole-hearted thanks to Saurabh who did an amazing job of designing the logo and the layout. I know from experience that simple designs are the hardest ones to design. 
