MCFC Analytics – Summary of blog posts #4


It has been about a month since the basic MCFC data set was released, and it is great to see lots of people churning out work using both the basic and advanced data sets.

Based on the tweets with the #MCFCAnalytics tag, quite a few people have projects in progress. Good luck to all of you. Make sure you share your project/blog links with the hashtag.

Some people are looking for partners and contributors for the projects they are working on. If you are interested, please keep an eye on the #MCFCAnalytics hashtag and get in touch with folks directly.

Analysis posts

  1. @MarkTaylor0 – Analyzing passes by comparing them to their expected pass completion rates, using James Milner’s passes in Bolton vs. Manchester City from the 2011-12 season.
  2. @MarkTaylor0 – Mark also has a post on how Man City and Bolton passed the ball
  3. @Jdewitt – How goals are scored in the EPL
  4. @ChrisJLilley – Analyzing center-backs of the Premier League
  5. @analysefooty (this blog!) – Opposition analysis of Arsenal

Visualization posts

  1. @DanJHarrington – very interesting visualizations of passes using vector diagrams in Tableau Public
  2. @MarchiMax – a visualization of where the ball is a few seconds before a shot is taken
  3. @Ongoalsscored – Visualization of the goalscorer’s body parts. Very neat!

If I missed any, please post your links in the comments section.

Links to previous summaries

Summary #1

Summary #2

Summary #3

Feel free to tweet or email me if you want to chat about something specific!


Arsenal – Opposition Analysis


This is an “Opposition analysis” of Arsenal, City’s opponent on Sunday 23rd September at the Etihad Stadium. I used the #MCFCAnalytics Lite data set for this analysis. All ranks below are Arsenal’s positions among EPL clubs for 2011-12.

Arsenal – Offense

Open play goals – bread & butter

Goals scored: 3rd in aggregate, from inside the box and from open play
% of open play goals: 1st
Shots on target: 3rd
Shots on target inside the box: 1st

Shot efficiency (goals / shots on + off target)
Overall: 3rd
Outside the box: 7th
Inside the box: 5th

Assists per goals scored: 5th

Strong from inside the box: 1st in # of shots on target
Weak from outside the box: 16th in % of goals from outside the box

Passing

Final 3rd completions / comp %: 3rd / 4th
Short passes completions / comp %: 1st / 6th
Long passes completions / comp %: 16th / 7th
Long balls completions / comp %: 20th / 19th

 

Other

2nd – open play touches in the opposition’s 18-yard box

18th in open play crossing efficiency

  • successful open play crosses/successful + unsuccessful open play crosses

Importance of 1st goal

Scored the first goal 23 times – 3rd in EPL

Record when scoring first 16 W – 3 D – 4 L

Record when not scoring first 5 W – 4 D – 6 L

Arsenal – Key attacking players

Goals: Van Persie – 30, Walcott – 8, Arteta & Vermaelen – 6 each

Shots on target: Van Persie – 82, Walcott – 34, Ramsey – 18, Gervinho – 17

Efficiency: Van Persie – 21.2%, Arteta & Vermaelen – 23%, Walcott – 13.7%

Assists: Song – 11, Van Persie – 9, Walcott – 8, Gervinho – 6

Final 3rd passing completions: Arteta – 617, Ramsey – 502, Rosicky – 501

Final 3rd passing completion %: Arteta – 85.7%, Gervinho – 80.9%, Sagna – 80.1%

Other interesting aspects: Immediate impact of Santi Cazorla, Lukas Podolski and Olivier Giroud

Arsenal – Offensive summary

Personnel changes

RVP was colossal for Arsenal last season with 30 goals. The 2nd highest goal-scorer for Arsenal was Theo Walcott with 8. The Dutchman is no longer with the club. He has been replaced by the three-headed monster, Podolski – Giroud – Cazorla.

At first sight it might seem like an RVP-less Arsenal would be a lot easier to defend against. It might even be true for the first handful of games of the season. However, once Giroud, Podolski and Santi Cazorla are in sync with each other and with Arsene Wenger’s scheme, they will be a much harder team to defend against.

As Arsene Wenger pointed out after the 6-1 win over Southampton, when you have someone like RVP who scored 30 goals, the opposition knows who will get the ball. Arsenal have added variety to their attack with Giroud, Podolski and Cazorla up front. All three can shoot, score, assist and work to create space for the others.

While Giroud has not scored yet, his movement has been intelligent and he has been unlucky on occasion. Santi Cazorla has slotted in seamlessly at Arsenal (and in the EPL), and much the same goes for Lukas Podolski. Cazorla leads the EPL in completions in the final third and already has a goal and 2 assists. Podolski has 2 goals and an assist.

What the 2011-12 numbers say

Based on last year’s numbers, Arsenal’s attack is primarily based on short passing and taking high-percentage shots from close range. They are 1st in short passes completed and 1st in shots on target from inside the box. Arsenal also get the majority of their goals from open play. They are 2nd in touches inside opponents’ 18-yard box. Arsenal also have a high assist-to-goal ratio. Arsenal are bottom of the table in long balls and are 16th in long pass completions. They also do not cross particularly well (18th in open play crossing efficiency).

All this put together: Arsenal pass, pass and pass some more until they get inside the area. Once inside the area they try to pass again before taking a high percentage shot (or miss the shooting opportunity).

They were average to mediocre at converting corners and set pieces, although that might change with the arrival of Steve Bould as Wenger’s deputy. Steve is known for his preparation and tactical work on set pieces. We have already seen some of it this season, with Cazorla making signals while holding up the ball before taking corners. Both Cazorla and Lukas Podolski are very good free-kick takers and Cazorla has a powerful outside shot. He led La Liga last season with 5 goals from outside the box (including direct free kicks).

Santi Cazorla – Genius : Photo Courtesy – Guardian

I wrote a piece about Santi Cazorla’s impact on a football team a few weeks ago. He has already had a big impact at Arsenal. Not only does he add bite to the attack up front, his arrival also allows Arteta to play much deeper in central midfield, which seems to suit him better. This also allows Arsenal to quickly transition to their defensive shape when not in possession. Cazorla and Podolski both track back to defend when they lose the ball, something that RVP was not very good at.

To slow the Arsenal offense, City need to find a way to minimize the impact of Cazorla and Podolski. Arsenal are a bit weak at fullback due to the absence of right back Bacary Sagna. Carl Jenkinson is playing in his place and has looked suspect. They do not attack much on the right, as Jenkinson stays conservative for the most part. Gibbs on the left side has been much more adventurous. If you did a heat map of Arsenal attacks so far this season, I would not be surprised if it were skewed to the left.

Slowing down Cazorla will not be easy. During his time at Villarreal, teams like Barça would push their fullbacks up and force Cazorla to defend the fullback, thus pushing him deep and further away from the high-value areas.

Arsenal – Defence

Goals conceded

49 – 8th lowest

Touches in final 3rd allowed

Lowest in EPL

Shots Conceded

3rd lowest overall; 3rd lowest from inside the box; 2nd lowest from outside the box

Tackles

1st in last man tackles

Clearances

2nd lowest in all clearances & headed clearances

Blocks

Lowest

Arsenal – Defensive summary

After their early-season funk and the 8-2 loss at Old Trafford, Arsenal defended really well last season. They allowed the lowest # of touches in the final 3rd and the 3rd lowest # of shots in the league.

Arsenal were also 1st in last man tackles with 25 (12 more than the 2nd best). This implies that they most likely defended with a high backline and tried to recover possession as quickly as possible. Since they keep the ball a lot, it reduces the touches for the opposition in Arsenal’s defensive third. The last man tackles were made by the center-backs to cut out through balls (Koscielny – 9, Vermaelen – 5 & Mertesacker – 3). With such a defensive scheme, it is not surprising that Arsenal forced the highest # of offsides and allowed the 4th highest # of through balls. The Arsenal defence also had the lowest # of blocks and the 2nd lowest # of clearances. They defended far away from their own area, so there was less need for clearances.

This season, so far, has been a slightly different story. Arsenal are defending deeper (an opinion based on watching games) and more compactly (2 lines of 4 very close to each other). There is more emphasis on defending set pieces and corners. This could all be due to Steve Bould, to the absence of Bacary Sagna, or probably a bit of both. They have conceded just once so far (on what seemed like a gaffe by Szczesny).

By defending deeper Arsenal might concede a lot more corners, crosses and throw-ins close to the area but it also reduces their giving up breakaway attacks and through ball opportunities.

Arsenal – Goalkeeping – Wojciech Szczesny

Goals conceded overall

49 – tied for 11th lowest

Saves

Lowest in EPL

Clean sheets

13 – tied for 5th most

GK distribution efficiency (successful GK distribution / total GK distribution)

2nd best

Long passes completion

39%

Short passes completion rate

95.5% – 3rd best

Proportion of Long to short passes

51-49

Arsenal – Goalkeeping Summary

Szczesny is one of the best young goalkeepers in the league, though prone to the occasional error (like last week vs. Southampton). He is one of the best short passers and ranks 2nd in distribution efficiency. He also has one of the most balanced long-to-short pass ratios at 51:49. This stat further underlines the Arsenal philosophy of short passes.

He did concede a lot of goals (49) but a lot of it is down to Arsenal’s defensive scheme. They used a high backline, which means when the opposition forwards beat the high line, they were more likely to have a favourable match-up in terms of numbers and a clear sight of the goal. Szczesny’s league lowest # of total saves could very likely be a side effect of the overall defensive scheme.

City vs. Arsenal head-to-head 2011-12

  • City won at home 1 – 0 and Arsenal won at home 1 – 0
  • City missed Yaya Toure in the game at Emirates and failed to register a shot on target for the only time all season.
  • City also had a season low 53 successful passes in the final third in the game at Emirates (season average : 135)
  • Even in the game at the Etihad, City only managed 105 successful passes in the final third.
  • Importance of 1st goal – Both teams have impressive records when scoring first, especially City
    • City’s record when scoring first is 25 Wins 2 Draws and 1 Loss
    • Arsenal’s record when scoring first is 16 Wins 3 Draws and 4 Losses
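The first-goal records above are easier to compare as rates. Here is a minimal sketch that turns a W-D-L record into games played, win rate and points per game. The `record_summary` helper is a name made up for this illustration, and the 3-1-0 points scheme is the standard league scoring; the records themselves are the ones quoted in this post.

```python
def record_summary(wins, draws, losses):
    """Return games played, win rate and points per game for a W-D-L record."""
    games = wins + draws + losses
    points = 3 * wins + draws  # 3 points for a win, 1 for a draw
    return {
        "games": games,
        "win_rate": wins / games,
        "points_per_game": points / games,
    }

# Records quoted in this post (2011-12 league games).
records = {
    "City, scoring first": (25, 2, 1),
    "Arsenal, scoring first": (16, 3, 4),
    "Arsenal, not scoring first": (5, 4, 6),
}

for label, wdl in records.items():
    s = record_summary(*wdl)
    print(f"{label}: {s['games']} games, "
          f"win rate {s['win_rate']:.1%}, "
          f"{s['points_per_game']:.2f} points per game")
```

Arsenal's two records add up to 38 games, which is a quick sanity check that the split covers the full league season.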

Final word

Last season Arsenal gave Manchester City two of its toughest games of the season. They did not allow City to enjoy the possession dominance in the final 3rd that they are used to against the rest of the teams in the EPL. The games were very close. Small details and moments of individual brilliance (or an error) determined the results.

To win, City needs:

  • Limit the influence of Cazorla and Podolski.
  • Take advantage of one of the few weaknesses of Arsenal, the fullbacks – especially on the right side.
  • Minimize Arsenal’s touches in the final 3rd – Arsenal will enjoy a lot of possession due to the nature of their game. However, limiting their possession in the high-value areas will be key to City’s success.
  • Score first – City has an impeccable record of 25W 2D 1L when scoring first
  • David Silva, Yaya Toure, Balotelli and Tevez need to have great games for City. The injury to Samir Nasri at the Bernabeu could be a big blow if it forces him to miss Sunday’s clash.

Final 3rd analysis – more follow ups


Thanks a lot for all the feedback and discussion regarding the final third analysis. Here are a few follow-ups on the feedback.

Feedback: The correlation between goals scored and passes in the final third is driven by the top 5 goal scoring clubs. If they are removed from the data set, the correlation might be weak.
This was brought up by @WillTGM & @Chumolo

Follow-up:

  • The correlation is not nearly as strong if the top 5 (goals scored) are removed. However, 5 teams constitute 25% of the sample. If we cherry-pick the top 5 for removal, it is not surprising that the correlation becomes much weaker.
  • I did an experiment choosing 15 clubs randomly from the 20. Across several such experiments the correlation remained strong, with R² varying between 0.56 and 0.87, and the regression was significant (F-test).
  • On a similar note, if outliers like Liverpool and Newcastle are excluded, the correlation becomes much stronger.
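The random-subset experiment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the club data here is synthetic stand-in data generated in the script (not the real 2011-12 numbers), the helper names `r_squared` and `subset_r_squared` are made up for this sketch, and the number of trials is arbitrary.

```python
import random

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def subset_r_squared(x, y, subset_size=15, trials=1000, seed=0):
    """R^2 for `trials` random subsets of `subset_size` clubs, drawn without replacement."""
    rng = random.Random(seed)
    indices = range(len(x))
    results = []
    for _ in range(trials):
        chosen = rng.sample(indices, subset_size)
        results.append(r_squared([x[i] for i in chosen], [y[i] for i in chosen]))
    return results

# Synthetic stand-in data, NOT the real figures: 20 "clubs" with
# final-third completions and a noisy linear relationship to goals.
data_rng = random.Random(42)
passes = [data_rng.uniform(2500, 6000) for _ in range(20)]
goals = [0.012 * p + data_rng.gauss(0, 8) for p in passes]

values = subset_r_squared(passes, goals)
print(f"R^2 over random 15-club subsets: min {min(values):.2f}, max {max(values):.2f}")
```

Replacing `passes` and `goals` with the real per-club season totals would reproduce the kind of range quoted above; removing a hand-picked group (such as the top 5 scorers) is a different, non-random test.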

Feedback: Significance of the regression

@rui_xu brought up a great point about the importance of the significance of the regression and how R² alone might not tell the whole story.

Follow-up:
I did the F-test for all the regressions with the following results

  • However, when I did the same analysis using data from all the 380 games of last season (760 samples), the correlation was weak (as observed for the 38 games of Man City) and the regression was significant for the larger sample space.
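For a regression with a single predictor, the F-statistic can be computed directly from R² and the sample size, which shows how a weak correlation can still be significant with many samples. The sketch below uses the R² of 0.04 reported for City's 38 games; reusing the same R² for 760 samples is purely an assumption for illustration, since the actual value for the larger sample is not stated here. `f_statistic` is a helper name made up for this sketch.

```python
def f_statistic(r2, n, predictors=1):
    """F-statistic for the overall significance of a linear regression.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k predictors and n samples.
    """
    df_model = predictors
    df_resid = n - predictors - 1
    return (r2 / df_model) / ((1 - r2) / df_resid)

# City's 38 league games, R^2 = 0.04.
f_small = f_statistic(0.04, 38)

# Hypothetical: the same R^2 with 760 team-game samples (380 games x 2 teams).
f_large = f_statistic(0.04, 760)

print(f"n = 38:  F = {f_small:.2f} on (1, 36) degrees of freedom")
print(f"n = 760: F = {f_large:.2f} on (1, 758) degrees of freedom")
```

At the 5% level the critical value is approximately 4.1 for (1, 36) degrees of freedom and approximately 3.85 for (1, 758), so the same weak R² fails the test with 38 samples but passes it comfortably with 760.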

Please keep the feedback coming!

Follow-up analysis: Final third passing and Goals scored per game


This is a follow-up to my post regarding the strong correlation between passes completed in the final third and goals scored.

Question

Is there a correlation between the final third completions & goals scored at the game level?

Analysis

I investigated whether this correlation exists at the game level using the #MCFCAnalytics data set. I plotted the completions in the final third vs. goals scored for Manchester City in all 38 of their English Premier League games.
Blue = Away; Orange = Home

Manchester City Goals vs. Pass completions in the final 3rd on a per game basis

Findings:

  • Linear regression had an R² of 0.04, implying that there is essentially no linear correlation between passes completed in the final third and goals scored at the game level.
    I did the plot for a few other teams and got similar results.
     
  • Arsenal – Away and Liverpool – Home. In both cases, Manchester City had very little success completing passes in the final 3rd. However, they lost 1-0 at the Emirates and won 3-0 at home vs. Liverpool.
    Against Liverpool, City had 6 shots on target and 2 off target.
    Against Arsenal, City had 0 shots on target and 3 off target.
  • QPR – Home and QPR – Away. City scored 3 goals each against QPR home and away. However, they had a season high 326 completed passes in the final 3rd at home vs. just 74 in the away fixture.
    Shots vs. QPR Away – 5 on target & 10 off target.
    Shots vs. QPR Home – 15 on target and 10 off target.

The City – QPR home fixture was that crazy season finale. City fell behind and threw everyone forward to go for the win and the Premier League title. QPR were a man down from the 55th minute and defended at the edge of their 18-yard box for most of the 2nd half. This explains the unusually high number of completed passes in the final third.

The above examples underline the rarity of the “goal” event. In any given game, there could be factors like bad shooting, luck, the opponent’s goalkeeper having a great game etc., which could influence the # of goals scored. However, over a season those things seem to even out.
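To put numbers on how differently those four games played out in front of goal, here is a small sketch that computes shot conversion (goals divided by shots on plus off target) from the figures listed in the findings. `conversion_rate` is a helper name made up for this illustration.

```python
def conversion_rate(goals, on_target, off_target):
    """Goals per shot, counting shots on and off target; 0.0 if no shots were taken."""
    shots = on_target + off_target
    return goals / shots if shots else 0.0

# (opponent, venue, goals, shots on target, shots off target), from the findings above.
games = [
    ("Liverpool", "Home", 3, 6, 2),
    ("Arsenal", "Away", 0, 0, 3),
    ("QPR", "Away", 3, 5, 10),
    ("QPR", "Home", 3, 15, 10),
]

for opponent, venue, goals, on_target, off_target in games:
    rate = conversion_rate(goals, on_target, off_target)
    print(f"{opponent} ({venue}): {goals} goals from "
          f"{on_target + off_target} shots, conversion {rate:.1%}")
```

The same 3 goals came from 8, 15 and 25 shots in three different games, which is the kind of game-to-game noise that a single-variable model on final-third completions cannot capture.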

In the next step of analysis I will add a 2nd variable to the model and analyze.

“Manchester United do ‘it’ to teams every year” Really?


It was a few weeks ago. Sunday EPL games just ended and Manchester United had opened up a 5-point lead over local rivals Manchester City.

Fans, journalists, some TV announcers and even some stats geeks on my Twitter timeline seemed to be saying the same thing.

“Manchester United do it every year to their rivals around this time of the year.”

“It” means a surge to win the title coming from behind. I got curious. Even after assuming that “doing it every year” is probably an exaggeration for “majority of the time” I couldn’t quite believe it. I looked at some data.

Hypothesis:

We tend to remember events better than numbers. Some events are more memorable than the others.
I hypothesized that this might be a case of selective memory due to the dramatic nature of a few events, such as Man United’s comeback against Bayern München in 1999.

I analyzed the Premier League tables from the inaugural season in 92-93 through 2011-12.

As I had expected, the data painted a different picture.

Methodology:

1. Look at the top 4 of the standings for every season at the end of the months January, February, March, April and May (end of the season).
2. Plot the points differential between the leader and the rest.
3. Look at the # of times the lead changed hands in the seasons that Manchester United won the title (from the end of January to May)
4. Look at seasons where Manchester United led early on but did not go on to win the title.
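Steps 2 and 3 above can be sketched in a few lines. The standings below are made-up placeholder numbers for four hypothetical clubs, not real Premier League data, and the function names `gaps_to_leader` and `count_lead_changes` are invented for this illustration.

```python
def gaps_to_leader(standings):
    """For each month, each team's points deficit to that month's leader."""
    result = {}
    for month, points in standings.items():
        top = max(points.values())
        result[month] = {team: top - pts for team, pts in points.items()}
    return result

def count_lead_changes(standings, months):
    """Number of times the month-end leader differs from the previous month's leader.

    Ties on points are broken by dictionary order here; a fuller version
    would use goal difference.
    """
    leaders = [max(standings[m], key=standings[m].get) for m in months]
    return sum(1 for prev, cur in zip(leaders, leaders[1:]) if prev != cur)

# Placeholder month-end points for a hypothetical top 4 (NOT real data).
months = ["Jan", "Feb", "Mar", "Apr", "May"]
standings = {
    "Jan": {"Club A": 50, "Club B": 52, "Club C": 45, "Club D": 44},
    "Feb": {"Club A": 58, "Club B": 57, "Club C": 51, "Club D": 49},
    "Mar": {"Club A": 66, "Club B": 63, "Club C": 58, "Club D": 55},
    "Apr": {"Club A": 75, "Club B": 74, "Club C": 64, "Club D": 62},
    "May": {"Club A": 83, "Club B": 80, "Club C": 70, "Club D": 67},
}

print("Lead changes after January:", count_lead_changes(standings, months))
for month in months:
    print(month, gaps_to_leader(standings)[month])
```

Because only month-end snapshots are used, a lead that changes hands and back within a month is not counted, which is the simplification noted in the assumptions.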

Assumptions:

1. Ignore teams below 4th place, to reduce noise. I have also ignored 4th place in 2003-04, when 4 different teams occupied 4th place across the month-end tables.
2. Plotted only point totals at the end of the last 5 months, to reduce noise – Deeper analysis (on a week-to-week basis) in seasons with close title run-ins  will be done as a follow up.

Observations:

1. Manchester United is a great champion and won 12 of 19 titles, but in quite a few cases they had comfortable leads from end of January through May (see images below)

Season: lead changes after January
1993-94: 0 – Led from week #4
1996-97: 0 – Led from week #23
1999-00: 0 – Led from week #21
2000-01: 0 – Led from week #10
2006-07: 0 – Led from week #7
2008-09: 0 – Led from week #20
2010-11: 0 – Led from week #15

2. There were 5 seasons where they made a title push coming from behind to win the title.

  • 92-93: In the inaugural Premier League season they took the lead in mid-March and led the rest of the way to the title
  • 95-96: Newcastle led the table into mid-March but United overtook them and went on to win the title
  • 98-99: Closest run-in of all. They led from January through mid-March, gave up the lead briefly to Arsenal but pipped them at the end by a point. If you want to talk about late comebacks, this has got to be the poster child, although they did wobble a bit towards the end.
  • 02-03: Came back from 3rd at the end of January to overtake Arsenal in mid-March
  • 07-08: Interesting chart. Were level on points with Arsenal at the end of January. Took over the lead from Arsenal in February. Chelsea chased them down in April but Man United prevailed by 2 points in the end. Not a major comeback in my book, just playing it cool with the lead.
  • 11-12: Jury is still out on the current season

3. They lost 4 titles after leading the table post January

  • 97-98: United led from January through mid-April, when they lost the lead to Arsenal. Arsenal’s rise is slightly exaggerated in the chart as they had 3 games in hand at the end of February.
  • 01-02: The bottom fell out for United in late March
  • 03-04: After leading at the end of January, United were never in it. The “Invincibles” season of Arsenal. I don’t have a 4th place team in this chart because there were 4 different teams in 4th and the graph was getting busy.

  • 09-10: Great title race with Chelsea, but Man United came up short by a point in this one.

Other seasons

  • 04-05: They were never in contention

  • 05-06: They were never in contention
  • 94-95: Blackburn prevailed despite United running them close

Conclusion:

It is clear that they don’t come from behind always, not even close.

  • There were 5 instances where they came from behind between end of January & May to go on to win the title. (Not counting the current season)
  • There were 4 instances where they lost a lead between end of January & May to go on to lose the title.
  • There were 7 instances where they led end-to-end between end of January & May

In all there were 12 seasons in which they fell behind at some point between end of January and May.

Titles won after trailing 5/12 = 41.67%

Titles lost after leading 4/12 = 33.33%

Never led 3/12 = 25%
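As a cross-check on the tallies, here is a short sketch that groups the 19 completed seasons by the categories used in this post and recomputes the shares. The category labels are my own shorthand; the season lists are the ones given in the observations above.

```python
seasons = {
    "led throughout, won title": ["93-94", "96-97", "99-00", "00-01", "06-07", "08-09", "10-11"],
    "came from behind, won title": ["92-93", "95-96", "98-99", "02-03", "07-08"],
    "led after January, lost title": ["97-98", "01-02", "03-04", "09-10"],
    "never led after January": ["94-95", "04-05", "05-06"],
}

total = sum(len(v) for v in seasons.values())
titles = len(seasons["led throughout, won title"]) + len(seasons["came from behind, won title"])

# Seasons in which United were behind at some point between end of January and May.
not_end_to_end = total - len(seasons["led throughout, won title"])

print(f"Seasons: {total}, titles: {titles}, not led end-to-end: {not_end_to_end}")
for label in list(seasons)[1:]:
    share = len(seasons[label]) / not_end_to_end
    print(f"{label}: {len(seasons[label])}/{not_end_to_end} = {share:.2%}")
```

The counts add up to 19 seasons and 12 titles, matching the totals quoted earlier in the post.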

All data from http://www.premierleague.com
You may follow me on twitter @AnalyseFooty & @aupasubmarino

Charts for seasons where they led from end of January through May

  • 93-94

  • 96-97

  • 99-00

  • 00-01

  • 06-07

  • 08-09

  • 10-11