Thinking About Modelling Football

What I Think About When I Think About Football.

Dec 31, 2024

I’ve been sharing my hobby of building team ratings models for several years now. One of the benefits of learning in public is receiving plenty of feedback and questions. Almost every week, I’m asked whether my model “accounts for X” or whether I have advice for those looking to get started.

One of the biggest ‘myths’ implicit in a lot of these interactions is that data-driven models are entirely objective. Models will always, to some extent, reflect the beliefs of their creators. This is true mechanically: models are shaped by countless approximations, assumptions, and decisions made during their creation. These decisions are central to the core question of ‘what wins football matches?’ and extend to smaller questions such as:

How to account for red cards?
What about predicting/rewarding penalties?
How much does ‘finishing skill’ matter?
Do own goals ‘count’?
What about manager influence?
Do we count incorrect decisions (e.g. offside goal given)?
How to account for new transfers?

These are just a handful of many examples. Beginners in modelling football often ask me directly for the ‘answer’ to questions like these. To me, at least, these are very difficult questions. There is no clear answer, and dealing with them is (currently) more of an art than a science. Over time, this art can evolve into a science as you learn which assumptions are flawed and refine your approach. But make no mistake: this art is yours. You own it.

Having this much personal responsibility in analysing football is quite the paradigm shift. For a minority it can be liberating, but for most it brings on paralysis by analysis. So people are still keen to hear what my (or my model’s) core beliefs are about what wins football matches. I am not going to cover every lever and gear, but I will describe how my fundamental beliefs feed into a ‘model’. This does not apply only to statistical models: the mental model of football in your head is just as valid. So here is my attempt to describe the core belief I have developed through this hobby, in the hope it helps at least one person.

In my head, I keep coming back to the following ‘formula’:

Match Outcome = Luck + Player Quality + Decision Quality

Luck

Luck plays a massive part in the outcomes of football matches because so much is unexpected or hidden: the bounce of the ball, deflections, referee errors, injuries, and even the weather. It is an uncomfortable truth, but denying it is a total non-starter if you are interested in understanding and predicting football matches. Working out whether a team played well or got lucky is a fundamental problem. Football is not like chess, where there is no hidden information and the signal is there for all to see.

From a modelling perspective, we recognise that this part of the formula exists but do our best to square it away: separate the signal from the noise. Deciding for yourself which aspects of football are skill and which are luck is a very good place to start when building your model assumptions. Indeed, many of the later adjustments you make to your analysis may move the needle in the opposite direction on things you previously attributed to skill or to luck.
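To get a feel for how much noise sits in results, here is a minimal sketch. It assumes, purely for illustration (this is not my model), that each side's goals follow an independent Poisson distribution with a fixed rate, and it simulates many 38-game seasons for a hypothetical, exactly average team that scores and concedes 1.5 goals per game. The underlying quality is identical in every simulated season, so any spread in points totals comes from luck alone.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value using Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def season_points(goals_for: float, goals_against: float, games: int, rng: random.Random) -> int:
    """Points for one simulated season with fixed scoring and conceding rates."""
    points = 0
    for _ in range(games):
        scored = sample_poisson(goals_for, rng)
        conceded = sample_poisson(goals_against, rng)
        if scored > conceded:
            points += 3
        elif scored == conceded:
            points += 1
    return points

rng = random.Random(2024)  # fixed seed so the run is reproducible
# Hypothetical average side: 1.5 goals for and 1.5 against per game, 38 games per season.
seasons = sorted(season_points(1.5, 1.5, 38, rng) for _ in range(5_000))
low = seasons[len(seasons) // 20]    # roughly the 5th percentile
mid = seasons[len(seasons) // 2]     # median
high = seasons[-len(seasons) // 20]  # roughly the 95th percentile
print(f"median {mid} points; about 90% of seasons between {low} and {high}")
```

Changing the rates shows how wide the band of outcomes is for a team whose quality never changes, which is exactly the noise a model has to see through.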

Player Quality

The influence of managers versus players on match outcomes remains one of the most debated topics in football analytics, and there is already plenty of material on the subject. A good place to start would be this 2016 edited extract from Soccernomics or this article from The Athletic. Even with what is already out there, I still believe this is the most valuable area to which one could dedicate research time.

Where your beliefs fall on this spectrum has big repercussions for how you might model football:

It can change your fundamental starting point: is a ‘team rating’ the combination of 11+ individual ratings, or can a team only be rated as a collective?
It can affect your team ratings: how do you adjust for player changes in the off-season? This has often been dealt with by using wage bills as a proxy, relying on the assumption that good players cost more money.
It can affect your match predictions: you may wish to adjust predicted outcomes based on available personnel (you see this effect in the change in market predictions when news gets out that a key player is missing).

I am convinced that players are by far the more decisive factor in match outcomes and that managers play a peripheral role. Managers can do their best, but once the 90 minutes begin it is player quality and player decision quality that ultimately determine the outcome. The players play the game.

I see evidence of this in nearly every competitive domain. There is virtually no game where the ‘units’ of play cannot outweigh those using them. You could beat five-time World Chess Champion Magnus Carlsen if all your pieces were queens. You could beat professional poker players if your hand was always rigged. I believe the same is true in football. Make a starting 11 of the world’s best players and I think they win the English Championship.

Decision Quality

This becomes more personal from here. I understand that introducing ‘Decision Quality’ as a core element of what wins football matches may lose some readers, but to me, it’s the most critical aspect. Decision Quality (alongside player quality and luck) is what I fundamentally believe determines success in football. I believe this in my bones.

The starting point is the following fundamental truth: any increase in Decision Quality improves the outcomes associated with those decisions. Therefore, what I am looking for in a football match is a team that actively attempts to improve its Decision Quality. I think a team that constantly seeks to make high-quality decisions is a good signal that the match outcome will go in its favour. I consider control to be the best foundation for (improving) Decision Quality.

I broadly define control as the ability of a team to influence the sequence of play. The definition is deliberately abstract to avoid the trap of thinking of control as binary. People often associate control in football with wanting and having the ball. Whilst the ball is indeed the most important thing in football, true control means controlling everything of value (everything that influences the sequence of play). This means controlling space (the pitch) and time (the constraints of a 90-minute game), neither of which necessarily requires the ball.

This understanding allows us to envision the ‘classical’ control team with high possession and field tilt (control focused on the ball) but also allows us to envision the team with the fastest wingers in the league who exhibit their control of the game by playing a low-block to lure the opposition into being countered (control focused on space and time). Both of these teams are making intentional choices and using control to increase Decision Quality and improve their associated match outcomes.

One of the reasons I consider Decision Quality so important is that it is the only true ‘antidote’ to the luck part of the formula. Football is a noisy endeavour, and if you want to improve as a team you need to be able to separate the signal from the noise. The feedback loop in football is murky: 90 minutes a week is not enough data to know how things are going, and that small amount of data is even less useful when it is laced with chaos. Control is the best foundation for bringing clarity to this process. In a way, this might be why Mikel Arteta’s Arsenal are my favourite football team ever: I cannot help but admire their valiant attempt to ‘out-football football’ by minimising variance at all costs.

I also feel some teams in the Premier League ‘act like they believe this to be true’. We see newly promoted teams, such as Southampton in 24/25 and Burnley in 23/24, still trying to play with a ‘classical’ dedication to controlling the game. They do this despite the low chance of success (the player quality part of the formula is too small) because the payoff can be huge: a chance to engage with the clearer feedback loop and ‘make the jump’.

Figure: Style profiles of Southampton, Leicester, and Ipswich.

Southampton are a good example of mistaking control for just having the ball. Note the differences in Possession, Progression and Passing, and Intricate Attack above. From a modelling perspective, I don’t automatically reward control; it needs intention and Decision Quality. Nottingham Forest this season are a good example of the broader definition of control I describe. They do not control the ball, but they intentionally control space and time, and the few possession sequences they get from this are filled with high-quality decisions. Do not make the mistake of thinking Nottingham Forest are not controlling the game.

I place my focus on ‘sequences of play’ rather than ‘shots’. Shots and Expected Goals have (deservedly) received a lot of positive attention in analysis. Scoring goals is how you win football matches, and almost all goals are the result of a shot. However, we can’t be interested only in shots. Placing a value on possessions is not new: a good place to start is this short explainer from StatsBomb on Expected Threat and Possession Value models.

From a match analysis perspective, I find it more interesting to ask how a team controls the sequences of play and makes high-quality decisions that put it in a position to increase goal probability (this is silent on whether a shot was taken). The distinction is a decision-value lens rather than a pitch/possession lens. To draw a comparison with poker, deciding when to fold is just as interesting as deciding to play the hand. Folding (not taking a shot) and playing the hand (taking a shot) can each increase or decrease your chance of winning. All of this also feeds my general belief that shots and finishing skill have become overrated (an article for another time, perhaps).
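To make the fold-or-play comparison concrete, here is a minimal sketch that values a choice by the goal probability it is expected to lead to, in the spirit of the possession-value models mentioned above but far cruder. The `Option` class, the 0.25 xG ‘better chance’ and the probabilities of reaching it are all hypothetical placeholders, not measured data. The sketch also ignores turnover risk and what the opponent does with the ball, which any real decision-value model would need to include.

```python
from dataclasses import dataclass

@dataclass
class Option:
    """One choice open to the player on the ball. All inputs are hypothetical."""
    name: str
    p_reach: float    # probability this choice produces the end state (1.0 for an immediate shot)
    xg_at_end: float  # goal probability once that end state is reached

    def goal_value(self) -> float:
        # Expected goal probability this choice adds to the sequence.
        return self.p_reach * self.xg_at_end

def best_option(options: list[Option]) -> Option:
    """Pick the choice with the highest expected goal value."""
    return max(options, key=lambda o: o.goal_value())

def break_even(shot_xg: float, better_chance_xg: float) -> float:
    """Minimum chance of creating the better opportunity for keeping the ball to beat shooting."""
    return shot_xg / better_chance_xg

shoot = Option("shoot from distance", p_reach=1.0, xg_at_end=0.02)
for p in (0.05, 0.15):  # placeholder probabilities, not measured data
    recycle = Option("recycle and work it into the box", p_reach=p, xg_at_end=0.25)
    choice = best_option([shoot, recycle])
    print(f"p_reach={p:.2f}: shoot {shoot.goal_value():.4f} vs recycle "
          f"{recycle.goal_value():.4f} -> {choice.name}")

print(f"keeping the ball wins if p_reach > {break_even(0.02, 0.25):.2f}")
```

The useful output is the break-even line: under these made-up inputs, a 0.02 xG shot is only the better decision if keeping the ball rarely leads anywhere better.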

There is immense pressure to get decisions right because sequences are valuable: you only get one try at any given sequence of play. As an example, let us take a fairly common Premier League game with the following scoreline:

Team A (1.5 xG) 1 - 1 (1.5 xG) Team B.

Team A’s Player Y took five shots from outside the box, all of which missed. I believe Team A would have had a good chance to win the game if Player Y had not wasted those sequences of play and had instead looked to turn them into something more valuable (improving Decision Quality). When I first had this thought, my intuition scolded me: “obviously, if you turn five 0.02 xG shots into five possessions of higher xG then they will be more likely to win… that’s just maths, silly!”

But I think I am saying something deeper than this. The evaluation of a ‘chance’ as high-value should not be limited to the shot taken; it should be an overall evaluation of the entire sequence of play and of the extent to which Decision Quality in that sequence increased goal probability. At every stage of the sequence, players are choosing between alternative futures, and every shot Player Y took committed Team A to that course of action and eliminated the alternatives. I have to accept that, all other things being equal (player quality and luck), a team can get better or worse solely through its decisions.

This means I have to sit with the belief that, perhaps, Team A could be more likely to win if Player Y had instead turned his five 0.02 xG shots into five alternative sequences, even if they did not end in a shot (0 xG). It feels counterintuitive when faced with the new xG totals in this scenario (1.4 - 1.5, since the five shots were worth 0.1 xG between them), but there is potential opportunity in what is forgone.
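The ‘that’s just maths’ intuition can be written down. This is a minimal sketch assuming each team’s goals follow an independent Poisson distribution whose mean is its xG total (a common simplification, not the model described in this post). Removing the five 0.02 xG shots takes 0.10 xG off Team A’s total, and the sketch compares outcome probabilities before and after. Note what it cannot see: the value of the alternative sequences that did not end in a shot, which is exactly the part of the argument above that xG totals miss.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when goals follow Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probs(xg_a: float, xg_b: float, max_goals: int = 10) -> tuple[float, float, float]:
    """(P(A wins), P(draw), P(B wins)) under independent Poisson goals, truncated at max_goals."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, xg_a) * poisson_pmf(j, xg_b)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss

long_shots = [0.02] * 5                                  # Player Y's five efforts from distance
removed_xg = sum(long_shots)                             # 0.10 xG in total
p_any_goal = 1 - math.prod(1 - x for x in long_shots)   # chance at least one goes in

print(f"xG removed: {removed_xg:.2f}; P(at least one long shot scores) = {p_any_goal:.3f}")
for label, xg_a in [("with long shots", 1.5), ("without long shots", 1.5 - removed_xg)]:
    w, d, l = outcome_probs(xg_a, 1.5)
    print(f"{label}: Team A {xg_a:.2f} xG -> win {w:.3f}, draw {d:.3f}, loss {l:.3f}")
```

Under this simplification Team A’s win probability can only fall when the shots are removed; any case for the alternative sequences has to come from value the shot-based totals do not record.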

Summary

The deeper I have got into this hobby over the past few years, the more it has reaffirmed my general view that you should interrogate your own philosophy towards something as much as you can before creating that thing.

‘Does your model account for red cards?’ is a great question that you should be asking when analysing football. But before the ‘does?’ is the ‘why?’ and ‘should?’.

I can answer the ‘does’ part for you in relation to my model, but the ‘why’ and ‘should’ are up to you. For some questions there are numbers out there that can basically answer the ‘should’ for you (but it is much more satisfying if the answer comes from the process described earlier of adjusting your assumptions).

Ultimately, you first have to work out what you really believe, and this will guide your analysis. What is your actual belief about how to account for red cards? If it is not obvious whether you should reward or punish them, follow more granular questions that help answer that, e.g. do teams actually score more against ten men, even considering that the team with fewer men becomes more defensive?

To summarise: there is a lot more vibing in modelling football matches than you might think. This stuff isn’t ‘solved’. People start out looking for the answers (like I did) but are frustrated and demotivated when they cannot find them. I hope you take the opposite view and are encouraged by this. Find the right questions and then try to find the right answers.

Thanks for reading. In a way, this piece was written as what I wish I could have read 5 years ago when I started. I hope this resonates with at least one person and proves useful.

Wishing you a Happy New Year.

Please do share this with anyone you think might benefit from it, and I would love to hear any thoughts you have.

Reuben Anderson

Jan 1

As a Spurs fan, I couldn’t agree more. Ange wants to control games through relentless attack. It works well when we have our first XI (player quality) and the players are fresh (they make better decisions), and it falls apart otherwise, as it’s not a very effective means of control. A team like Ipswich controlled the match away from home by sitting in a low block, conserving their energy, and then counter-attacking fast down the wings. Angeball depends on extreme energy levels to recover and press from chaotically distributed player positions. His frustration, though, has been the final-third decision making. We lose the ball easily and create a lot of low-quality chances and a few big chances. If we don’t convert them, we’re constantly at risk of dropping a goal or two.

Alex Stewart

Very interesting, and much to agree with

elevenify