- The growth of expected goals
- What’s in the model and what does it tell us?
- Do we need player-specific information?
- How and when to use expected goals
The growth of expected goals
Before we look at the limitations of expected goals (xG), let me first explain what it is and how it became so popular. In short, xG is a measure of chance quality. It tells us how likely a shot is to result in a goal, based on how often shots taken from similar positions on the pitch have been scored in a large historical sample (more details can also be considered, but we’ll come on to that later).
Expected goals is usually presented on a scale of 0-1 (0 being a certain miss and 1 being a certain goal). However, people may also describe the quality of a goal-scoring chance in percentage terms, which is simply the 0-1 xG figure converted into a percentage. For example, a shot worth 0.5 xG would be described as having a 50% chance of being scored.
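To make the idea concrete, here is a minimal Python sketch of the simplest possible model. It buckets historical shots into zones and uses each zone's conversion rate as the xG for any new shot from that zone. The zoning by distance alone, the 2-metre zone size, the function names and the toy data are all hypothetical. Real models use far larger samples and many more inputs.

```python
from collections import defaultdict

def zone_of(distance_m: float, zone_size_m: float = 2.0) -> int:
    """Hypothetical zoning: bucket a shot by its distance from goal only."""
    return int(distance_m // zone_size_m)

def build_zone_xg(shots):
    """shots: iterable of (distance_m, scored) pairs from historical data.
    Returns zone -> fraction of shots from that zone that were scored."""
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for distance_m, scored in shots:
        zone = zone_of(distance_m)
        attempts[zone] += 1
        goals[zone] += int(scored)
    return {zone: goals[zone] / attempts[zone] for zone in attempts}

def as_percentage(xg: float) -> str:
    """Express a 0-1 xG value as a percentage scoring chance."""
    return f"{xg * 100:.0f}%"

# Toy, made-up history purely for illustration.
history = [(5.0, True), (5.5, False), (11.0, False), (11.5, True), (10.5, False), (10.2, False)]
zone_xg = build_zone_xg(history)
print(as_percentage(zone_xg[zone_of(5.2)]))  # 50%
```

A shot from 5.2 metres falls into the same zone as the two historical shots from 5-5.5 metres, one of which was scored. It is therefore rated 0.5 xG, or a 50% chance.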
While xG is great for analysing performance (which benefits a soccer team trying to learn and improve, or to recruit new talent), it has risen in popularity among the wider soccer community because it has proven a better predictor of future results than other commonly used metrics, including shots, goal difference and even points.
What’s in the model and what does it tell us?
There are plenty of arguments against using xG, and some of them come from people who have dedicated time to learning about it, including its inner workings. Unfortunately, many of the criticisms are levelled by those who see it simply as an intrusion on the traditions of the sport and who haven’t taken the time to understand what it means or how it works.
“Soccer is played on a pitch, not a spreadsheet” and “the only number that matters is the score after 90 minutes” are just two examples of what you might hear from detractors of expected goals. However, many of the same people will also say things like “he should have scored from there” or “we were unlucky not to get a result”. These are simply narrative versions of judgements that xG can support (or refute) with data.
While the negative perception of expected goals appears to stem from unfounded and often illogical reasoning, there are also some limitations when it comes to using this metric. If you’re intending to use expected goals to your advantage, particularly if you’re looking to build a predictive model for betting on soccer, then failing to recognise these drawbacks could be very detrimental.
Choose your model wisely
While not necessarily a negative, it is important to acknowledge that there is no single, universal expected goals model. There are countless different models, many of which can be accessed online for free. Each uses its own set of parameters to produce its final figure: one model may rate a chance at 0.52 xG, while others put it at 0.47, 0.58 or even above 0.60.
Which xG model is “best” is certainly open to interpretation, and it’s easy to be fooled into thinking that more is better when building one. You can go beyond basics such as shot location by adding the angle to the goal, the body part used, the type of assist, the amount of defensive pressure and the position of the goalkeeper. This will produce a more refined model, but it also raises the risk of overfitting it.
There is plenty of noise (randomness) in soccer, and xG can help cut through it to give us a clearer picture. However, as expected goals models develop, we also run the risk of overcomplicating things. Adding too many parameters can leave a model simply reflecting the quirks of the data it was built on, rather than providing predictive insight into new shots.
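As an illustration of going beyond location alone, the Python sketch below derives two common inputs, distance and the angle between the posts as seen from the shot, and feeds them into a logistic function along with a header flag. Logistic regression is one common way to build such a model, not necessarily what any particular published model uses. The coefficients are hypothetical placeholders rather than values fitted to real data, and the function and variable names are illustrative.

```python
import math

GOAL_WIDTH_M = 7.32  # distance between the posts

def shot_features(x_m: float, y_m: float):
    """x_m: distance out from the goal line; y_m: lateral offset from the centre of the goal.
    Returns (distance to goal centre, angle subtended by the two posts in radians)."""
    distance = math.hypot(x_m, y_m)
    half = GOAL_WIDTH_M / 2
    # Angle between the lines from the shot to each post (atan2 keeps it in 0..pi).
    angle = math.atan2(GOAL_WIDTH_M * x_m, x_m ** 2 + y_m ** 2 - half ** 2)
    return distance, angle

# Hypothetical coefficients for illustration only; a real model would fit these to data.
COEFS = {"intercept": -1.0, "distance": -0.1, "angle": 1.5, "header": -1.0}

def xg_logistic(x_m: float, y_m: float, header: bool = False) -> float:
    """Map the shot's features to a 0-1 probability with the logistic function."""
    distance, angle = shot_features(x_m, y_m)
    z = (COEFS["intercept"]
         + COEFS["distance"] * distance
         + COEFS["angle"] * angle
         + COEFS["header"] * header)
    return 1 / (1 + math.exp(-z))

print(round(xg_logistic(11, 0), 2), round(xg_logistic(11, 0, header=True), 2))
```

Every extra input (assist type, defensive pressure, goalkeeper position) would add another coefficient to fit. That is exactly where the overfitting risk described in this section comes from.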
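One standard way to check whether extra parameters add predictive insight, or merely reflect the data, is to score each candidate model on shots it was not built from. The Python sketch below does this with log loss (lower is better). The two candidate models, the shot format and the holdout shots are hypothetical examples, not real models or data.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Shot = Tuple[float, bool]  # hypothetical minimal shot record: (distance_m, scored)

def log_loss(outcomes: Sequence[bool], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the observed outcomes (lower is better)."""
    total = 0.0
    for scored, p in zip(outcomes, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) can never occur
        total += -math.log(p) if scored else -math.log(1 - p)
    return total / len(outcomes)

def holdout_scores(models: Dict[str, Callable[[float], float]],
                   holdout: List[Shot]) -> Dict[str, float]:
    """Score each candidate model only on shots it was NOT fitted to."""
    outcomes = [scored for _, scored in holdout]
    return {name: log_loss(outcomes, [model(d) for d, _ in holdout])
            for name, model in models.items()}

models = {
    "coin_flip": lambda d: 0.5,                                # ignores the shot entirely
    "distance": lambda d: 1 / (1 + math.exp(0.2 * (d - 6))),  # hypothetical distance curve
}
holdout = [(4.0, True), (18.0, False), (25.0, False), (8.0, True)]
print(holdout_scores(models, holdout))
```

An overfitted model tends to look excellent on the shots it was built from but score worse than a simpler model on a holdout set like this.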
Do we need player-specific information?
One of the biggest struggles for xG is not what it tells us, but what it doesn’t. Expected goals doesn’t account for individual players (or their relative talent). While most models are built on a database of many thousands of shots, the final output is an average figure, not one specific to the person taking the shot.
This lack of player-specific inputs has a direct consequence. If Harry Kane and Pablo Zabaleta were in exactly the same position taking exactly the same shot, the chance would be given the same xG figure, even though we know Harry Kane would be more likely to score.
While the person taking the shot is important when analysing both team and individual player performance, so too is who is in goal (another parameter that xG can’t account for). This time, imagine Harry Kane having the same chance as before. He could be shooting against David De Gea or a goalkeeper from the fourth tier of English soccer, but his likelihood of scoring would be rated the same by an expected goals model.
This issue is highlighted by the fact that certain players (like Harry Kane) have consistently outperformed the majority of expected goals models over the last few years. Additionally, some goalkeepers (like De Gea) have helped their teams outperform their expected goals against (xGA) over the same period of time.
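Over- or underperformance like this is usually measured as actual goals minus expected goals, summed over a player's shots. The Python sketch below does that. The player names and shot values are made up for illustration. The same idea applies to goalkeepers by comparing the xG they faced (xGA) with the goals they actually conceded.

```python
from collections import defaultdict

def finishing_vs_xg(shots):
    """shots: iterable of (player, shot_xg, scored).
    Returns player -> (goals, total xG, goals minus xG), rounded to 2 decimals."""
    goals = defaultdict(int)
    xg = defaultdict(float)
    for player, shot_xg, scored in shots:
        goals[player] += int(scored)
        xg[player] += shot_xg
    return {p: (goals[p], round(xg[p], 2), round(goals[p] - xg[p], 2)) for p in xg}

# Hypothetical players and shots, for illustration only.
shots = [
    ("Player A", 0.4, True), ("Player A", 0.1, True), ("Player A", 0.3, False),
    ("Player B", 0.5, False), ("Player B", 0.2, True),
]
print(finishing_vs_xg(shots))
```

Over a handful of shots, a positive gap is mostly noise. It only starts to suggest genuine finishing skill when it persists across seasons, as the text describes for Harry Kane.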
How and when to use expected goals
The other “issue” with expected goals concerns how it is implemented rather than the metric itself. The problem lies less with xG than with the people using it. In short, xG has little to no use for a single game; it is most effective over a longer period of time or a larger sample of data.
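To see why a single game's xG total says little on its own, the Python sketch below turns a list of shot xG values into the probability of each possible number of goals. It assumes every shot is an independent event whose scoring probability equals its xG. This is a simplification that ignores game state and the fact that one shot can follow directly from another.

```python
def goal_distribution(shot_xgs):
    """Return [P(0 goals), P(1 goal), ...] for a list of shot xG values,
    treating each shot as an independent event (a simplifying assumption)."""
    dist = [1.0]  # before any shots: P(0 goals) = 1
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this shot is missed
            new[k + 1] += prob * p    # this shot is scored
        dist = new
    return dist

low_quality = goal_distribution([0.1] * 10)   # ten 0.1 xG shots: 1.0 xG in total
high_quality = goal_distribution([0.5, 0.5])  # two 0.5 xG chances: also 1.0 xG
print(round(low_quality[0], 3))  # 0.349
print(high_quality)              # [0.25, 0.5, 0.25]
```

Under these assumptions, a team that racks up 1.0 xG from ten low-quality shots still fails to score roughly 35% of the time. That is why one match is too small a sample to judge a team by.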
Soccer is an incredibly fluid sport whose circumstances change from minute to minute: the score, the number of players left on the pitch, the time remaining in the match, or what is at stake (particularly in a knockout match). If (or, more accurately, when) expected goals is used on a single-game basis, the figures can be misleading because they don’t tell the whole story or give important context.
It’s also important to understand that as useful as expected goals can be, that doesn’t mean it should be used all of the time. In the early part of the season, any data will likely be skewed given that only a handful of games have been played within that current competitive landscape (this is without even considering the level of the opponents that have been played against).
The sixth game of a domestic season has often been cited as a crucial point in a campaign. This is arguably when the dust begins to settle in terms of the data, and there is less noise to deal with. From around six games into a season, xG comes into its own, and it is from this point onwards that it is most useful.
Just as expected goals can be used too early in a season, some would argue it can be used too late (or at least relied on too heavily late on). We need enough games for trends to develop. At the same time, teams sometimes underperform or overperform against expectation for a genuine reason, and there comes a time when we need to look beyond the data rather than wait for regression to the mean.
There is no magic number, but particularly in the final third of a typical domestic campaign, it is worth looking at actual performance in more detail (while still using xG for additional insight).
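One way to put this rule of thumb into practice is to track a team's season-to-date xG per game but hold back from reading it until enough games have been played. The Python sketch below uses six games as the threshold, following the text. The function name is hypothetical, and six is a heuristic rather than a hard statistical cut-off.

```python
from typing import List, Optional

def season_to_date_xg_per_game(match_xg: List[float], min_games: int = 6) -> List[Optional[float]]:
    """For each match, return the cumulative xG per game so far,
    or None while fewer than `min_games` matches have been played."""
    out: List[Optional[float]] = []
    total = 0.0
    for games_played, xg in enumerate(match_xg, start=1):
        total += xg
        out.append(total / games_played if games_played >= min_games else None)
    return out

print(season_to_date_xg_per_game([2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0]))
```

The early values come back as None rather than as numbers, so a small and possibly skewed sample cannot be mistaken for a settled trend.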
This isn’t to say that expected goals isn’t useful over a longer period of time. Rolling xG is a great way to assess a team’s improvement or decline, particularly when a new manager or new players are introduced. However, while this data might be useful moving from one season to the next, the propensity for change within a team also means the data can soon become redundant.
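A rolling figure is simply an average over the most recent matches, recalculated after each game. The Python sketch below computes a rolling xG difference (xG for minus xG against) per match. The five-game default window and the function name are hypothetical choices. A shorter window reacts faster to a new manager or signing but is noisier.

```python
from collections import deque
from typing import List, Tuple

def rolling_xg_difference(matches: List[Tuple[float, float]], window: int = 5) -> List[float]:
    """matches: chronological (xg_for, xg_against) pairs.
    Returns the average xG difference over each trailing `window` matches;
    the first value appears once `window` matches have been played."""
    recent = deque(maxlen=window)  # oldest match drops out automatically
    out = []
    for xg_for, xg_against in matches:
        recent.append(xg_for - xg_against)
        if len(recent) == window:
            out.append(sum(recent) / window)
    return out

# Hypothetical run: three poor matches, then three good ones (e.g. after a managerial change).
matches = [(1.0, 2.0)] * 3 + [(2.0, 1.0)] * 3
print(rolling_xg_difference(matches, window=3))
```

The rolling value climbs from -1.0 to 1.0 as the poor matches drop out of the window. This makes a genuine improvement visible without waiting for a whole season's average to shift.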
Expected goals is great, but it needs to be used correctly
For many people, expected goals is as good as it gets when it comes to performance metrics in soccer. However, every analyst will be aware of the importance of context and how damaging it can be to rely on one particular metric when it comes to statistical analysis – even more so when it’s done one game at a time.
Anyone who is using xG will probably already be aware of its benefits, but they also need to acknowledge the limitations. Although the data you use is a crucial part of any model, how you interpret that data is just as important. Only with a full understanding (this includes both the good and bad) can xG be used to its full potential.