Having worked in and around the sports data industry for virtually my whole working life, I have found myself having more conversations than ever recently about data collection and its subsequent use, from both a professional football and a sports betting perspective.
I read a fascinating thread on Punters Lounge last week. The topic was which statistics traders use when analysing matches and teams. The websites used by those in the thread showed a significant lack of sophistication, and that was completely understandable. Access to data in European football is sadly nothing like it is in the major North American sports. Go to any of the main league websites (NHL, NBA, MLB, MLS and NFL), and you’ll find incredible amounts of data sat there waiting to be consumed. Most notably for modelling purposes, that data has also been available historically. That is not quite the same in football, although thanks to the likes of Opta, data has become more accessible in recent years.
There are, however, two fundamentals lacking in my opinion when it comes to data collection, and I genuinely believe the exact same principles apply in professional football as they do in betting.
QUALITY OVER QUANTITY
Finding the signal in the noise of 10,000 data events per match is challenging enough. Even if you have the data dumped on your doorstep, finding the events and statistics that actually mean something is not easy. Defensive actions are notoriously difficult to draw any true insight from. Some of the best defenders in the world may actually have lower statistics in tackling and clearances than many others purely because their positioning, anticipation and communication may be superior to the rest... but who collects data on or quantifies those skills? Rio Ferdinand in his prime was a great example of that. Similarly, some of the defensive statistics out there are somewhat misleading. I cringe when reading about a defender’s 100% successful clearance rate in a game. A successful clearance is typically considered to be one that clears the danger, and defenders frequently hit 100%. It is the one unsuccessful clearance in a game, the defensive mistake, that actually matters rather than the 35 successful ones.
One challenge is in quantifying the ‘quality’ aspect in football. Take a shot on target. That could be a shot trickling to the goalkeeper from 30 yards, posing virtually no threat and carrying a minimal percentage chance of a goal being scored. Compare it to a shot off target, which for many data providers includes a shot hitting the woodwork with the keeper potentially being nowhere near the ball. A shot that hits the corner flag can also be categorised in the ‘Shot off target’ stats. There is a huge difference between the attempt that hits the woodwork and the other two efforts in terms of how close a team came to scoring, yet there is little way to decipher the quality of each attempt purely from the event perspective.
Take it a step further and analyse a more granular set of data, and there are still holes. You may have the positional X/Y coordinates of where the shot was taken from, yet without the position of the players around the attempt, little can be taken from that. A shot from the 18-yard box with a congested penalty area is very different to a shot when a player is clean through on goal with no other players to beat other than the goalkeeper. Similarly, there is no measure for how much time a player may have to take the shot or how much pressure a defender is applying.
This is where subjective data comes into play. Many data providers are understandably petrified of applying subjectivity when it comes to data collection. Essentially, you require all of the data collectors and analysts to think the same and apply the same standards consistently across all matches. I can assure you it is no easy task, and the training required is relatively extreme. If it is done properly, however, the insight gained from such data is invaluable, and it is, I think, where the edge lies, whether for sports traders wanting to understand the true value of a player or a team, or for a football club trying to identify a potential new signing, evaluate its own players or analyse the opposition.
With goals being so scarce in football compared to goals/points in many other sports, the next step to understanding the quality of a team is in the chances created or conceded. Rating the quality of goalscoring opportunities and categorising them can work. Analysing situations and trying to understand the probability of a goal being scored if that chance occurred a million times can help set benchmarks to apply. For example, a 25%-50% likelihood of a goal being scored (a good chance which may test the keeper but not fully stretch them) or a likelihood of 50% or higher (a great chance, such as a missed open goal, a fingertip save or a player clean through on goal) would provide another valuable metric on which to measure team and player performance as the next step down from a goal. It would give far more insight than the traditional Shots On or Off Target, as Chris Smith hinted in his excellent TSR blog. Another aspect to consider is when such ‘good’ or ‘great’ chances are created in a game. I would weight a great chance higher at 0-0 than I would for a team losing 5-0 in the 88th minute. Understanding the true or fair value of a team’s ability is fundamental to finding value in the market, and such quality-based, insight-driven data can provide this.
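As a rough illustration of the chance-rating idea, here is a minimal Python sketch. The 25% and 50% bands come from the paragraph above; everything else is an assumption made for the example, including the Chance fields, the function names and in particular the game-state weighting, which is just one arbitrary way of making a chance at 0-0 count for more than one at 5-0 down in the 88th minute.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Band thresholds taken from the 25%-50% ("good") and 50%+ ("great") split.
GOOD_MIN = 0.25
GREAT_MIN = 0.50


@dataclass
class Chance:
    goal_probability: float  # analyst's subjective estimate, 0.0 to 1.0
    minute: int              # match minute when the chance occurred
    goal_diff_before: int    # creating team's goals minus opponent's at that moment


def band(chance: Chance) -> str:
    """Categorise a chance by its estimated likelihood of producing a goal."""
    if chance.goal_probability >= GREAT_MIN:
        return "great"
    if chance.goal_probability >= GOOD_MIN:
        return "good"
    return "other"


def state_weight(chance: Chance) -> float:
    """Hypothetical game-state weight (an assumption, not an established formula).

    Full weight while the game is level or within one goal; otherwise the
    weight shrinks with the margin, and is halved again from the 75th minute.
    """
    margin = abs(chance.goal_diff_before)
    if margin <= 1:
        return 1.0
    weight = 1.0 / margin
    if chance.minute >= 75:
        weight *= 0.5
    return weight


def summarise(chances: Iterable[Chance]) -> Dict[str, float]:
    """Sum the game-state weights of a team's chances within each band."""
    totals = {"great": 0.0, "good": 0.0, "other": 0.0}
    for chance in chances:
        totals[band(chance)] += state_weight(chance)
    return totals


if __name__ == "__main__":
    # Made-up chances for illustration only.
    sample = [
        Chance(0.60, 10, 0),   # great chance at 0-0
        Chance(0.60, 88, -5),  # great chance when 5-0 down late on
        Chance(0.30, 50, 1),   # good chance when a goal ahead
    ]
    print(summarise(sample))
```

The weighted totals per band could then sit alongside goals scored as a team or player metric. The hard part remains the input: the goal_probability values are only as good as the consistency of the analysts assigning them.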
Another core fundamental which is lacking, and is hard to quantify, is the psychological factor among players and teams. It is exactly the same principle: if you could predict a team’s motivation coming into a game as a trader, you would likely have a far more accurate starting number to trade off. If you are a Football Manager (we’ve all played the game or at least spoken like one in the pub with friends) and could identify players with higher motivation levels, or higher concentration levels in a goalkeeper for example, your success rate in the transfer market and possibly in team selection would likely be better.
Once March comes around, the motivation factor can be hugely significant in some matches. It’s the time of the season when it becomes clearer who will be fighting for the league title, who is within range of Europe or promotion, and who is in danger of being relegated. It is also a time when teams sat in mid-table with little to play for can have questionable motivation and mixed displays. Anybody who can accurately predict which Newcastle will turn up this weekend deserves a medal.
Being aware of trends and studying situational examples of teams can be hugely beneficial in-play. Confidence and morale are other vital factors in this sense. There is a reason why Fulham have a -40 goal difference this season. 16 of their 21 defeats in the league have been by two goals or more. Their fight and motivation when ahead or level has been reasonable. When they have fallen behind, their heads have frequently dropped this season, and that was evident against Man City last weekend. They are not the kind of team you would want to be betting the unders on in-play in certain scenarios for that reason.
Analysing such trends and performance can make a difference. Psychologists may frown on this, but even applying simple ratings to teams’ motivation and confidence levels in various match scenarios can go some way to predicting future performance. It will not eliminate the unpredictability of a Newcastle, but it will help find teams who show character. West Brom have drawn the most matches (13) in the Premier League this season. Their last seven draws have all seen West Brom losing at some point in the game, only for them to fight back and claim a point with an equalising goal. They may be languishing in 16th place and in a relegation battle, but I would have more confidence in predicting how they will perform than a Newcastle, or a Fulham going behind in a game. West Brom would rank relatively high in my motivation ratings for Premier League teams.
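To make the idea of a simple situational rating concrete, here is a minimal Python sketch. It is not my actual rating method, just one assumed way of scoring ‘character’: the share of available points a team has won in matches where it trailed at some point. The comeback_rating name, the (trailed, result) input format and the sample data are all hypothetical.

```python
from typing import Iterable, Tuple

# Standard league points for a win, draw and loss.
POINTS = {"W": 3, "D": 1, "L": 0}


def comeback_rating(matches: Iterable[Tuple[bool, str]]) -> float:
    """Share of available points won in matches where the team fell behind.

    Each match is a (trailed, result) pair: `trailed` is True if the team
    was losing at any point, and `result` is "W", "D" or "L".
    Returns 0.0 if the team never trailed.
    """
    trailed_results = [result for trailed, result in matches if trailed]
    if not trailed_results:
        return 0.0
    points_won = sum(POINTS[result] for result in trailed_results)
    return points_won / (3 * len(trailed_results))


if __name__ == "__main__":
    # Made-up season fragment for illustration only.
    sample = [(True, "D"), (True, "D"), (True, "L"), (False, "W"), (True, "W")]
    print(round(comeback_rating(sample), 3))
```

A team that regularly rescues points from losing positions would score well on this measure, while one whose heads drop after conceding would sit near zero. A fuller version might also rate how a team behaves when ahead or level.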
Finding value in the market is difficult. Bookies have a firm grasp on pricing now, and even if the early market is out of line, the limits are often so small that prices quickly fall into line, and the bet size deters the big players from showing their hands early on. What may appear to be a mispricing between the day before the game and kick-off is now more likely to be a case of opposing the educated traders with vast research resources behind them. 7-10 years ago, there was plenty of money swilling around, and such a lack of efficiency that it was much easier to find the value, and for it to be genuine value.
This is as hard a time as ever to be a successful trader without major resources behind you. It is far from impossible, however. Finding strategies and being creative is key to being successful. Create your own ratings, use subjective data to rank teams, or try to quantify notoriously difficult elements such as motivation, and then backtest them. There are very few short cuts nowadays, but do not be put off by the challenge!
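For anyone wondering what backtesting a rating might look like in practice, here is a minimal flat-stake sketch in Python. It assumes you have already turned your ratings into a probability for each historical bet and recorded the decimal odds available at the time. The backtest function, the min_edge threshold and the sample records are all hypothetical, and a real test would need far more matches than this.

```python
from typing import Dict, Iterable, Tuple


def backtest(
    records: Iterable[Tuple[float, float, bool]],
    min_edge: float = 0.05,
    stake: float = 1.0,
) -> Dict[str, float]:
    """Flat-stake backtest over (model_prob, decimal_odds, won) records.

    A bet is placed only when the model's expected return per unit staked,
    model_prob * decimal_odds - 1, is at least `min_edge`.
    """
    bets = 0
    profit = 0.0
    for model_prob, odds, won in records:
        edge = model_prob * odds - 1.0
        if edge < min_edge:
            continue  # no perceived value at this price, so no bet
        bets += 1
        profit += stake * (odds - 1.0) if won else -stake
    roi = profit / (bets * stake) if bets else 0.0
    return {"bets": bets, "profit": profit, "roi": roi}


if __name__ == "__main__":
    # Made-up history for illustration only.
    history = [
        (0.50, 2.20, True),
        (0.50, 1.90, True),
        (0.30, 4.00, False),
        (0.60, 1.50, False),
    ]
    print(backtest(history))
```

The key discipline is to use only the odds and information that were genuinely available before each match. Testing a rating against prices or data it could not have seen at the time will flatter the results.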