The English Premier League is one of the world’s most hotly contested football competitions, in which 20 teams battle it out every year to be crowned champions of England. Over the years, a total of 49 different teams have played in the Premier League, and the competition attracted a cumulative global audience of 3.2 billion people during the 2018/2019 season.
Having been starved of sporting action during the recent lockdown, we decided to apply our business modelling capabilities to the Premier League in order to demonstrate our approach to Predictive Analytics and Forecasting.
Opti-Num’s team of data scientists used artificial intelligence and deep learning techniques to create a sports prediction model to determine the outcome of the 2019/20 Premier League season. Already 29 games in, the season has been dominated by Liverpool, who lead the rest of the pack by 25 points, with Norwich City, Aston Villa and Bournemouth sitting in the relegation zone. The season has since been postponed due to the COVID-19 pandemic, but this has not stopped us at Opti-Num from predicting the final league table.
Valuable data has been collected since the inception of the league in 1991. This data is readily available and contains in-depth summaries of every game. We preprocessed it to develop a model that predicts the outcome of a Premier League match and the number of goals scored by each team, and displays the scores in a user-friendly GUI. The model uses deep learning to predict the outcome of the remainder of the 2019/2020 season. Before we look at the results, we must look at the model’s workflow and the methods used to get an output that looks like this:
Matches are sequential: the results of one match feed into predicting the next, which is commonly known as pre-game form. The Opti-Num data scientists therefore needed a method that captures this dependence. The solution uses recurrent neural networks (RNNs), which see the current match in the context of those that came before it to better predict what might happen next. The main drawback is that standard RNNs have only short-term memory, so recent results influence the prediction while older results and trends are largely lost. Long Short-Term Memory networks (LSTMs) extend the RNN framework to learn long-term dependencies by retaining information over long periods.
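To make the idea of "remembering" earlier matches concrete, here is a minimal sketch of a single LSTM cell in plain Python. It is an illustration only, not the model Opti-Num built: the cell is scalar (one hidden unit), the weights in `WEIGHTS` are hypothetical and untrained, and encoding form as +1 for a win, 0 for a draw and -1 for a loss is an assumption made for this example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One step of a scalar LSTM cell.

    `weights` maps a gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def gate(name, activation):
        w_x, w_h, bias = weights[name]
        return activation(w_x * x + w_h * h_prev + bias)

    forget = gate("forget", sigmoid)         # how much old memory to keep
    write = gate("input", sigmoid)           # how much new information to store
    candidate = gate("candidate", math.tanh) # the new information itself
    output = gate("output", sigmoid)         # how much memory to expose

    c = forget * c_prev + write * candidate  # long-term cell state
    h = output * math.tanh(c)                # short-term hidden state
    return h, c


def run_lstm(sequence, weights):
    """Feed a sequence through the cell, starting from an empty memory."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, weights)
    return h, c


# Hypothetical, untrained weights chosen only for illustration.
WEIGHTS = {
    "forget": (0.0, 0.0, 2.0),
    "input": (1.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 2.0),
}

# Assumed encoding of recent form: +1 win, 0 draw, -1 loss.
form = [1, 1, 0, -1, 1]
h, c = run_lstm(form, WEIGHTS)
print(f"hidden state: {h:.3f}, cell state: {c:.3f}")
```

The cell state `c` is what carries information across many matches: with a forget gate close to 1, an early result still contributes to the state several matches later, which is the long-term dependency a plain RNN tends to lose.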
Given the unpredictable nature of this type of data, our team at Opti-Num designed a model that takes the many tunable parameters into account. Three separate models were created: one to predict which team would win the match and another to predict draws, which together give the match result. Using the match results, the third model predicts the final score of the match. The accuracy of our models is defined by how often they correctly predict the outcomes in historic data, and is broken up into two parts: the final result of the game (win, draw or loss) and the final score of the match based on the predicted outcome (number of goals for the home and away team). Four seasons’ worth of data, from the most recent four seasons (2015/16 to 2018/19), were used to validate and test our model and quantify the success of the predicted results. This led to the final model workflow as seen below.
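The two accuracy measures described above can be sketched in a few lines of Python. This is a hedged illustration: the function names, the (home goals, away goals) tuple representation and the sample scores are all made up for the example, and the exact evaluation procedure used by the team may differ.

```python
def outcome(home_goals: int, away_goals: int) -> str:
    """Reduce a score to a result: 'H' home win, 'A' away win, 'D' draw."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


def accuracies(predicted, actual):
    """Return (result accuracy, exact-score accuracy) for paired scores.

    Both arguments are lists of (home_goals, away_goals) tuples,
    in the same match order.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    n = len(actual)
    result_hits = sum(outcome(*p) == outcome(*a) for p, a in zip(predicted, actual))
    score_hits = sum(p == a for p, a in zip(predicted, actual))
    return result_hits / n, score_hits / n


# Made-up scores, purely for illustration.
predicted = [(2, 0), (1, 1), (0, 1), (3, 1)]
actual = [(2, 0), (0, 0), (2, 1), (1, 0)]

result_acc, score_acc = accuracies(predicted, actual)
print(result_acc, score_acc)  # 0.75 0.25
```

Exact-score accuracy is always the harder target: a prediction of 1-1 against an actual 0-0 counts as a correct result (a draw) but an incorrect score.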
A variety of features have a direct effect on the final score of the match, such as the number of shots, shots on target for the last 10 games and the results from the match prediction model. As many sources would tell you, training on a wider array of data that goes further back in time, and having more time to fine-tune the models, would lead to more accurate results. The final step before seeing the predicted outcome of the 2019/20 Premier League season is to check the accuracy of the model against historical data. The model has a final accuracy of 62% for the match prediction (win, draw or loss), outperforming similar sports prediction models, whose accuracy ranges between 47% and 59% (1, 2, 3). Similarly, the final score prediction model fared well, with a final accuracy of 15% compared to the 12% benchmark set by the literature (4). All the predictions are then displayed in an easy-to-use dashboard, which can be consulted to determine the final standings of the 2019/20 season.
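Form features such as shots and shots on target over the last 10 games can be built with a rolling window. The sketch below is one possible way to do this in Python and is not taken from the original model: the field names `shots` and `shots_on_target`, the use of a simple average, and the choice to apply the window to both statistics are assumptions. Each match only sees statistics from earlier matches, so no information from the game being predicted leaks into its own features.

```python
from collections import deque


def rolling_form(matches, window=10):
    """Average a team's shots over its previous `window` matches.

    `matches` is a chronological list of dicts with the hypothetical keys
    'shots' and 'shots_on_target'. The entry for a team's first match is
    None because there is no history to average yet.
    """
    history = deque(maxlen=window)  # old matches fall off automatically
    features = []
    for match in matches:
        if history:
            n = len(history)
            features.append({
                "avg_shots": sum(m["shots"] for m in history) / n,
                "avg_shots_on_target": sum(m["shots_on_target"] for m in history) / n,
            })
        else:
            features.append(None)
        history.append(match)  # add only after the features are computed
    return features


# Made-up match statistics, purely for illustration.
matches = [
    {"shots": 10, "shots_on_target": 4},
    {"shots": 14, "shots_on_target": 6},
    {"shots": 12, "shots_on_target": 2},
    {"shots": 8, "shots_on_target": 3},
]

# A window of 2 keeps the example small; the article describes 10 games.
features = rolling_form(matches, window=2)
print(features[3])  # {'avg_shots': 13.0, 'avg_shots_on_target': 4.0}
```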
As everyone expected, the model predicted that Liverpool would continue their dominance and finish the league with a record-breaking 109 points, going undefeated in the remaining fixtures. The top four are rounded off by Manchester City, Chelsea and Leicester, who qualify for the UEFA Champions League, while Tottenham Hotspur (who also go on a nine-game unbeaten run to end the season) qualify for the UEFA Europa League by virtue of a lower goal difference than Leicester. At the other end of the table, Aston Villa escape relegation and are replaced in the bottom three by Brighton & Hove Albion, who go the rest of the season without a goal or a win. They are joined by the struggling Bournemouth and Norwich City in dropping to the Championship (England’s second division) next season. The biggest climbers from the current standings are Tottenham Hotspur, moving up three positions, while Sheffield United, Burnley and Brighton & Hove Albion are predicted to drop by three positions each.
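Turning predicted scores into a final table is simple bookkeeping: three points for a win, one for a draw, with goal difference separating teams that are level on points. The Python sketch below illustrates this under stated assumptions: the team names and scores are placeholders, the function name is hypothetical, and ordering any remaining ties alphabetically is a simplification rather than the league's full tie-break rules.

```python
from collections import defaultdict


def build_table(results):
    """Rank teams from (home, away, home_goals, away_goals) results.

    Teams are ordered by points, then goal difference. Any remaining
    ties are broken alphabetically, which is a simplification.
    """
    table = defaultdict(lambda: {"points": 0, "goal_diff": 0})
    for home, away, home_goals, away_goals in results:
        table[home]["goal_diff"] += home_goals - away_goals
        table[away]["goal_diff"] += away_goals - home_goals
        if home_goals > away_goals:
            table[home]["points"] += 3
        elif home_goals < away_goals:
            table[away]["points"] += 3
        else:
            table[home]["points"] += 1
            table[away]["points"] += 1
    return sorted(
        table.items(),
        key=lambda item: (-item[1]["points"], -item[1]["goal_diff"], item[0]),
    )


# Placeholder teams and scores, purely for illustration.
results = [
    ("Team A", "Team B", 2, 0),
    ("Team B", "Team C", 1, 1),
    ("Team C", "Team A", 0, 0),
    ("Team B", "Team A", 1, 0),
]

standings = build_table(results)
for position, (team, row) in enumerate(standings, start=1):
    print(position, team, row["points"], row["goal_diff"])
```

In this example Team A and Team B both finish on four points, and Team A is placed higher on goal difference (+1 against -1), the same mechanism that separates Leicester and Tottenham in the predicted table.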
Our Advanced Business Analytics team at Opti-Num Solutions can take your big data and forecast your business’s performance using a variety of advanced algorithms, helping to drive decision-making and unveil key business insights in an easy-to-use interface.
What Can I Do Next?
- Request a trial.
- Visit our Advanced Business Analytics Competency Page.
- Find out more from the team.