# SuperBru Predictions with AI

## Congrats to the Boks!

For the recent Rugby World Cup, a few members of the Opti-Num team took part in a sports prediction game on the SuperBru platform. Points were awarded for:

• Predicting the correct winning team
• Predicting the correct winning margin

Many of us used our rugby knowledge, patriotism or luck to play the game, but two Opti-Num Consultants decided to have some fun and used their MATLAB knowledge to create Artificial Intelligence (AI) algorithms to predict the games. An external AI algorithm (RuggerBot) also played in our pool. Here is how the top of our pool ended up:

| Position | Player | Total Points | AI used |
|---|---|---|---|
| 1 | Bot1 | 58.83 | Yes – built by Opti-Num Consultant |
| 2 | Person1 | 58.00 | No |
| 3 | Person2 | 57.50 | No |
| 4 | Person3 | 53.00 | No |
| 5 | Bot2 | 51.50 | Yes – built by Opti-Num Consultant |
| 6 | RuggerBot | 50.50 | Yes – built externally |

Both of our internal AI models fared better than the external RuggerBot, with Bot1 ending up in the top 0.4% of all South African players. It should be noted that both Bot1 and Bot2 stuck to their strategy of using algorithms only and predicted England to win the final, while the rest of us went with the Boks, which resulted in a much closer top 6.

We asked the designers of Bot1 and Bot2 to share how they built their table-topping AI algorithms by answering some questions.

Here are their responses:

Bot1

Where did you source your data?

Wikipedia rankings, and then SuperBru results as they came in.

In brief, how does your AI algorithm work?

The win predictor picked whichever team had the higher ranking score. The margin predictor subtracted one team's ranking score from the other's to give a ranking difference, which was then multiplied by a factor. The factor started as a guessed value (2), but once results started to come in, I calculated it by solving a system of linear equations built from the result-difference vector and the ranking-difference vector.
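As a rough illustration of the approach described above, here is a minimal Python sketch (the original was built in MATLAB, and its code is not shown here). It assumes the "system of linear equations" was solved in the least-squares sense for a single unknown factor, which is one plausible reading rather than a confirmed detail. The function names, team names and all numbers are hypothetical.

```python
def fit_factor(ranking_diffs, result_diffs, fallback=2.0):
    """Least-squares solution of ranking_diffs * factor = result_diffs.

    With one unknown, the solution is sum(r * m) / sum(r * r).
    Falls back to the initial guessed factor when there is no usable data.
    """
    denom = sum(r * r for r in ranking_diffs)
    if denom == 0:
        return fallback
    return sum(r * m for r, m in zip(ranking_diffs, result_diffs)) / denom


def predict(team_a, team_b, rankings, factor=2.0):
    """Return (predicted winner, predicted margin) from ranking scores."""
    diff = rankings[team_a] - rankings[team_b]
    # Assumption: if the scores are equal, team_a is picked arbitrarily.
    winner = team_a if diff >= 0 else team_b
    return winner, abs(diff) * factor


# Hypothetical ranking scores and past results, for illustration only.
rankings = {"Team A": 90.0, "Team B": 85.0, "Team C": 80.0}
past_ranking_diffs = [5.0, 10.0]   # ranking-score differences in played games
past_result_diffs = [15.0, 30.0]   # actual winning margins in those games

factor = fit_factor(past_ranking_diffs, past_result_diffs)
winner, margin = predict("Team A", "Team C", rankings, factor)
```

Refitting the factor each week with the newly copied results would mirror the weekly update routine described in the interview.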

How long did it take to build your algorithm?

Initially it took me about 30 minutes. It also took a few minutes copying results every week.

What improvements have you/would you like to make?

I later added graphs to see how things were performing and whether there were any obvious enhancements I could make, e.g. fitting curves, that would improve predictions. As it turned out, my initial strategy was already pretty good.

I think a different strategy might be better from the quarter-finals onwards. The differences in ranking scores are small from that point, so even the win can go either way. At that point, rugby knowledge probably fared better than an algorithm.

Bot2

Where did you source your data?

World rankings and match data (such as wins, points difference and wins over opposition) sourced from http://stats.espnscrum.com/

In brief, how does your AI algorithm work?

The win predictor consisted of 10 classification ensembles (with optimised hyperparameters) of boosted trees trained with the AdaBoost algorithm; the predicted winner was the team predicted most often across all 10 models. The margin predictor consisted of 30 regression models (with optimised hyperparameters): 10 boosted regression trees with AdaBoost, 10 shallow neural networks and 10 regression SVMs with a 2nd-order polynomial kernel. The outputs of these 30 regression models were averaged to give the predicted margin.
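The aggregation step described above (a majority vote for the winner, a plain average for the margin) can be sketched in Python as follows. This covers only how the model outputs are combined, not how the models were trained, and it is not the original MATLAB code. The function names and the example model outputs are hypothetical, and the tie-breaking rule is an assumption, since the interview does not say how a 5-5 vote was handled.

```python
from collections import Counter
from statistics import mean


def majority_winner(votes):
    """Return the team predicted most often across the classifiers.

    Assumption: on a tied vote, the team that appears first in `votes` wins.
    """
    return Counter(votes).most_common(1)[0][0]


def average_margin(margins):
    """Return the unweighted mean of all regression-model margins."""
    return mean(margins)


# Hypothetical outputs, for illustration only.
classifier_votes = ["Team A"] * 7 + ["Team B"] * 3   # 10 classification models
regression_margins = (
    [12.0] * 10    # boosted regression trees
    + [9.0] * 10   # shallow neural networks
    + [15.0] * 10  # polynomial-kernel SVMs
)

predicted_winner = majority_winner(classifier_votes)
predicted_margin = average_margin(regression_margins)
```

Averaging across three different model families means no single family dominates the margin, at the cost of pulling a strong family's estimate towards the weaker ones.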

How long did it take to build your algorithm?

The models and code took a few hours to put together (training consumed most of the time).

What improvements have you/would you like to make?

More varied predictor data! I don’t watch rugby regularly, so I took an unbiased approach to the modelling and predictors. Ideally, I would have liked to bring in more current, team-specific information, because I took match data from the past 10 years. My data contained teams that changed drastically over that period as players rotated and retired, which changes the team composition. Current statistics such as try and conversion rates, as well as penalty frequencies, would be useful for the margin prediction, which is where my models “missed the posts”.