Using HMMs to predict play calls in the NFL indicates that accounting for the time series structure of the data increases the prediction accuracy compared to similar previous studies. We split the data into a training set (seasons 2009–2017) and a test set (season 2018) and fitted HMMs to the training data of each team individually, which yields 71.5% correctly predicted out-of-sample play calls.
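To illustrate how an HMM turns the sequence of previous play calls into a forecast for the next one, the following minimal Python sketch runs the forward recursion for a model with Bernoulli state-dependent distributions (pass vs. run). It is a sketch under simplifying assumptions only: the function name, the two-state parameter values and the omission of any covariates are illustrative and are not taken from the models fitted in this chapter.

```python
from typing import List, Sequence


def forecast_pass_probabilities(
    plays: Sequence[int],               # observed play calls: 1 = pass, 0 = run
    delta: Sequence[float],             # initial state distribution (assumed)
    gamma: Sequence[Sequence[float]],   # transition probability matrix (assumed)
    pass_prob: Sequence[float],         # P(pass | state), one entry per state (assumed)
) -> List[float]:
    """Return P(play t is a pass | all plays before t) for every play t.

    Assumes every pass_prob lies strictly between 0 and 1, so that the
    normalising constant below is never zero.
    """
    n_states = len(delta)
    state_dist = list(delta)  # state distribution before observing the next play
    forecasts = []
    for play in plays:
        # One-step-ahead forecast: mix the state-dependent pass probabilities.
        forecasts.append(sum(state_dist[i] * pass_prob[i] for i in range(n_states)))
        # Filtering step: reweight the states by the likelihood of the observed play.
        weighted = [
            state_dist[i] * (pass_prob[i] if play == 1 else 1.0 - pass_prob[i])
            for i in range(n_states)
        ]
        total = sum(weighted)
        filtered = [w / total for w in weighted]
        # Prediction step: propagate the filtered distribution through gamma.
        state_dist = [
            sum(filtered[i] * gamma[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return forecasts


# Illustrative (made-up) parameters for a "pass-heavy" and a "run-heavy" state.
example_forecasts = forecast_pass_probabilities(
    plays=[1, 1, 0],
    delta=[0.5, 0.5],
    gamma=[[0.9, 0.1], [0.2, 0.8]],
    pass_prob=[0.8, 0.3],
)
```

Because the states are persistent in this example, each observed pass shifts probability mass towards the pass-heavy state and raises the forecast for the following play. This dependence on previous plays is what a model without time series structure cannot capture.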

The prediction accuracy for the individual teams ranges from 60.2% to 77.9%, with the highest prediction accuracy obtained for the New England Patriots (see Figure 9.5).

[Figure 9.5 (bar chart; x-axis: team, with bars for NE, PIT, NYG, RAMS, DET, DAL, MIN, DEN, NYJ, NO, ARI, CHAR, CLE, TB, IND, ATL, WAS, OAK, JAX, CIN, SF, HOU, TEN, BUF, PHI, CAR, GB, KC, CHI, MIA, BAL, SEA; y-axis: prediction accuracy, 0.0 to 0.8)]

Figure 9.5: Prediction accuracy for the individual teams. The number of out-of-sample observations (i.e. of predicted plays) is shown at the top of the bars.

Practitioners have to take into account the variation in prediction accuracy across teams and plays. For example, if a pass is predicted for the Los Angeles Rams, it is fairly likely (according to our model) that the actual play will indeed be a pass, since the corresponding precision is 90%. On the other hand, if a pass is predicted for the Seattle Seahawks, this forecast has to be treated with caution, as the precision is only 55.9%. A further aspect for practitioners is the cost of an incorrect decision. For example, if teams want to avoid anticipating a pass when the opponent’s offense actually runs, then coaches should carefully consider the corresponding precision rates. Since the models presented here provide probabilistic forecasts and not only binary classifications, coaches could consult the forecasts only if the predicted probability exceeds a chosen threshold. In any case, practitioners should not regard these models as a tool that automatically delivers defense adjustments for each play, but rather as an additional aid for making better defensive and offensive play calls.
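The thresholding idea described above can be sketched in a few lines of Python. The helper names, the threshold of 0.65 and the toy forecasts are hypothetical and serve only to show how a precision rate is computed once low-confidence forecasts are set aside. They are not results from this chapter.

```python
from typing import List, Optional, Sequence


def thresholded_calls(pass_probs: Sequence[float], threshold: float) -> List[Optional[str]]:
    """Turn forecast pass probabilities into calls, abstaining (None) when unsure.

    The threshold is assumed to be larger than 0.5, so "pass" and "run"
    can never both qualify for the same play.
    """
    calls: List[Optional[str]] = []
    for p in pass_probs:
        if p >= threshold:
            calls.append("pass")
        elif 1.0 - p >= threshold:
            calls.append("run")
        else:
            calls.append(None)  # forecast too uncertain to act on
    return calls


def precision(calls: Sequence[Optional[str]], actual: Sequence[str], label: str) -> Optional[float]:
    """Share of plays predicted as `label` that actually were `label`."""
    outcomes = [a for c, a in zip(calls, actual) if c == label]
    if not outcomes:
        return None  # the label was never predicted
    return sum(a == label for a in outcomes) / len(outcomes)


# Made-up forecasts and outcomes for five plays.
toy_probs = [0.9, 0.55, 0.2, 0.7, 0.3]
toy_actual = ["pass", "run", "run", "pass", "pass"]
toy_calls = thresholded_calls(toy_probs, threshold=0.65)
pass_precision = precision(toy_calls, toy_actual, "pass")
run_precision = precision(toy_calls, toy_actual, "run")
```

Raising the threshold trades coverage for reliability: fewer plays receive a call, but the calls that remain rest on more confident forecasts. A coach who regards a wrongly anticipated pass as especially costly could set a stricter threshold for pass calls than for run calls.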

Further research could focus on including additional covariates to improve the predictive power, such as the personnel of the team, i.e. the information on how many running backs/fullbacks, tight ends and wide receivers are on the field. In addition, the current strength of the team is not yet captured. This could be quantified by, for instance, the player ratings provided by the video game Madden, as was done by Lee et al. (2017) and Joash Fernandes et al. (2020). However, it is at least questionable whether information on players can indeed be used on the field in practice, since players are substituted fairly frequently during a match. Finally, updating the model dynamically throughout the 2018 season, rather than using the model fitted up to season 2018 in the out-of-sample prediction, would further improve the predictive power.

10 Summary and outlook

This chapter summarises the main results of Chapters 2–9 and discusses directions for further research. Chapters 2–4 cover studies on betting markets, including determinants of betting volumes, market inefficiencies, and fraud detection. Section 10.1 summarises the main findings of these chapters and provides an outlook for further research. Chapters 5–9 cover several analyses on the evaluation of in-game performance. Section 10.2 summarises the main findings of these chapters and provides additional points for further research in this field.

10.1 Betting fraud detection

Several match fixing incidents occurred in the past decade, leading to an increased demand for fraud detection systems. Whereas existing fraud detection systems focus primarily on odds movements, this thesis argues that both betting volumes and betting odds should be monitored. For the former, statistical models have to account for the complex patterns present in the data, which include heteroscedasticity and non-linear effects. In Chapter 2 we use the flexible class of GAMLSS to explicitly account for these patterns. As a case study, we analyse betting volumes of the English Premier League. The results suggest that the matchday, the weekday, the strength of the teams (quantified by both teams’ market values), and the uncertainty of outcome affect both the mean and the standard deviation of betting volumes. Compared to a classical linear model, the GAMLSS improves the model fit substantially in terms of the AIC. When using these models for examining betting fraud, the concept of market inefficiencies becomes important, as extreme betting volumes may arise from either inefficiencies or fraud. This renders fraud detection by analysing betting volumes difficult in the presence of market inefficiencies. The results presented in Chapter 3 suggest that inefficiencies exist in German betting markets. These inefficiencies occur at the beginning of a season, as bookmakers have only little information on the strength of recently promoted teams at that time. However, as more information on teams’ strength becomes available during the season, the inefficiencies disappear. For the analysis of match fixing in Chapter 4, the model formulation developed in Chapter 2 is used to detect fixed matches by identifying outliers in betting volumes. In addition to the betting volumes, betting odds are predicted using a GAMLSS with bivariate Poisson response. Considering data from the Italian Serie B, we achieve a true positive rate of about 75% for the detection of fixed matches, while producing only 34% false positives. These results suggest that monitoring betting volumes and betting odds can lead to more reliable detection of fixed matches.
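As a rough illustration of how outlier-based detection and its error rates fit together, the Python sketch below flags matches whose observed betting volume lies far above the model-based prediction and then computes the true and false positive rates. The standardised-residual rule, the cutoff of 2 and all numbers are assumptions made for illustration; the actual outlier criterion of Chapter 4 is not reproduced here, and the false positive rate is assumed to be measured relative to all non-fixed matches.

```python
from typing import List, Sequence, Tuple


def flag_outliers(
    observed: Sequence[float],
    predicted_mean: Sequence[float],
    predicted_sd: Sequence[float],
    cutoff: float = 2.0,  # hypothetical cutoff on the standardised residual
) -> List[bool]:
    """Flag a match if its volume exceeds the predicted mean by more than cutoff SDs."""
    return [
        (obs - mean) / sd > cutoff
        for obs, mean, sd in zip(observed, predicted_mean, predicted_sd)
    ]


def detection_rates(flagged: Sequence[bool], fixed: Sequence[bool]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) of the flags."""
    tp = sum(f and y for f, y in zip(flagged, fixed))
    fp = sum(f and not y for f, y in zip(flagged, fixed))
    fn = sum((not f) and y for f, y in zip(flagged, fixed))
    tn = sum((not f) and (not y) for f, y in zip(flagged, fixed))
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one fixed and one non-fixed match")
    return tp / (tp + fn), fp / (fp + tn)


# Made-up volumes for four matches, with a constant predicted mean and SD.
toy_flags = flag_outliers(
    observed=[120.0, 300.0, 95.0, 410.0],
    predicted_mean=[100.0, 100.0, 100.0, 100.0],
    predicted_sd=[50.0, 50.0, 50.0, 50.0],
)
toy_tpr, toy_fpr = detection_rates(toy_flags, fixed=[False, True, False, False])
```

Letting the predicted standard deviation vary by match, as a model for both the mean and the standard deviation of volumes allows, means that a given volume can be unremarkable for one fixture and an outlier for another.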

The field of betting markets provides several points for further research. The investigation of in-game betting markets constitutes a natural extension of the pre-game analysis as presented in Chapters 2–4. When analysing the determinants of in-game betting volumes, the modelling framework has to account for the time series structure of the data, as in-game volumes are sampled at high frequencies. For the analysis of inefficiencies in in-game betting markets, betting odds could be investigated after certain events such as red cards and scored goals. However, as for the analysis of in-game betting volumes, statistical models accounting for the time series structure are also required when investigating in-game betting odds. Using both in-game betting odds and volumes, extending the fraud detection approach presented in Chapter 4 to in-game betting constitutes a further point for future research. Such fraud detection systems for in-game betting are highly relevant, as nowadays about 70% of the total betting volume is generated by in-game bets (Forrest and McHale, 2019).