Conclusions - Sports statistics in the data age: betting fraud detection and performance evalua

54 Integrating multiple data sources in match-fixing warning systems

the models. For example, in highly liquid markets, betting activity by match fixers has little effect on betting odds: odds do not adapt as quickly when high-volume bets are placed, and such activity is hence unlikely to be classified as suspicious. In less liquid markets, the impact of a single bet in terms of a shift in betting odds is typically more substantial. Since we have data for only one betting platform, it is also possible that match fixers place their bets with other bookmakers. In such a case, we do not observe unusual betting volumes at Betfair, but may still find a deviation from fair betting odds at Betfair: if match fixers place heavy bets with other bookmakers, the odds at these bookmakers start to drop, and Betfair follows with lower odds, because otherwise there would be possibilities for arbitrage.

Although we make use of optimal cut-off values when applying the suggested combination of betting volume model and odds model, practical considerations will guide the choice of cut-off values in applications. The fraud detection system developed by Sportradar takes into account the expertise of journalists and other experts who assess whether there may be unmodelled but genuine factors that could explain suspicious odds, e.g. key players being injured, or very one-sided matches where it is likely that key players will be rested. If a fraud detection system does involve such expert elicitation, several false positive matches could be eliminated by taking the opinions of these experts into account. In such a setting, lower cut-off values for all approaches considered may be adequate.

4.4 Conclusions 55

captured by our models, which to some extent is likely due to restrictions of our analysis. First, we focus on pre-game betting only and do not account for in-game betting. Second, we use data from only one betting platform, Betfair, for the analysis of the betting volumes. This restriction is simply a consequence of the lack of additional data. For the betting odds, the limitation to only one betting platform is of no major importance, since betting odds do not vary substantially between betting platforms. In any case, it is remarkable that modelling betting volumes still yields a true positive rate of approximately 75% (albeit with a false positive rate of 33.96%) when considering only pre-game betting data from Betfair. Furthermore, it is worth noting that some of the false positives returned by our analysis may in fact be true positives, i.e. matches that were fixed but are not yet known to have been fixed.

The main objective of our work was to demonstrate the potential usefulness of extending early-warning systems to also incorporate information on betting volumes. We thus focused primarily on a comparison of the two warning systems, corresponding to the outlier detections conducted under the two separate models. However, as we believe that such systems would in most cases trigger a detailed check of a given match as soon as one of the two approaches flags it, the presented approach of combining the two models is adequate for practitioners. Furthermore, the combined approach reduces the high false positive rate of the detection based solely on odds, while the true positive rate remains at about the same level. However, as discussed above, these results are based on the respective optimal cut-off values, whereas the choice of cut-off values in practice might rather depend on other factors, such as whether expert knowledge can be taken into account.
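The interplay between the flagging rule and the cut-off values can be illustrated with a minimal sketch. It assumes that each model yields one outlier score per match, that a match is flagged when its score exceeds a cut-off, and that the combined system flags a match if either model does. The scores, labels, cut-offs and function names below are hypothetical and are not taken from the analysis in this chapter.

```python
def flag(scores, cutoff):
    """Flag every match whose outlier score exceeds the cut-off."""
    return [s > cutoff for s in scores]


def combine_or(flags_a, flags_b):
    """Combined warning system: flag a match if either model flags it."""
    return [a or b for a, b in zip(flags_a, flags_b)]


def rates(flags, fixed):
    """Return (true positive rate, false positive rate) of a flagging rule."""
    tp = sum(f and y for f, y in zip(flags, fixed))
    fp = sum(f and not y for f, y in zip(flags, fixed))
    n_fixed = sum(fixed)
    n_clean = len(fixed) - n_fixed
    return tp / n_fixed, fp / n_clean


# Hypothetical toy data: outlier scores for eight matches, three of them fixed.
odds_scores = [2.9, 0.4, 1.1, 3.2, 0.2, 2.5, 0.7, 0.3]
volume_scores = [0.5, 0.3, 2.8, 3.0, 0.1, 0.6, 2.2, 0.4]
fixed = [True, False, True, True, False, False, False, False]

if __name__ == "__main__":
    # Odds model alone with a cut-off of 2.0.
    print(rates(flag(odds_scores, 2.0), fixed))
    # Combined system with a stricter cut-off of 2.6 for both models.
    combined = combine_or(flag(odds_scores, 2.6), flag(volume_scores, 2.6))
    print(rates(combined, fixed))
```

In this toy example, the combined rule with the stricter cut-off flags all three fixed matches without any false positive, whereas the odds model alone misses one fixed match and flags one clean match. The numbers only illustrate the mechanism and say nothing about the magnitude of the effect in real data.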

Clearly, the accuracy and hence the usefulness of the model-based outlier detection crucially depends on the predictive performance of the model. Thus, future research could focus on further refining the model formulations developed in this work, e.g. by incorporating additional covariates, or by allowing models to be dynamically updated throughout a season. For the betting volume model, one could also consider a multilevel model and allow effects of covariates such as the weekday to vary across the several betting types. Furthermore, it would also be of interest to develop statistical models for betting taking place during games. For such in-game betting, odds strongly depend on the dynamics of the game, e.g. early goals, ball possession or running distance (cf. Schauberger et al., 2018). Dynamic and effectively latent factors such as the momentum of the match could be accounted for using time series regression models such as Markov-switching GAMLSS (cf. Adam et al., 2017). The wide range of potentially relevant game dynamics, together with the natural time series structure of live-betting data, renders outlier detection based on dynamically adjusted odds a challenging task.

5 The hot hand in professional darts

5.1 Introduction

In sports, the concept of the “hot hand” refers to the idea that athletes may enter a state in which they experience exceptional success. For example, in basketball, players are commonly referred to as being “in the zone” or “on fire” when they hit several shots in a row. Although empirical analyses of the hot hand phenomenon tend to focus on sports because the corresponding data are relatively easily accessible, the notion of the hot hand does in fact apply to much more general settings in which streaks may occur, including human performance in general (Gilden and Wilson, 1995a), artistic, cultural, and scientific careers (Liu et al., 2018), the performance of hedge funds (Edwards and Caglayan, 2001; Hendricks et al., 1993; Jagannathan et al., 2010), enduring rivalries in international relations (Colaresi and Thompson, 2002; Gartzke and Simon, 1999), and even gambling activities, against all odds (Xu and Harvey, 2014). However, when perceiving such dynamics, people tend to over-interpret streaks of success and failure (Bar-Hillel and Wagenaar, 1991). This phenomenon has been studied intensively by behavioural economists and psychologists (see, e.g., Tversky and Kahneman, 1971, 1974), and is regarded as a cognitive illusion and a prime example of how humans form beliefs and expectations (Kahneman, 2011; Thaler and Sunstein, 2009). Especially in gambling settings it has been demonstrated that people strongly believe in the “streakiness” of their performances, while at the same time also acting according to the gambler’s fallacy: after a streak of identical outcomes, an increase in betting volume against the streak is observed even though the outcome is generated by an i.i.d. random process (Croson and Sundali, 2005). Such apparent irrationality underlines the importance of being able to precisely quantify a potential hot hand effect in settings where its existence is highly disputed, e.g. in professional sports. In general, a profound knowledge regarding the existence and magnitude of streakiness in performances can aid general decision-making (Miller and Sanjurjo, 2018).

In their seminal paper, Gilovich et al. (1985) analysed basketball free-throw data and found no support for a hot hand, hence coining the notion of the “hot hand fallacy”. The alleged fallacy has been attributed in particular to a potential memory bias, with notable streaks in performances being more memorable than outcomes that are perceived as random, but also to general misconceptions regarding chance, with laypeople expecting randomness to lead to performances that are more balanced in terms of successes and failures than is actually the case. Since the landmark paper by Gilovich et al. (1985), there has been mixed evidence regarding the hot hand in sports, with some papers claiming to have found indications of a hot hand phenomenon and others disputing its existence. Bar-Eli et al. (2006) review the literature on the hot hand in sports, including analyses of data from basketball, baseball, golf, tennis, volleyball and bowling. They summarise 24 studies, of which only 11 provide evidence for a hot hand effect. Perhaps due to the availability of increasingly large data sets, most of the more recent studies have found evidence for a hot hand effect (see Green and Zwiebel, 2017; Miller and Sanjurjo, 2018; Raab et al., 2012; Shea, 2014), whereas only some studies dispute its existence by providing mixed results (see Elmore and Urbaczewski, 2018; Wetzels et al., 2016).

Two types of approaches have been used to investigate such potential hot hand patterns, namely 1) analyses of the serial correlation of shot outcomes (see, e.g., Dorsey-Palmateer and Smith, 2004; Gilovich et al., 1985; Miller and Sanjurjo, 2014), and 2) analyses that use a latent variable to describe the form of a player (see, e.g., Albert, 1993; Green and Zwiebel, 2017; Sun, 2004; Wetzels et al., 2016), where the hot hand is understood as serial correlation in shot probabilities. While there is no consensus in the literature regarding the definition of the hot hand, Stone (2012) and Miller and Sanjurjo (2018) show that direct analyses of the correlation in outcomes, as per 1) above, may vastly underestimate the correlation in shot probabilities. For example, a correlation of ρ_p = 0.4 in shot probabilities can co-occur with a much lower correlation of ρ_r = 0.057 in shot realisations (Miller and Sanjurjo, 2014; Stone, 2012). In other words, if a genuine hot hand process is driven by autocorrelation in success probabilities, i.e. in players’ forms (which may very well be the case), then this can easily go undetected if the focus lies on the much weaker serial correlation of outcomes. Stone (2012) and Arkes (2013) thus conclude that it is preferable to analyse correlation in shot probabilities, as per 2) above. Hence, in this chapter, we focus on approach 2), which we believe is also more aligned with the way terminology related to the hot hand concept (e.g. “on fire”, “in the zone”) is commonly applied: as argued by Stone (2012), it seems most natural to measure players’ form by their time-varying success probabilities, rather than by (noisy) shot outcomes.
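The gap between the two correlations can be made concrete with a small calculation. The following is a minimal sketch under assumptions that are not taken from the cited papers: a player's form switches between two hypothetical success probabilities (`p_low`, `p_high`) according to a symmetric two-state Markov chain with a hypothetical persistence parameter `stay`, and outcomes are independent given the form. Under these assumptions, the lag-1 correlation of outcomes equals ρ_p · Var(p) / (μ(1 − μ)), where μ denotes the mean success probability.

```python
import random


def lag1_corr(xs):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var


def outcome_corr(rho_p, p_low, p_high):
    """Lag-1 correlation of outcomes implied by a symmetric two-state form
    process whose success probabilities have lag-1 correlation rho_p."""
    mu = (p_low + p_high) / 2              # mean success probability
    var_p = ((p_high - p_low) / 2) ** 2    # variance of the success probability
    return rho_p * var_p / (mu * (1 - mu))


def simulate(n, stay, p_low, p_high, seed=1):
    """Simulate n success probabilities and outcomes; the form is kept with
    probability `stay` and switched otherwise (hypothetical toy model)."""
    rng = random.Random(seed)
    probs, hits = [], []
    hot = rng.random() < 0.5
    for _ in range(n):
        p = p_high if hot else p_low
        probs.append(p)
        hits.append(1 if rng.random() < p else 0)
        if rng.random() > stay:
            hot = not hot
    return probs, hits


if __name__ == "__main__":
    # stay = 0.7 gives a lag-1 correlation of 2 * 0.7 - 1 = 0.4 in the form process.
    probs, hits = simulate(200_000, stay=0.7, p_low=0.3, p_high=0.7)
    print("correlation in probabilities:", round(lag1_corr(probs), 3))
    print("correlation in outcomes:     ", round(lag1_corr(hits), 3))
    print("implied by the formula:      ", round(outcome_corr(0.4, 0.3, 0.7), 3))
```

With `p_low = 0.3`, `p_high = 0.7` and `stay = 0.7`, the form process has ρ_p = 0.4, while the implied correlation of outcomes is only 0.064, which is of the same order of magnitude as the value of 0.057 quoted above.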

In addition to such conceptual issues regarding the representation of the hot hand in the data-generating process, Miller and Sanjurjo (2018) highlight a subtle selection bias that may sneak into analyses of sequential data, which provides a further challenge to the findings of Gilovich et al. (1985). Aside from mathematical fallacies, which would already seem to explain many failed attempts to prove the existence of the hot hand, we note that many of the existing studies considered data, e.g. from baseball or basketball, which we believe are hardly suitable for analysing streakiness in performances. For example, when analysing hitting streaks of a batter in baseball, other factors such as the performance of the pitcher are also important but hard to account for. The same applies to basketball, where several factors affect the probability of a player making a shot, e.g. the position (of a field goal attempt) or the effort of the defence. In particular, an adjustment of the defensive strategy to focus more strongly on a player during a hot hand streak can conceal a possible hot hand phenomenon (Bocskocsky et al., 2014).
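The selection bias can be reproduced by exact enumeration. The sketch below assumes the simplest version of the problem, namely independent attempts with success probability 0.5 and streaks of length one. For every possible sequence of n attempts it computes the proportion of hits among the attempts that directly follow a hit, and then averages these proportions over all sequences in which this proportion is defined. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_prop_after_hit(n):
    """Exact expected proportion of hits directly following a hit, averaged
    over all equally likely sequences of n fair attempts that contain at
    least one hit among the first n - 1 attempts."""
    props = []
    for seq in product((0, 1), repeat=n):
        # Outcomes of all attempts that directly follow a hit.
        followers = [seq[i + 1] for i in range(n - 1) if seq[i] == 1]
        if followers:  # the proportion is undefined without a preceding hit
            props.append(Fraction(sum(followers), len(followers)))
    return sum(props, Fraction(0)) / len(props)


if __name__ == "__main__":
    for n in (3, 4, 10):
        value = expected_prop_after_hit(n)
        print(n, value, float(value))
```

Although every attempt succeeds with probability 0.5, the expected proportion is 5/12 for n = 3 and 17/42 for n = 4, i.e. below 0.5. An analysis that compares such within-sequence proportions to 0.5 is therefore biased against finding a hot hand.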

To overcome these caveats, here we investigate whether there is a hot hand effect in professional darts, a setting with a high level of standardisation of individual throws. In professional darts, well-trained players repeatedly throw at the dartboard from the exact same position and effectively without any interaction between competitors, making the course of play highly standardised. In the existing literature, there are very few contributions that consider darts data, and almost all of these are restricted to laboratory settings. For example, Van Raalte et al. (1995) analyse the effect of positive and negative self-talk on throwing performances, considering the throwing sequences of 60 individuals, each of length 15. The hot hand effect has previously been investigated using darts data by Gilden and Wilson (1995b); analysing only 24 throwing sequences of eight volunteers, they find no evidence for a hot hand effect. Here we consider a much larger data set, with n = 167,492 throws in total, which allows for comprehensive inference regarding the existence and the magnitude of the hot hand effect. Using state-space models, we evaluate serial dependence in a latent state process, which can be interpreted as a player’s varying form, in line with approaches of type 2) above.

In the document: Sports statistics in the data age: betting fraud detection and performance evaluation (pages 69-75)