4.5 Application

4.5.2 Exploiting individual-level polling data

The main limitation of uniform-swing-based forecasts is that they fail to incorporate campaign-specific, constituency-level information. The uniform-swing model ignores local campaign dynamics unless they are explicitly incorporated in the correction procedure.

Polls on voting intention can serve as a remedy to this problem. District-level polls are rarely available, and the 2013 German election campaign was no exception. However, I can capitalize on an exceptionally rich polling database, provided by the German forsa institute.13 forsa surveys 500 respondents every working day and asks them, among other things, about their vote intention for the next general election. This allows me to construct a forecasting model which, in contrast to the uniform swing model, incorporates local information.14

13 Data for previous years are available at http://www.gesis.org/en/elections-home/other-surveys/forsa-bus/.

14 Note that alternative data sources like local betting markets or vote expectation surveys which target the local level were not available.

In order to stabilize the constituency-level forecasts, I employ a Bayesian modeling strategy suggested by Selb and Munzert (2011) for estimating constituency preferences from survey data and geographic information. The method is presented in depth by Selb and Munzert (2011), so I limit the description to its gist. The vote intention of a respondent in a constituency is modeled as a function of a global mean, the individual voting behavior at the last election, a constituency-level covariate (log inverse district size), and two constituency-level random effects: one is assumed to be independently and identically normally distributed across districts, and the other is assumed to follow an intrinsic conditional autoregressive (CAR) distribution (see Besag, York and Mollié, 1991). As true party vote shares for past elections are known at the district level, the predicted probabilities of voting for a party are then weighted according to the recalled voting coefficient (see Park, Gelman and Bafumi, 2004). For the estimation, I pool survey data from the period of five months to one month before the election date in order to obtain a reasonable number of respondents per district.15 Table C.4.1 in the Appendix provides summary statistics for the utilized polling data.16
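The weighting step described above can be illustrated with a minimal sketch. It assumes a logit link and a binary recalled-vote indicator (voted for the party at the last election or not); both are simplifying assumptions made here for illustration, and all function names, parameter names and numeric values are hypothetical rather than taken from Selb and Munzert (2011).

```python
import math


def inv_logit(x: float) -> float:
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def poststratified_share(mu, gamma, delta, log_inv_size, u_d, v_d, past_share):
    """Weight predicted vote probabilities by the known past vote share.

    mu            global mean (logit scale)
    gamma         coefficient on recalled vote for the party (assumed binary)
    delta         coefficient on the district covariate
    log_inv_size  district covariate, log(1 / district size)
    u_d           unstructured (i.i.d. normal) district effect
    v_d           spatially structured (CAR) district effect
    past_share    party's actual vote share in the district at the last election
    """
    base = mu + delta * log_inv_size + u_d + v_d
    p_former_voters = inv_logit(base + gamma)  # recalled voting for the party
    p_others = inv_logit(base)                 # recalled anything else
    # The known past share replaces the (possibly unrepresentative) sample mix.
    return past_share * p_former_voters + (1.0 - past_share) * p_others


# Hypothetical values for one district, chosen only for illustration.
example = poststratified_share(
    mu=-1.5, gamma=2.5, delta=0.1, log_inv_size=math.log(1 / 200_000),
    u_d=0.05, v_d=-0.02, past_share=0.35,
)
print(round(example, 3))
```

The point of the design is that the district estimate does not depend on how many former voters of the party happen to be in the district's sample, because the known past result supplies the weights.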

The prediction results for the previous three elections are reported in Table 4.3. Using the uncorrected estimates, no more than 85% of the district outcomes are forecast correctly.17 Therefore, I again apply the correction strategy and regress actual first vote shares in every district in the three elections on the poll model predictions and party-constituency random effects (see again Equation 4.2). The results are presented in Table 4.4. While the relationship between actual first vote shares and the poll model forecast is, on average, nearly one-to-one, the party-specific slopes and intercepts reveal substantial bias in the original model. Specifically, SPD and Die Linke vote shares are underestimated in constituencies where the parties’ candidates performed well, whereas the opposite is true for candidates of the other parties, where the model corrects the original forecasts towards the mean (slopes < 1). Further, the estimated variance of the party-district-specific errors is substantial, indicating that there are other unaccounted factors at the district level which play a role in the transformation of second vote polling into first vote shares.

15 If the chosen time window were too narrow, the constituency-level estimates would tend to rely more on the grand mean than on local (or neighboring) information, which would deprive the model of its desired ability to capture actual local preferences.

16 As has been described above, it is challenging to assign electoral districts to respondents. While most election studies provide such identification variables, polling data that are not primarily used for scientific purposes sometimes come with no geographical identifier at all or locate respondents in units other than electoral districts. The forsa data come with identifier variables of German administrative units. There are several ways to attach district identifiers to respondents, which I discuss in more detail in Appendix C.2. In short, I identify all possible districts for each respondent and randomly assign the respondent to one of them. Because the forecasting method uses respondents from neighboring districts to estimate vote intentions, an exact match should not substantively improve forecasting performance compared to this simplifying approach.

17 See also Figure C.4.3 in the Appendix, which visualizes the relationship between the poll-based estimation results and the actual election results. Although the fit seems to be rather good, the polls are significantly biased. For instance, SPD vote shares tend to be underestimated.

Table 4.3: Predictive performance of the polling model, uncorrected and corrected forecasts. The first five columns report party-specific mean absolute errors over all 299 districts in each election. The last column reports the percentage of correctly forecast districts (predicted winner equals actual winner). Cells where the corrected forecast outperforms the uncorrected forecast are highlighted in grey.

              CDU/CSU   SPD     FDP     B’90/Die Grünen   Die Linke   % Overall correct
Uncorrected
  2002        0.034     0.07    0.043   0.014             0.025       85.0
  2005        0.033     0.105   0.026   0.016             0.024       62.5
  2009        0.051     0.056   0.057   0.019             0.034       83.3
Corrected
  2002        0.021     0.024   0.008   0.007             0.015       96.0
  2005        0.031     0.035   0.005   0.008             0.015       80.9
  2009        0.023     0.024   0.008   0.011             0.015       92.0

Table 4.4: Bayesian estimates of the model of party first vote shares, based on polls model

Predictor                                 Estimate   95% CI
Intercept α
  CDU/CSU                                 0.126      [0.115;0.139]
  SPD                                     0.038      [0.030;0.046]
  FDP                                     0.004      [-0.003;0.010]
  B’90/Die Grünen                         0.011      [0.005;0.017]
  Die Linke                               0.003      [-0.002;0.007]
Polls estimate β_polls
  CDU/CSU                                 0.716      [0.685;0.744]
  SPD                                     1.127      [1.099;1.153]
  FDP                                     0.508      [0.532;0.634]
  B’90/Die Grünen                         0.894      [0.813;0.968]
  Die Linke                               1.291      [1.237;1.345]
Party-constituency-level variance σ²_ξ    0.024      [0.023;0.026]
Residual variance σ²_η                    0.028      [0.027;0.028]
N                                         5,980
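To make the direction of the correction concrete, the party-specific intercepts and slopes printed in Table 4.4 can be applied to a raw poll-model forecast. The sketch below assumes that the fixed part of Equation 4.2 is linear (intercept plus slope times poll estimate) and sets the party-constituency random effect to zero by default. The function name and the input shares are hypothetical.

```python
# Posterior means (intercept alpha, slope beta_polls) as printed in Table 4.4.
COEFS = {
    "CDU/CSU": (0.126, 0.716),
    "SPD": (0.038, 1.127),
    "FDP": (0.004, 0.508),
    "B'90/Die Grünen": (0.011, 0.894),
    "Die Linke": (0.003, 1.291),
}


def corrected_share(party: str, poll_estimate: float, random_effect: float = 0.0) -> float:
    """Fixed-effects part of the correction, assuming a linear form.

    random_effect stands in for the party-constituency effect, which is
    unknown for a district without past data and therefore defaults to zero.
    """
    alpha, beta = COEFS[party]
    return alpha + beta * poll_estimate + random_effect


# Hypothetical raw poll-model forecasts for one district.
print(round(corrected_share("SPD", 0.30), 4))      # → 0.3761
print(round(corrected_share("CDU/CSU", 0.45), 4))  # → 0.4482
```

With these coefficients, an SPD forecast is always adjusted upward, while a CDU/CSU forecast above roughly 44% is pulled down, which matches the pattern described in the text.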

The gain of the correction procedure for the polling model is impressive (see Table 4.3 and Figure C.4.3 in the Appendix). The fit of the forecasts improves considerably over all parties and years, both in terms of mean absolute errors and correctly predicted district winners.
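The two performance measures used here, party-specific mean absolute error and the percentage of correctly forecast district winners, can be computed as in the following sketch. The two-district data are invented for illustration only and do not correspond to any actual district.

```python
def mean_absolute_error(forecast, actual):
    """Mean absolute difference between forecast and actual vote shares."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def percent_correct_winners(forecasts, actuals):
    """Percentage of districts where the predicted winner equals the actual winner.

    Each element is a dict mapping party name to vote share in one district.
    """
    hits = sum(
        max(f, key=f.get) == max(a, key=a.get)
        for f, a in zip(forecasts, actuals)
    )
    return 100.0 * hits / len(actuals)


# Hypothetical toy data: two districts, two parties.
forecasts = [{"CDU/CSU": 0.42, "SPD": 0.38}, {"CDU/CSU": 0.36, "SPD": 0.40}]
actuals = [{"CDU/CSU": 0.40, "SPD": 0.41}, {"CDU/CSU": 0.35, "SPD": 0.43}]

spd_mae = mean_absolute_error(
    [d["SPD"] for d in forecasts], [d["SPD"] for d in actuals]
)
correct = percent_correct_winners(forecasts, actuals)
print(round(spd_mae, 3), correct)
```

In the toy data the SPD error is small in both districts, yet one of the two winners is missed. This shows why the two measures can diverge in close races.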

Next, the polling forecasts for the 2013 election are generated. According to the uncorrected forecast, 290 districts are attributed to the CDU/CSU and 9 to the SPD, mirroring the great advantage of the CDU/CSU in the raw polls. The corrected forecasts iron out this bias to a certain extent. Still, this forecast is significantly more favorable for the conservative parties than the corrected uniform swing model, with 261 vs. 224 seats for CDU/CSU and 34 vs. 70 seats for the SPD, respectively.