
Appendix B – Reliability Tests


Academic year: 2022


Given the different country-related response rates (see Section 2 in the manuscript), is there a relationship between the number of respondents and the reliability of the results? Following other expert surveys’ analyses – e.g., see Borz and De Miguel (2019) – we have run some correlation analyses. The correlation between the number of respondents and the Standard Deviation (SD) of their answers is equal to 0.18. We have also run a correlation between the number of respondents and the scores they had attributed to each system or to each party in each election; this correlation is equal to 0.06. Finally, we have also investigated whether there is a correlation between the SD of experts’ answers and the average scores they had given. Indeed, it could be that a lower inter-expert agreement had made the average scores more extreme. In this case, the correlation coefficient is equal to 0.003.
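These three checks amount to pairwise Pearson correlations over the party-election units. A minimal sketch of the computation, using hypothetical values rather than the actual PoPES data:

```python
import numpy as np

# Illustrative data: one row per party-election unit (hypothetical values,
# not the actual PoPES data).
n_respondents = np.array([4, 6, 8, 5, 7, 10, 3, 9])
sd_answers    = np.array([1.7, 1.5, 1.6, 1.4, 1.8, 1.3, 2.0, 1.5])
mean_scores   = np.array([5.2, 6.1, 4.8, 5.5, 6.0, 5.1, 4.9, 5.7])

# Pearson correlations analogous to the three checks in the text:
r_n_sd    = np.corrcoef(n_respondents, sd_answers)[0, 1]   # respondents vs. SD
r_n_mean  = np.corrcoef(n_respondents, mean_scores)[0, 1]  # respondents vs. scores
r_sd_mean = np.corrcoef(sd_answers, mean_scores)[0, 1]     # SD vs. average scores

print(round(r_n_sd, 3), round(r_n_mean, 3), round(r_sd_mean, 3))
```

With the actual data, coefficients this close to zero indicate that neither the dispersion nor the level of the scores is driven by how many experts answered.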

Table B1. Standard Deviation per section

Section                                  SD
Personalisation in General Elections     1.67
Personalisation in Candidate Selection   1.56
Personalisation in Policy-Making         1.56
Personalisation in Party Control         1.52

Let us now devote more attention to the single sections of the PoPES. Table B1 above shows the SD scores for each section. Following Hooghe et al. (2010), we calculated the SD of each question at the lowest possible level (each question for each party) and averaged the SDs at the section level to understand whether the experts were ‘judging different objects, on different dimensions, at different points in time’ (Steenbergen and Marks 2007, p. 351). Consequently, a smaller SD represents a better result, meaning the experts have interpreted the different dimensions in a more similar manner. Table B1 shows that the maximum SD is equal to 1.67 on a 1–10 scale (see the question formulations in Section 1 in the article), a satisfactory result.
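The two-step aggregation (SD per question–party cell, then an average within each section) can be sketched as follows; the toy data frame and column names are illustrative assumptions, not the actual PoPES variables:

```python
import pandas as pd

# Hypothetical long-format responses: one row per expert rating.
df = pd.DataFrame({
    "section": ["Candidate Selection"] * 6 + ["Policy-Making"] * 6,
    "party":   ["A", "A", "A", "B", "B", "B"] * 2,
    "score":   [4, 6, 5, 7, 8, 6, 3, 5, 4, 6, 7, 5],
})

# Step 1: SD at the lowest possible level (each question/party cell).
cell_sd = df.groupby(["section", "party"])["score"].std()

# Step 2: average the cell-level SDs within each section.
section_sd = cell_sd.groupby(level="section").mean()
print(section_sd)
```

Averaging cell-level SDs (rather than pooling all answers per section) keeps the section score from being inflated by genuine differences between parties.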

Following the same reasoning, we verified the scores’ reliability from a country-related viewpoint (Table B2). The SDs do not highlight problematic values [i], notwithstanding the presence of Greece and, to a lesser extent, Germany, as outliers.


Table B2. Standard Deviation per country

Country SD

Austria 1.63

Belgium (Flanders) 1.22

Belgium (Wallonia) 1.11

Denmark 1.08

Finland 1.28

France 1.73

Germany 1.93

Greece 2.07

Iceland 1.38

Ireland 1.47

Italy 1.51

Netherlands 1.40

Norway 1.11

Portugal 1.46

Spain 1.44

Sweden 1.23

Switzerland 1.71

United Kingdom 1.81

Then, we have performed a variance component analysis (Steenbergen and Jones 2002; Steenbergen and Marks 2007). Because comparative expert surveys generate hierarchically organised data, we have evaluated the variance across the levels that define the hierarchy, and estimated the effect of the expert and the country on the composition of the variance. Table B3 below reports the variance in the data at three different levels (party, country, and expert).


Table B3. Variance Component Analysis of the responses (party-level questions)

Model (Intercept)                6.005 (0.320)***
AIC                              39772.131
BIC                              39808.072
Log Likelihood                   -19881.065
N                                9781
N Party                          113
N Respondent                     100
N Country                        16
Variance Party (Intercept)       1.551
Variance Country (Intercept)     1.173
Variance Respondent (Intercept)  1.084
Variance Residual                3.179

***p < 0.001, **p < 0.01, *p < 0.05.
Note: N Country is equal to 16 and not 18 – see Table 4 – because there are no available answers for Norway and Sweden concerning the party-level questions.

Table B3 shows that the variance at the expert level is the smallest, a quite satisfactory result, and the estimate is just above 1, a small value on the 1–10 scale (see also Steenbergen and Marks, 2007: 352). From these results, we have also calculated an inter-expert correlation index (ibidem) equal to 0.592, showing that there is a generally fair level of agreement among the experts, even if there is not a perfect correspondence among them. Following Jayasinghe, Marsh and Bond (2003) and Steenbergen and Marks (2007), we have further computed the reliability of experts’ responses starting from this inter-expert correlation index. The result of this further reliability test, calculated via the Spearman-Brown formula, is equal to 0.889, meaning that experts’ responses can be considered reliable.
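The Spearman-Brown step can be made explicit: the formula scales a single-rater reliability r up to the reliability of the average of k raters. The value of k below is an assumption for illustration (the appendix does not report it); with the reported inter-expert correlation of 0.592, a k of roughly 5-6 reproduces the reported 0.889:

```python
def spearman_brown(r: float, k: float) -> float:
    """Reliability of the mean of k raters, given single-rater reliability r."""
    return k * r / (1 + (k - 1) * r)

r = 0.592  # inter-expert correlation reported in the appendix
k = 5.5    # assumed average number of experts per unit (illustrative)
print(round(spearman_brown(r, k), 3))  # → 0.889
```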

Finally, we have tested the impact of several factors on the SD of experts’ responses (e.g., see Steenbergen and Marks, 2007). For survey-related factors, we have considered the number of respondents and the fact that different sections are indeed related to different aspects of the personalisation of politics. The second set of factors comprises possible party-related determinants: parties’ ideological family and the proportion of votes and seats held by each party after each general election [ii]. Finally, we have also considered the number of parties included in the survey for each general election and the time passed (in years) between the year in which a general election took place and 2017 (the year when the survey was delivered to the experts).

This last factor is of paramount importance. Indeed, we are aware that one of the main possible critiques of this study concerns the reliability of experts’ answers when more time has passed since a general election. In particular, it is possible that time has a significant effect (either positive or negative) on the SD of experts’ answers, and therefore that experts’ evaluations would have been influenced by the passing of time. Table B4 below reports the results of this final reliability test.

Table B4. Random-intercept linear regression on the SD of experts’ responses

                                              Model 1            Model 2
Personalization in Candidate Selection
  (reference category)                        -                  -
Personalization in Policy-Making              -0.003 (0.036)     -0.003 (0.036)
Personalization in Party Control              -0.020 (0.035)     -0.021 (0.035)
Number of Experts                             0.064 (0.019)***   0.064 (0.019)***
Party Family
  Communist/Socialist                         0.260 (0.158)      0.271 (0.158)
  Ecologist/Green                             -0.090 (0.199)     -0.073 (0.199)
  Social Democratic (reference category)      -                  -
  Christian-Democratic                        -0.053 (0.173)     -0.053 (0.174)
  Liberal                                     -0.046 (0.163)     -0.043 (0.163)
  Conservative                                -0.126 (0.164)     -0.130 (0.165)
  Right-Wing                                  -0.214 (0.176)     -0.202 (0.177)
  Agrarian                                    0.191 (0.265)      0.189 (0.266)
Party Size (votes)                            -0.007 (0.002)**   -
Party Size (seats)                            -                  -0.005 (0.002)*
Number of Parties                             0.110 (0.021)***   0.114 (0.021)***
Time                                          0.002 (0.002)      0.002 (0.002)
Intercept                                     0.657 (0.220)***   0.605 (0.218)***
AIC                                           3974.427           3977.373
BIC                                           4068.904           4071.851
Log Likelihood                                -1970.213          -1971.687
N                                             1915               1915
N Party                                       113                113
N Country                                     16                 16
Variance Party (Intercept)                    0.229              0.231
Variance Country (Intercept)                  0.044              0.044
Variance Residual                             0.391              0.391

***p < 0.001; **p < 0.01; *p < 0.05. Standard errors in parentheses.
Note: N Country is equal to 16 and not 18 because there are no available answers for Norway and Sweden concerning the party-level questions.

A first important result is that, all other things being equal, the SD of experts’ answers does not depend on the fact that they answered different questions on Western European political parties (see the coefficients and significance levels of Personalization in Policy-Making and Personalization in Party Control). This does not mean there are no differences among political parties in these different fields, but simply that the differences in the SD of experts’ answers do not depend on the fact that they were answering questions related to different aspects of the personalisation of politics. At this point, a critique might arise: perhaps the exclusion of Personalization in General Elections has had an impact. We have therefore run a further regression that includes Personalization in General Elections [iii], and the results show that the SD of experts’ answers does not depend on the presence of different questions in different sections of the PoPES.

Turning to party-related factors, belonging to specific party families does not affect the variability of experts’ answers, while slightly less variable answers have been provided for bigger parties (and vice versa), in line with the reliability tests of similar expert surveys (Hooghe et al. 2010).

Turning to the last three variables, the inclusion of a higher number of parties in a country questionnaire does make experts’ answers more variable, as does a higher number of experts. Finally, Table B4 shows that asking experts questions about events that happened a certain number of years ago does not affect the variability of their answers. This is a crucial confirmation of the reliability of experts’ answers: while one could have expected experts’ knowledge or memory to be more reliable concerning more recent events, the analysis shows that experts maintained the same analytical capacity even when asked questions about less recent events. In other words, the SD of experts’ scores does not depend on whether they answered questions about more recent or, conversely, less recent events. We have also tested whether including the general question of the PoPES changes anything concerning the effect of time [iv]: also in this case, the passing of a higher (or lower) number of years does not influence the SD of experts’ answers.
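A back-of-the-envelope reading of the Time coefficient in Table B4 illustrates why the effect is negligible even apart from its lack of significance. The coefficient is taken from Table B4; the ten-year span used below is an illustrative assumption:

```python
import math

beta_time = 0.002  # Time coefficient (Table B4, both models)
max_years = 10     # assumed illustrative span between an election and the 2017 survey

# Largest implied shift in the SD of experts' answers over that span,
# on the 1-10 scale...
implied_shift = beta_time * max_years

# ...compared with the residual standard deviation of the model.
residual_sd = math.sqrt(0.391)
print(round(implied_shift, 3), round(residual_sd, 3))  # 0.02 vs. ~0.625
```

Even taken at face value, a decade of elapsed time would shift the SD by about 0.02 points, an order of magnitude below the model’s residual variability.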


References

Borz, G. and De Miguel, C., 2019. Organizational and Ideological Strategies for Nationalization: Evidence from European Parties. British Journal of Political Science, 49 (4), 1499–1526.

Döring, H. and Manow, P., 2018. Parliaments and governments database (ParlGov): Information on parties, elections and cabinets in modern democracies. Stable Version.

Hooghe, L., Bakker, R., Brigevich, A., De Vries, C., Edwards, E., Marks, G., Rovny, J., Steenbergen, M., and Vachudova, M., 2010. Reliability and validity of the 2002 and 2006 Chapel Hill expert surveys on party positioning. European Journal of Political Research, 49 (5), 687–703.

Jayasinghe, U.W., Marsh, H.W., and Bond, N., 2003. A multilevel cross-classified modelling approach to peer review of grant proposals: the effects of assessor and researcher attributes on assessor ratings. Journal of the Royal Statistical Society. Series A: Statistics in Society, 166 (3), 279–300.

Marks, G., Hooghe, L., Steenbergen, M.R., and Bakker, R., 2007. Crossvalidating data on party positioning on European integration. Electoral Studies, 26 (1), 23–38.

Nohlen, D. and Stöver, P., eds., 2010. Elections in Europe – A Data Handbook. Baden-Baden: Nomos.

Steenbergen, M.R. and Jones, B.S., 2002. Modeling Multilevel Data Structures. American Journal of Political Science, 46 (1), 218–237.

Steenbergen, M.R. and Marks, G., 2007. Evaluating expert judgments. European Journal of Political Research, 46 (3), 347–366.

[i] We have not discussed the Table B2 SD scores for Belgium (Wallonia), Norway, and Sweden: for these three cases, only a handful of questions have been answered by more than one expert. Therefore, even if these cases show fairly satisfactory results, we must remember that such results go hand in hand with an extremely small number of cases.

[ii] See also Marks et al. (2007). Data for parties’ ideological families come from ParlGov (Döring and Manow 2018); parties’ electoral results until 2009 come from Nohlen and Stöver (2010), and results from 2010 until 2016 from each country’s electoral authority. Finally, given the known problems in Nohlen and Stöver’s data concerning Italy, electoral data for this country come from the website of the Italian Ministry of the Interior.

[iii] More specifically, a random-intercept linear regression having, as independent variables, the PoPES sections, time, and the number of experts.

[iv] See endnote [iii].
