Data analyses - It matters how you ask : Emotional ratings of help-related picture content

Pen-and-paper ratings for each participant were entered into an Excel table manually according to the scoring scheme illustrated in Figure 2. All data structuring and analyses were conducted using the GNU software R (version 3.0.2). Arousal, pleasantness and unpleasantness ratings were coded from 0 (calm, no (un)pleasant feelings) to 8 (excited, strong (un)pleasant feelings), bipolar valence ratings from -4 (sad) to 4 (happy). The intensity of mixed feelings for each picture was scored as the minimum rating given on pleasantness and unpleasantness scales for each picture as suggested by Schimmack (2001). All pictures received a minimum of 106 valid ratings on each dimension (see Appendix A for exact ns). Mean ratings and scores of mixed feelings were calculated across participants.
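The scoring rule for mixed feelings can be illustrated with a short sketch. It is written in Python with NumPy rather than the R used for the reported analyses. The rating values are invented for illustration, and `mixed_feelings` is a hypothetical helper name, not a function from the study's scripts.

```python
import numpy as np

def mixed_feelings(pleasant, unpleasant):
    """Intensity of mixed feelings: the smaller of the two unipolar ratings."""
    return np.minimum(np.asarray(pleasant, dtype=float),
                      np.asarray(unpleasant, dtype=float))

# Invented ratings on the 0-8 scale: rows = participants, columns = pictures.
# np.nan marks a missing (invalid) rating.
pleasant = np.array([[6, 1, 4],
                     [7, 0, np.nan],
                     [5, 2, 3]], dtype=float)
unpleasant = np.array([[1, 7, 4],
                       [0, 8, 5],
                       [2, 6, 3]], dtype=float)

# Score each participant's rating of each picture, then average across
# participants per picture, ignoring missing ratings.
mixed = mixed_feelings(pleasant, unpleasant)
mean_mixed = np.nanmean(mixed, axis=0)
```

A picture only receives a high score if a participant rated it as both pleasant and unpleasant; a high rating on one scale alone yields a low minimum.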

All analyses were based on effect sizes and estimation of confidence intervals (CIs). As null-hypothesis testing has been shown to yield arbitrary and potentially misleading results (see e.g. Cumming, 2014; Kline, 2004), traditional p-values were not calculated. Interpretation of CIs can be related to classical p-values in the following way: non-overlapping CIs of means indicate a “significant” difference, and CIs of effect sizes not overlapping 0 indicate a “significant” effect.
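The two interpretation rules can be written as simple interval checks. The following Python sketch uses hypothetical helper names; the example intervals are values reported elsewhere in this section (the effect of the depicted child's gender, and the valence-arousal correlation for women in the English and German versions).

```python
def excludes_zero(ci):
    """True if a CI for an effect size does not contain 0."""
    lower, upper = ci
    return lower > 0 or upper < 0

def intervals_overlap(ci_a, ci_b):
    """True if two CIs share at least one point."""
    return max(ci_a[0], ci_b[0]) <= min(ci_a[1], ci_b[1])

# d = 0.50, 95% CI [-0.11, 1.11]: the interval contains 0.
child_gender_effect = excludes_zero((-0.11, 1.11))

# Women, valence ~ arousal: English [-.57, -.21] vs. German [-.86, -.69].
language_overlap = intervals_overlap((-0.57, -0.21), (-0.86, -0.69))
```

Here `child_gender_effect` is `False` (no “significant” effect) and `language_overlap` is `False` (a “significant” difference between language versions).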

The relation between average ratings on the different scales and dimensions was assessed using generalized linear models. Additionally, Pearson correlations were calculated for linear relations. To assess the effects of picture content, mean ratings and surrounding 95% CIs were calculated. Cohen’s d was used as an estimate of effect size. The variance of d was computed using the conversion formula ((n1 + n2)/(n1 × n2) + 0.5 × d²/df) × ((n1 + n2)/df) (Cooper et al., 2009). To correct for the bias of Cohen’s d as an estimate of the population effect size, Hedges’ correction was used (Hedges and Olkin, 1985). The magnitude of Cohen’s d was judged based on suggestions by Cohen (1969).
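The effect size computation can be sketched as follows, in Python rather than the R used for the reported analyses. Several details are assumptions not stated in the text: df is taken as n1 + n2 − 2, the correction factor is the common approximation J = 1 − 3/(4·df − 1), the variance of the corrected estimate is taken as J² times the variance of d, and the CI uses a normal approximation. All function names are hypothetical.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference based on the pooled standard deviation."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    return (mean1 - mean2) / pooled_sd

def variance_of_d(d, n1, n2):
    """Variance of d following the conversion formula quoted in the text."""
    df = n1 + n2 - 2  # assumption: the text does not define df
    return ((n1 + n2) / (n1 * n2) + 0.5 * d**2 / df) * ((n1 + n2) / df)

def hedges_correction(df):
    """Approximate small-sample correction factor J."""
    return 1 - 3 / (4 * df - 1)

def corrected_d_with_ci(d, n1, n2, z_crit=1.96):
    """Bias-corrected effect size with a normal-approximation 95% CI."""
    j = hedges_correction(n1 + n2 - 2)
    g = j * d
    se = j * math.sqrt(variance_of_d(d, n1, n2))  # assumption: Var(g) = J^2 * Var(d)
    return g, (g - z_crit * se, g + z_crit * se)

# Invented example: two groups of 42 pictures, means 5.0 and 4.0, SD 2.0 each.
d = cohens_d(5.0, 4.0, 2.0, 2.0, 42, 42)
g, ci = corrected_d_with_ci(d, 42, 42)
```

With samples of this size the correction is small (J is close to 1), so g and d differ only in the third decimal.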

Participants’ gender was considered as a factor in the main analyses as men and women have been shown to differ regarding various aspects of emotional experience (Bradley et al., 2001; Brody and Hall, 2008; Fischer, 2000). The effects of language on mean ratings were also assessed for men and women separately, as gender distribution was different for samples receiving German and English instructions. Influences of scale order were only considered regarding the relation between mean ratings within bipolar and unipolar scale types, as participants rated only half of the pictures on either scale type. Further influences of dimension order on the relation between mean ratings per picture and on content effects were assessed across gender, language and scale order, as the distribution of these variables was equal across dimension orders (see Table 1).

[Figure 2 panel labels: “Unipolar pleasantness & unpleasantness scales”, “Aggregated unipolar”]

Figure 2. Example rating scales and scoring procedure for all rating dimensions. Light blue lines refer to ratings on bipolar valence and arousal scales, dark blue ones to pleasantness and unpleasantness scales. Gray lines refer to scores obtained by combining pleasantness and unpleasantness ratings. Raw rating scores are framed with rectangles, scores derived from pleasantness and unpleasantness ratings with ellipses. Solid lines indicate direct use of raw scores, dashed ones inferences. Note that participants only used either bipolar valence and arousal scales or pleasantness and unpleasantness scales for each picture at a time.

Pre-analyses showed that the pictures’ sequence position did not influence ratings on any of the four rating dimensions, all |r|(82) ≤ 0.19. Also, the gender of the depicted child did not affect ratings on any dimension, as even the largest difference’s CI overlapped zero, d = 0.50, 95% CI [−0.11, 1.11]. No interaction of the depicted child’s gender and participants’ gender (own-gender effect) emerged; the largest difference between ratings for boy and girl pictures was present in male participants, but this difference was still likely to point in either direction, d = 0.53, [−0.37, 1.43]. Thus, further analyses were conducted without consideration of presentation sequence or gender of the depicted child.

3 Results

3.1 Relation between ratings on different scales and dimensions

3.1.1 Strong relations between ratings within and across scale types

The relation between mean arousal and valence ratings per picture across the stimulus set is illustrated in Figure 3. A quadratic relation between bipolar valence and arousal ratings, as often reported in the classic literature, was evident, r(82) = .60, 95% CI [.44, .72], R²_adjusted = 0.35. However, the linear relation between those scales was just as strong, r(82) = −.66, [−.77, −.52], and accounted for 43% of the variance.
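The reported intervals can be related to the correlations with a short Python sketch. The text does not state how the CIs were computed; the Fisher z transformation used here is an assumption, as are the reading of r(82) as df = n − 2 (n = 84 pictures) and the treatment of the quadratic relation as a single-predictor model (arousal on squared valence). Function names are hypothetical.

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def adjusted_r2(r2, n, k=1):
    """Adjusted R-squared for a model with k predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 84  # assumption: r(82) denotes df = n - 2

linear_ci = pearson_ci(-0.66, n)        # linear valence-arousal relation
quadratic_ci = pearson_ci(0.60, n)      # arousal vs. squared valence
quadratic_adj = adjusted_r2(0.60**2, n)
linear_share = (-0.66) ** 2             # proportion of variance, about .44 unadjusted
```

Under these assumptions the sketch gives intervals that round to [−.77, −.52] and [.44, .72] and an adjusted R² that rounds to 0.35, in line with the values reported above.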


Figure 3. Relation between bipolar valence and arousal ratings. Each data point corresponds to mean bipolar valence and arousal ratings for one picture. The regression line was created using MatLab LSD.

Mean ratings of pleasantness and unpleasantness per picture showed a strong negative linear correlation (see Figure 4), r(82) = −.87, 95% CI [−.92, −.81], R²_adjusted = 0.76, illustrating that participants understood the antagonistic relation between both rating dimensions.

All participants used bipolar valence and arousal scales for one half of the stimuli, and pleasantness and unpleasantness scales for the other half. Thus, means on both scale types can be directly related to one another, as they stem from the same population.

Figure 4. Relation between pleasantness and unpleasantness ratings. Each data point corresponds to mean pleasantness and unpleasantness ratings for one picture. The regression line was created using MatLab LSD.

First, the extent to which variance in bipolar valence ratings can be inferred from the difference between pleasantness and unpleasantness ratings was investigated. A nearly perfect linear correlation (see Figure 5 B), r(82) = .96, 95% CI [.94, .98], accounted for 92% of variance in bipolar valence ratings. It can therefore be assumed that participants employed the two types of scales similarly and that the information inherent in bipolar valence ratings can be inferred almost completely by calculating the difference between unipolar pleasantness and unpleasantness ratings.

Second, the extent to which the overall intensity of pleasant and unpleasant feelings could account for arousal ratings was assessed. A high proportion of variance in arousal ratings, i.e. 56%, could be explained by this aggregation of pleasantness and unpleasantness ratings by means of a strong linear correlation (see Figure 5 A), r(82) = .75, 95% CI [.64, .83]. In sum, these findings suggest an overall strong association between participants’ rating behavior on bipolar and unipolar scales.
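The two aggregated unipolar scores can be sketched in Python as follows. The mean ratings are invented for illustration. The sign convention of the difference score (pleasantness minus unpleasantness) is an assumption, chosen so that it correlates positively with bipolar valence as in the reported result.

```python
import numpy as np

# Invented mean ratings per picture (five pictures).
pleasant = np.array([6.1, 1.2, 3.5, 0.8, 5.0])     # 0 to 8
unpleasant = np.array([0.9, 6.4, 3.0, 7.1, 1.5])   # 0 to 8
valence = np.array([2.8, -2.9, 0.1, -3.4, 1.9])    # -4 to 4
arousal = np.array([4.9, 5.6, 4.0, 6.2, 3.8])      # 0 to 8

# Difference score: related to bipolar valence.
difference = pleasant - unpleasant
# Sum score: overall intensity of feelings, related to arousal.
total = pleasant + unpleasant

r_valence = np.corrcoef(difference, valence)[0, 1]
r_arousal = np.corrcoef(total, arousal)[0, 1]
```

The difference score ranges from −8 to 8 and the sum score from 0 to 16, so neither shares the range of the bipolar scale it is compared with; the correlation is unaffected by this difference in scaling.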

3.1.2 Modulation of rating dimensions’ relations by language and participants’ gender

As gender distribution differed between samples receiving booklets in English and German, the potential impact of language and gender on the relations between mean ratings on different scale dimensions was assessed jointly. The detailed results are displayed in Table 3. The pattern of differences in scale relations according to language was different for men and women. For women, the negative linear relation between arousal and bipolar valence ratings was stronger if German rather than English was used throughout the study. For men, the strength of the antagonistic relation between pleasantness and unpleasantness ratings was higher if the study was conducted in German rather than English.

Figure 5. Relation between aggregated unipolar ratings and arousal (A) as well as bipolar valence (B). Each data point corresponds to mean ratings for one picture. Regression lines were created using MatLab LSD.

Furthermore, differences between men and women in ratings’ relations were evident independent of study language. Pleasantness and unpleasantness ratings were related to each other more strongly for women than for men. This was true for the direct correlation between pleasantness and unpleasantness as well as the extent to which bipolar valence ratings could be inferred from the difference between unpleasantness and pleasantness ratings (see bold values in Table 3). For those participants receiving English instructions and material, inference of arousal ratings from the sum of pleasantness and unpleasantness ratings was also considerably diminished for men, i.e. nearly absent, compared to women. A similar trend was also observed in the sample receiving German instructions and materials, but possibly due to the small number of men in this sample (N = 15), the correlations’ CIs still overlapped, albeit only slightly. In sum, gender differences point towards women reporting pleasantness and unpleasantness as being more antagonistic and showing stronger equivalence between ratings on unipolar and bipolar rating scales.


Table 3. Strength of relation between valence, arousal, pleasantness (P) and unpleasantness (U) for women (top) and men (bottom) participating in the English (left) and German (right) study version.

                            English                           German
Gender  Relation            r     95% CI        R²_adjusted   r     95% CI        R²_adjusted
Women   Valence ∼ arousal   −.40  [−.57, −.21]  .15           −.79  [−.86, −.69]  .62

Note. Effect sizes whose CIs indicate a difference in relation strength between languages are highlighted with gray shading. Effect sizes whose CIs indicate a difference in relation strength between men and women are highlighted in bold.

3.1.3 Independence of rating dimensions’ relations from scale and dimension order

The relation between mean ratings of dimensions of one scale type, i.e. between arousal and bipolar valence as well as between pleasantness and unpleasantness ratings, could be assessed independently for each of the four booklet versions (see Table 1). Correlations between mean arousal and valence ratings, between arousal and squared valence ratings, as well as between mean pleasantness and unpleasantness ratings for each booklet version differed neither from each other nor from the overall results described above. The largest deviation from the relations reported across booklet versions was found for the quadratic relation between valence and arousal ratings, with r = .41, 95% CI [.13, .64], and hence still largely overlapping CIs for both results.

All participants rated stimuli in the same pre-randomized order, and the rating scale type (bipolar or unipolar) was switched after half of the experiment. Accordingly, relations between mean ratings on aggregated unipolar and bipolar rating scales can only be assessed by combining data of participants who used bipolar scales first and those who used unipolar scales first. As for the relations within one scale type, relations between scale types did not differ according to which rating dimension was shown on top of each page. The correlation most dissimilar to the overall results was the relation between the sum of pleasantness and unpleasantness ratings and arousal in the data of participants who first rated valence or unpleasantness on top of each page, r = .69, 95% CI [.56, .79].
