It matters how you ask: Emotional ratings of help-related picture content


Thesis submitted for the academic degree of Master of Science (MSc.)


German title: Eine Frage der Fragestellung: Emotionale Bewertungen hilferelevanter Bildinhalte

submitted by:

Aenne A. Brielmann

Student ID number: 01/755188

at the

Department of Psychology on 22 August 2014

First reviewer: Prof. Dr. Thomas Götz
Second reviewer: Dr. Tobias Flaisch

Konstanzer Online-Publikations-System (KOPS)


I would like to express my gratitude to Prof. Dr. Thomas Götz and Dr. Tobias Flaisch, who evaluated this thesis. My special thanks go to Dr. Margarita Stolarova, who enabled the realization of the study presented here and greatly contributed to developing the extension of the stimulus set. I would also like to thank the students who participated in this study and extend my gratitude to the students who assisted with data collection: Marcus Kicia, Joana Chomakova, and Jennifer Müller.

My personal thanks go to my family for always supporting me and encouraging my dedication to scientific research.

Parts of this thesis will be published as a journal article.


Abstract

Emotions play a vital role in human social behavior. One important aspect of emotional experiences in social contexts is that people are able to experience and report emotions similar to those of a person they observe. This capability is thought to be one major motivation for prosocial behavior. In this thesis I explore the influence of help-related picture content on self-reported emotions. Within this framework, I assess how different types of rating scales and the characteristics of the rating material affect self-reported emotional experiences.

Participants (N = 278) were shown two different subsets of black-and-white drawings that varied systematically in their help-related content. In the first subset, half of the drawings depicted a child or a bird needing help to reach a trivial goal; the other half showed the agent reaching the goal. The second subset showed adults either actively helping a child or passively present next to it. Control pictures of the adult and the child alone were also included. Participants reported their subjective emotional experiences while viewing the stimuli using two types of 9-point scales. For one half of the pictures, scales of arousal (calm to excited) and of bipolar valence (sad to happy) were employed; for the other half, scales of unipolar pleasantness and unpleasantness (strong to absent) were used. This design allowed comparison of emotional ratings on two types of scales within one population. The order of rating scale types as well as of each scale type’s rating dimensions was counter-balanced across participants. Moreover, the study was conducted in English (N = 125) and German (N = 117).

Variance in mean bipolar valence ratings could be inferred from the difference between pleasantness and unpleasantness ratings. The overall intensity of reported pleasant and unpleasant feelings accounted well for variance in arousal ratings. Participants reported having pleasant and unpleasant feelings at the same time. Pictures showing everyday need of help situations were rated lower in valence, higher in arousal, less pleasant, and more unpleasant than corresponding pictures in which the agent did not need help. Pictures of adults helping a child were rated similarly to control pictures showing a passive adult on all dimensions. Considering only one scale order or language would not have changed the principal interpretation of the effects of help-related picture content.

In sum, results suggest that arousal and bipolar valence ratings are closely related to unipolar pleasantness and unpleasantness ratings. Moreover, valence should not be regarded solely as a bipolar construct, as participants reported having pleasant and unpleasant feelings at the same time. Hence, these so-called mixed feelings should be assessed, too. The large effects of need of help content in everyday situations on self-reported emotional experiences highlight the strength and reflexiveness with which emotions congruent to those of observed others are experienced. The rating material’s language and the order of scales may have an impact on participants’ rating behavior but are unlikely to distort major effects of manipulated content.


Summary (translated from the German "Zusammenfassung")

Emotions play a key role in human social behavior. One important aspect of this is that people are able to experience and report another person’s feelings as the observed person does. This ability is assumed to be an important motivation for prosocial behavior. This Master’s thesis investigates how help-related picture content influences self-reported emotion. Within this framework, the influence of using different types of rating scales on self-reported emotional experience is examined. In addition, the influences of the formal characteristics of the rating material are taken into account.

The participants in this study (N = 278) were shown two sets of black-and-white drawings that varied systematically in their help-related content. In the first set, half of the pictures showed a child or a bird needing help, while the other half showed the same protagonists reaching their goal. The second set of drawings showed adults either helping one of the children from the first set or depicted passively next to the child. Control pictures showed the adult alone and the child alone. Participants rated their emotional experience while viewing these pictures on two different types of 9-point Likert scales. For one half of the pictures, scales of bipolar valence (happy to sad) and arousal (calm to excited) were used. Participants rated the other half of the pictures on unipolar scales for the intensity of their pleasant and unpleasant feelings (none to strong). This study design allows emotional ratings on both types of scales to be compared within one population. The order of scale types and of rating dimensions was counter-balanced across participants. In addition, the study was conducted in German (N = 117) and English (N = 125).

Across the entire picture material, variance in valence ratings could be inferred from the difference between the intensity of pleasant and unpleasant feelings. In addition, the overall intensity of reported pleasant and unpleasant feelings explained a large part of the variance in arousal ratings. Participants reported experiencing both pleasant and unpleasant feelings while viewing the same picture. Pictures showing someone in need of help were rated as sadder, more arousing, less pleasant, and more unpleasant than pictures showing no need of help. Depictions of adults helping a child were rated similarly on all dimensions to control pictures showing a passive adult. Considering a single language or order variant had no influence on the general interpretation of the effects of help-related picture content.

These results suggest that ratings of bipolar valence and arousal are closely related to the intensity of pleasant and unpleasant feelings. Moreover, valence cannot be regarded solely as a bipolar construct, because participants reported experiencing pleasant and unpleasant feelings at the same time. Such mixed feelings should therefore also be assessed. The strong influence of depicted need of help in everyday situations on reported emotional experience underlines how strongly and automatically emotions resembling those of an observed other are elicited. The language of the rating material and the orders used in it can influence mean ratings but cannot distort the main effects of the manipulated variables.


1 Introduction

1.1 The importance of emotional experiences in social cognition

What do you feel if you see a child cry? The answers that diverse people give to this question will be astonishingly fast and similar: sadness, distress, the urge to help, and so on. The reason for this spontaneous and homogeneous response pattern is the fascinating human tendency to adopt other people’s emotional states. In this way, emotions play a key role at multiple levels of social cognition and behavior (Keltner and Haidt, 1999). As noted, for example, by Gallese and colleagues (2004), “we do not just ‘see’ or ‘hear’ an action or an emotion. Side by side with the sensory description of the observed social stimuli, internal representations of the state associated with these actions or emotions are evoked in the observer, ‘as if’ they were performing a similar action or experiencing a similar emotion.” Humans’ ability to feel what (they believe that) others feel has been given many names; the most comprehensive and famous one is “empathy”.

One broader definition of empathy is “an affective response that stems from the apprehension or comprehension of another’s emotional state or condition and is similar to what the other person is feeling [. . . ]” (Eisenberg et al., 2006). However, there is a wealth of different definitions of empathy. Some of these define empathy in a narrower sense and incorporate the premise that the observer knows about the external source of his or her affective state (De Vignemont and Singer, 2006). Others distinguish between cognitive and affective empathy (Cox et al., 2012; Hooker et al., 2010). Taking this distinction into account, what has been described so far corresponds best to affective empathy, which is sometimes (but not always) distinguished from emotional contagion. According to all definitions, however, the observer’s affective state changes according to what another person is doing or experiencing.

Despite considerable confusion regarding the terminology of “empathy”, the core idea of emotions’ pivotal role in social cognition cannot be denied (Gallese et al., 2004; Lemerise and Arsenio, 2000). It seems plausible and well established that affective responses are elicited in the observer and that these responses are congruent to the ones (believed to be) experienced by the observed person. It is beyond the scope of the current thesis to discuss the terminology of empathy any further (but see Preston and Hofelich, 2012, for a recent and careful examination of this term). Instead of being led astray by trying to (re-)define the concept of “empathy”, this thesis will focus on examining the process of experiencing the emotions of others. This more precise construct may be regarded as one key component of most empathy definitions.

The assumption that others’ distress will elicit distinct emotional responses in the observer has been present in social psychological theories for decades (Batson and Shaw, 1991; Cialdini et al., 1973; Piliavin et al., 1982). These established theories assume that seeing another person in distress will evoke an emotional reaction similar to the one experienced by the person in need. The arousal-cost-reward model, for example, assumes that a kind of “arousal” is elicited when seeing someone in need of help (Dovidio, 1991; Fischer et al., 2006; Piliavin et al., 1982; Schroeder et al., 1995).

In the framework of the arousal-cost-reward model, the notion of heightened arousal is often complemented with a negative, unpleasant valence of this emotional reaction. Thus, “arousal” in this context does not refer to the same concept as used in theories of core affect (e.g. Bradley and Lang, 1994; Russell, 1979). Even though the theory is labeled with “arousal” only, it rather refers to an “empathic arousal” (Hoffman, 1981) that is defined just as vaguely regarding the affected components of emotion as general definitions of empathy. Other theories explicitly state that the valence of emotions will be affected by seeing someone in need. For instance, the empathy-altruism hypothesis (Batson and Shaw, 1991) assumes that “[e]mpathy felt for someone who is suffering will likely be an unpleasant, aversive emotion”. The negative-state relief theory states that seeing others’ need of help elicits unpleasant feelings (initially proposed by Cialdini et al., 1973, but see Batson et al., 1989, for an in-depth comparison of negative-state relief and empathy-altruism hypotheses). It is also noteworthy that Batson and colleagues (1989) employ the term “empathy” in much the same way as Eisenberg and colleagues employ “sympathy” (e.g. Eisenberg et al., 1989; Fabes et al., 1993). In sum, there is a vague consensus among these theories that others’ need will elicit some kind of unpleasant feeling. Nonetheless, there is again considerable confusion regarding terminology and little precision about the exact components of emotion affected.

1.2 Emotional responses to others’ need foster prosocial behavior

Despite their vagueness and inconsistency regarding the exact components of emotion affected, all of these theories assume that emotional responses to others’ need will likely lead to prosocial behavior. Evidence for the relation between prosocial acting and emotional responses has been obtained by measuring the ability to respond emotionally to others’ need as a trait (e.g. Carlo et al., 1999; Prot et al., 2014; Wilhelm and Bekkers, 2010) and as an experience in a given situation (e.g. Cao, 2010; Stocks et al., 2009; Sze et al., 2012; see also Eisenberg, 2000, for a review of both aspects). Moreover, recent neuroimaging studies have established a link between neuronal activity in brain regions related to empathic responses and real-life prosocial behavior (e.g. Masten et al., 2011; Rameson et al., 2012; see also Chakroff and Young, 2014, for a review). Notably, Rameson and colleagues found empathy-related brain responses in regions previously associated with the experience of sadness (Rameson et al., 2012).

It seems that the imprecision with which emotional reactions to others’ need are described is at least partially due to the diverse operationalizations of others’ need or distress. Studies making claims about the relation between prosocial behavior and empathy often assess dispositional (or trait) empathy (Carlo et al., 1999; Fabes et al., 1993; Prot et al., 2014; Sze et al., 2012), using standardized questionnaires such as the Interpersonal Reactivity Index (Davis, 1983). If a concrete emotional reaction to a person in need is assessed, participants are usually given (visual-)auditory or written information about a scenario in which a person needs help (Cao, 2010; Eisenberg et al., 1989; Fabes et al., 1993; Rameson et al., 2012; Stocks et al., 2009; Sze et al., 2012). However, these studies tell us relatively little about what people are actually feeling when they witness others’ need, because they either aim to assess a general disposition for being empathic (e.g. Carlo et al., 1999; Fabes et al., 1993; Prot et al., 2014), can only infer emotional involvement from brain activity (Masten et al., 2011; Rameson et al., 2012), or measure current emotional responses on scales encompassing a variety of adjectives or statements (e.g. Fabes et al., 1993; Fischer et al., 2006; Cao, 2010; Sze et al., 2012). The problem with the latter approach is that it makes arbitrary assumptions about which emotions can be considered distinct and therefore cannot make parsimonious claims about the basic affective processes underlying need of help perception (see Barrett, 2006, for a discussion about the building blocks of emotion).

Thus, even though there is evidence that some emotional activation motivates helping (e.g. pointed out by Schroeder et al., 1995, in chapter 3), previous research has so far been vague about which aspect(s) of emotional experience change when seeing someone in need.

There is ample evidence that feeling empathic towards others motivates prosocial behavior (for a review see Eisenberg, 2000). The exact mechanisms behind this link, however, remain vague. First, it may be that more empathic persons are truly more motivated to decrease others’ distress and thus their own empathically experienced distress. Second, prosocial behavior per se can be rewarding (Lee, 2008; Schroeder et al., 1995), thereby lifting the helper’s previously dampened emotional state (Cialdini et al., 1973). Third, the act of helping could be generally valued positively over and above the relief of need or distress, as suggested, for example, by studies showing that infants prefer prosocially acting agents to others (e.g. Hamlin and Wynn, 2011).

These hypothetical assumptions make different predictions about how people should react upon seeing the act of helping performed by others: According to the first two possibilities, seeing someone being helped should not affect observers’ emotional responses compared to seeing a non-needy person. The third possibility would predict that observers experience the relief of need more positively if it is achieved by an act of helping. So far, none of these hypotheses can be considered more plausible than the others, as it has not yet been investigated whether an act of helping is already associated with more positive feelings if it is merely observed.

1.3 Operationalization of help-related content

This thesis aims to describe the basic components of subjective emotional experiences that are elicited by seeing others’ need and by seeing the act of helping. This basic-level specification of changes in subjective affect may help to understand how emotional responses foster prosocial behavior. Stimuli showed situations that most commonly lead to helping behavior in real life, i.e. harmless everyday situations (see Figure 1 A for an example and Appendix A for a complete list). Participants’ spontaneous emotional responses to comic pictures of children in need of help were assessed relative to a neutral baseline, i.e. seeing the same situation without need of help being involved. Pictures differed only regarding their help-related content, such that lower-level visual characteristics as well as emotional expressions stayed constant across conditions. The comic format not only makes it possible to maintain high perceptual similarity between stimuli, it also enables the inclusion of non-humans (i.e. birds) in the same situations as children. These control pictures served to prevent participants from suspecting that they were expected to respond to help-related content because it was the only variation in picture content. Moreover, analogous bird pictures allow some extent of generalization regarding claims about need of help content beyond the scope of human conspecifics. So far, this field of research has received little attention, but some findings suggest that emotional responding to others’ need increases as a function of phylogenetic proximity (Westbury and Neumann, 2008). Besides need of help and no need of help pictures of a single agent, the stimuli also included pictures designed to assess how seeing somebody helping is perceived. This subset shows adult figures helping the child to reach his or her goal, control pictures showing the same adult passively beside the child, and pictures of the adult and the child alone. The latter control pictures serve to explore how the mere existence of a social context influences observers’ ratings.

1.4 Measurement of subjective emotional experiences

In order to resolve the vagueness with which emotional responses to help-related content have been assessed so far, it is not enough to provide appropriate stimulus material. Indeed, the lack of comparability between studies on emotional responses to others’ need results mainly from differences in how emotional experience was measured rather than from differences in the material it was measured in response to. Therefore, one major aim of this thesis was to explore how the rating material used to measure self-reported emotion can alter participants’ overall rating behavior as well as picture content’s effects on these ratings. To achieve this goal, not only main effects of the rating material’s characteristics were investigated, but also how the results of the very same analyses differ for two or more different types of rating material. The main emphasis of these methodological considerations will lie on variations of scale types, as their choice can heavily influence the conceptualization of the structure underlying self-reported emotions.

1.4.1 Scale types

Probably the most popular tool for assessing the core aspects of emotion in self-reports is the set of Self-Assessment Manikins (SAMs, Lang et al., 1997). Employing these widely known and used rating scales is useful because they allow comparison to a wide range of previous studies and because the interpretation of results is familiar to colleagues within the field. The conventional approach of using the SAM scales relies on the assumption that emotional experiences can be decomposed into arousal and bipolar valence. The idea of the distinct core affects valence and arousal (as first described by Russell, 1979; see Barrett and Bliss-Moreau, 2009, for a recent discussion) has found support in physiological studies (e.g. Bradley and Lang, 1994; for an extensive review see Bradley and Lang, 2000) and neuroimaging studies (Anders et al., 2004; Wilson-Mendenhall et al., 2013). The main criticism of the assumption that valence is a bipolar dimension of affect is that it explicitly denies the existence of mixed feelings: Valence can range between sad and happy, but one cannot report being happy and sad at the same time. Accordingly, the middle valence SAM is described as representing a “completely neutral, neither happy nor sad” feeling (Lang et al., 1997). Others have therefore suggested that emotional experiences are better described in terms of the intensity of pleasant and unpleasant feelings (Ito et al., 1998; Kron et al., 2013). Studies using such unipolar rating scales of pleasantness and unpleasantness have demonstrated that participants do indeed report mixed feelings (Hemenover and Schimmack, 2007; Kron et al., 2013; Larsen and Green, 2013; Schimmack, 2001; Schimmack and Colcombe, 2007), and unique physiological markers of mixed feelings have been reported, too (Kreibig et al., 2013).

Recently, doubt has also been cast on the assumption that valence and arousal can be regarded as orthogonal components of core affect in self-reports. Kron and colleagues (2013) have demonstrated that whilst physiological responses tell valence and arousal apart, the same need not be true for self-reported subjective experience: Their participants’ ratings of the intensity of pleasant and unpleasant feelings could account very well for variance in arousal and bipolar valence ratings. As these authors regard pleasantness and unpleasantness as unipolar measures of valence, they conclude that people have only limited abilities to tell valence and arousal components of emotional experiences apart. Whether this conclusion is justified remains debatable. However, high correlations of aggregated un-/pleasantness ratings (sum and difference) with arousal and bipolar valence strongly suggest that there is a tight coupling between these components of self-reported emotional experiences. The essential difference between both scale types is that the use of unipolar pleasantness and unpleasantness scales gives insight into one component of emotional experience that cannot be assessed with bipolar valence scales: mixed feelings.
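The aggregation of unipolar ratings mentioned above can be illustrated with a minimal sketch. It assumes per-picture mean ratings on 9-point unipolar scales, coded so that higher values mean stronger feelings. The function name `aggregate`, the example values, and the use of the minimum of both ratings as a mixed-feelings index are illustrative assumptions; they are not the analysis code or data of this thesis.

```python
# Hypothetical mean ratings on 9-point unipolar scales (higher = stronger).
# The values are invented for illustration only.
ratings = {
    "need of help": {"pleasant": 2.5, "unpleasant": 6.0},
    "no need of help": {"pleasant": 6.5, "unpleasant": 1.5},
}


def aggregate(pleasant, unpleasant):
    """Combine one pleasantness and one unpleasantness rating."""
    return {
        # Difference: the aggregate compared with bipolar valence ratings.
        "difference": pleasant - unpleasant,
        # Sum: the aggregate compared with arousal ratings.
        "sum": pleasant + unpleasant,
        # Minimum: one possible mixed-feelings index (an assumption here);
        # it is high only if both feelings are reported at the same time.
        "mixed": min(pleasant, unpleasant),
    }


for category, r in ratings.items():
    print(category, aggregate(r["pleasant"], r["unpleasant"]))
```

A bipolar valence scale can only capture something like the difference score. The minimum-based index is informative precisely where a bipolar scale would show a "neutral" midpoint although both feelings are present.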

1.4.2 Consideration of further rating material characteristics

As the study considered here took place at a university offering courses in German and English, it was conducted in both languages. Care was taken to ensure that instructions, oral as well as written, were equivalent in both languages. Use of the standardized SAM instructions (Lang et al., 1997) ensured comparability of instructions and labels of all rating dimensions (see also Kron et al., 2013, for labels and descriptions of unipolar scales). However, studies using SAM ratings in languages other than English (Portuguese, German, Hungarian, and Spanish) have found that their results differed from the ones obtained with English study materials (Deák et al., 2010; Dufey Domínguez et al., 2011; Grühn and Scheibe, 2008; Lasaitis et al., 2008; Ribeiro et al., 2005; Schmidtke et al., 2014). Specifically, these studies found a stronger negative linear correlation between arousal and bipolar valence ratings than originally reported for US-American samples (e.g. Bradley and Lang, 1994). One study conducted in German (Schmidtke et al., 2014) attributes this finding in part to the fact that “arousal” in English lacks the rather negative connotation of the German equivalents “Erregung” or “Aufregung”. This explanation seems too restricted to explain the diverging correlation pattern for numerous languages. The most common and encompassing explanation that has been given so far for the stronger negative linear association between arousal and bipolar valence ratings in non-US-American samples is that the concept of “arousal” in US-American culture drives the curvilinear relation and is specific to the US. The study presented here may contribute to disentangling cultural and language-inherent differences in self-reports of arousal and bipolar valence, as English as well as German questionnaires were given to participants while all participants had a predominantly German cultural background.


Another concern related to the explanation of deviations from the originally reported arousal-valence relation is that differences in rating procedures may lead to changes in self-reported emotions. One major difference between the original norm rating sessions and more recent studies is, for instance, the change from group sessions with projected pictures to individual sessions at a PC (see e.g. Grühn and Scheibe, 2008). Nonetheless, such potentially influential variations in study design are often not systematically assessed. A further example is that studies using SAM rating scales have remained astonishingly silent about the potential impact of the order in which participants are instructed to report different emotions. Even though most studies at least provide information about stimulus and/or rating sequence (Backs et al., 2005; Grühn and Scheibe, 2008; Ito et al., 1998; Kron et al., 2013), they give no evidence that ratings and results are independent of these formal study characteristics. Often, it seems that counter-balancing is regarded as a sufficient precondition for ignoring order variations in study procedure. However, even though counter-balancing ensures that one may assess confounding effects, it does not guarantee that results obtained across the entire sample are independent of the counter-balanced variable. Effects may, for example, be present and large in one condition only, leading to the observation of a smaller effect across the entire sample. Likewise, contradictory effects of similar size in either condition may cancel each other out. As there seems to be no explicit report of such potential order effects on SAM ratings so far, they will be considered in this thesis.
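The two scenarios described above can be made concrete with a small numerical sketch. It assumes two equally sized counter-balanced groups and expresses the content effect as a difference in mean ratings; the function name `pooled_effect` and all numbers are hypothetical and serve only to illustrate the argument.

```python
from statistics import mean


def pooled_effect(effect_order_a, effect_order_b):
    """Average a content effect over two equally sized order groups."""
    return mean([effect_order_a, effect_order_b])


# Scenario 1: the effect exists in one scale order only.
# Pooling halves it, so the whole-sample effect looks smaller.
one_order_only = pooled_effect(-2.0, 0.0)

# Scenario 2: effects of similar size but opposite sign.
# Pooling cancels them, so the whole sample shows no effect at all.
opposite_signs = pooled_effect(-1.5, 1.5)

print(one_order_only, opposite_signs)
```

In both cases the pooled value alone would be misleading, which is why the effects are better inspected separately for each level of the counter-balanced variable.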

1.5 Purpose of this thesis

This thesis cannot possibly answer the question of which rating material is best for measuring self-reported emotional experiences. The question I want to pose is: What measure is best suited for assessing subjective emotional experiences given the specific stimuli and research questions? A comprehensive answer requires more than illustrating the relation between different types of scales, as has recently been done (Kron et al., 2013). Rather, one has to find out how the use of distinct rating scales affects results regarding a concrete research question. Therefore, this thesis has two main goals: 1) replicating studies on the differences and commonalities of unipolar un-/pleasantness ratings compared to arousal and bipolar valence ratings, and 2) assessing emotional experiences linked to the perception of help-related picture content.

The pursuit of these aims was guided by the following questions: 1a) To what extent can bipolar valence ratings be explained by the difference between pleasantness and unpleasantness ratings? 1b) To what extent is variance in arousal ratings reflected in the sum of pleasantness and unpleasantness ratings? 1c) Do participants report mixed feelings, and if so, do they relate to bipolar valence ratings? 2a) To what extent does need of help content affect people’s bipolar valence, arousal, pleasantness and unpleasantness ratings? 2b) How do these influences change for bird depictions? 2c) To what extent does the depiction of a helping action lead to discernible emotional responses? Combining the two major research questions, a third one is posed: 3) Is the pattern of results obtained using arousal and bipolar valence scales comparable to the pattern obtained on unipolar pleasantness and unpleasantness scales?

As the breadth of a thesis allows an in-depth consideration of all possible factors influencing results, the above-mentioned considerations are expanded by taking formal characteristics of measurement into account. Hence, the additional questions asked were: How does language influence the measurement of self-reported emotions, and can language change the effects of picture content? To what extent can the order in which ratings are to be made influence mean ratings, and can the order of ratings change the effects of picture content?


2 Methods

2.1 Ethics approval

The study was given formal ethics approval by the Ethics Committee of the University of Konstanz with a decision dated 31 July 2013. It was also approved by the Dean of the Faculty of Society and Economics, Rhine-Waal University of Applied Sciences, on 1 October 2013. All participants gave written informed consent according to the Declaration of Helsinki (see Appendix B).

2.2 Stimuli

For this Master’s thesis I continued to work with the NeoHelp stimulus set previously developed in my Bachelor’s thesis (Brielmann, 2014). This set of pictures was originally designed to investigate the perception of need of help. For the present purpose, the stimulus set was extended to include depictions of active helping behavior and appropriate control stimuli. This extension was implemented for the ten situations of the original stimulus set that had previously been shown to result in the clearest discrimination between need of help and no need of help content in children (Brielmann and Stolarova, 2014). All pictures were black-and-white comic drawings of 800 × 800 px and were created using Adobe Illustrator version CS5 or later.

The complete stimulus set used for this study comprises two subsets, each focusing on one aspect of helping. The first subset, which will be referred to as the “need of help” subset, was designed to investigate the perception of need of help. All of these pictures were created in pairs, one showing the actor in need of help and its counterpart showing him or her achieving his or her goal. There were at least two different need of help vs. no need of help picture pairs for each situation, one of a child and one of a bird. For all situations showing a child with an identifiable gender (not for toddlers) there were two pairs of child pictures varying in gender. Thus, the “need of help” subset contained a boy and a girl picture pair for seven situations, a picture pair of toddlers for three situations, and a corresponding bird picture pair for each of the ten situations, for a total of 54 pictures.

The second stimulus subset assessed the emotional perception of active helping behavior while controlling for effects of a general social context as given by two persons in a picture.

This series of pictures will be referred to as the “social context” subset. In this subset, there were four different pictures varying in social context for each of the ten situations: the first was the no need of help picture of the boy or toddler alone from the “need of help” subset (in the context of this subset referred to as “child alone”), the second showed the adult alone (“adult alone”), the third showed the adult passively standing or kneeling beside the child (“social passive”), and the fourth showed the adult actively helping the child (“social helping”). The “social passive” pictures corresponded to the “child alone” pictures with the addition of the same adult figure as in the “adult alone” picture. The “social context” subset thus consisted of 40 pictures. The only picture category that belonged to both picture subsets was the one showing a boy or toddler alone without need of help. These ten pictures were shown only once each, like every other picture.

In sum, participants were shown 84 different pictures. An overview of all picture categories for one example situation is given in Figure 1. The complete set of stimuli is listed in Appendix A. All stimuli are publicly accessible at https://osf.io/jmsct/.
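The picture counts given above can be checked with a few lines of arithmetic. The following Python sketch uses only numbers stated in the text; the variable names are illustrative and not part of the original materials.

```python
# Stimulus bookkeeping based on the counts stated in Section 2.2.
SITUATIONS = 10
GENDERED_SITUATIONS = 7   # one boy pair and one girl pair each
TODDLER_SITUATIONS = 3    # one toddler pair each

child_pairs = GENDERED_SITUATIONS * 2 + TODDLER_SITUATIONS
bird_pairs = SITUATIONS   # one bird pair per situation
need_of_help_pictures = 2 * (child_pairs + bird_pairs)

social_context_pictures = 4 * SITUATIONS
# The "child alone" pictures belong to both subsets and are counted once.
shared_pictures = SITUATIONS

total_pictures = need_of_help_pictures + social_context_pictures - shared_pictures
print(need_of_help_pictures, social_context_pictures, total_pictures)
```

The subtraction of the ten shared pictures is what reconciles the two subset sizes with the overall number of pictures shown.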

[Figure 1: panel A shows example stimuli for the “need of help” subset (child boy, child girl, bird; each as need of help and no need of help) and for the “social context” subset (child alone, adult alone, social passive, social helping). Panel B shows the trial sequence: preparation slide (“Get ready to rate the next picture”, 5 s), stimulus slide (6 s), rating slide (“Please rate the picture on both dimensions”, 10 s). Panel C shows the study procedure: oral and written instructions with 2 example pictures and questions, first half of 42 pictures, oral and written instructions and questions, second half of 42 pictures. One half of the participants used bipolar scales first, the other half unipolar scales; participants switched scale type after the first half.]

Figure 1. Example stimuli, trial and study sequence. Stimuli for each of the ten situations were created for two subsets (A). The “need of help” subset (top) consisted of 27 need of help / no need of help picture pairs. For each of the ten situations, there was also a bird picture pair in the “need of help” subset. The “social context” subset (bottom) included 4 pictures with varying social contexts for each situation. Trial sequence was the same for all 84 pictures (B). The study was divided into two halves (C).

2 METHODS

2.3 Design and procedure

The experiment reported here was conducted during a regular 90-minute university lecture. Total testing time was about 60 minutes. The study was always conducted by a trained experimenter in the course’s language of instruction (see Table 2 for a list of course programs and languages).

A total of eight testing sessions with varying numbers of participants took place during the course of one month. Participants had the opportunity to discontinue their participation at any time.

Participants received rating booklets and a separate consent form which was collected after the general explanations of procedures and before the start of data collection. The rating booklets were subdivided into two parts, each of them started with written instructions (see Appendix B). Oral instructions as well as the opportunity to ask questions were given prior to each part. The first part of the session took approximately 15 minutes and was used to collect data regarding participants’ gender attributions to the 30 adult figures depicted in the “social context” subset. The results of this part of the study are beyond the scope of this thesis.

The second part of the testing session focused on participants’ subjective emotional ratings.

Participants were shown all 84 pictures, including the 30 pictures they had just seen in the gender attribution task, which were dispersed among the novel pictures (see Appendix A for the order of picture presentation; repeated pictures are highlighted). Oral instructions given to the participants also referred to the written instructions and examples of the rating scales provided in each testing booklet. Participants were instructed to rate the pictures according to their subjective emotional experience using the 9-point Likert scales illustrated by SAMs. For one half of the pictures, each participant used the traditional SAM scales for the two dimensions bipolar valence (happy to sad) and arousal (calm to excited). In the following, these rating scales will be referred to as “bipolar scales”. For the other half, two unipolar scales for the dimensions pleasant and unpleasant feelings (absent to very strong) were used. These scales will be referred to as “unipolar scales”. For the pleasantness and unpleasantness dimensions, the arousal SAMs illustrated both rating scales (see Appendix B). Color changes from dark to light blue (or vice versa) highlighted the change in rating scale type.

The order in which bipolar and unipolar rating scales were used was random and counter-balanced across participants. Thus, ratings on both unipolar and bipolar scales were obtained from the same population, allowing mean ratings to be related directly to each other. The order of arousal/valence and of pleasantness/unpleasantness dimensions was reversed once on each page.

As ratings for two different pictures were provided on one side (see Appendix B), it was also counter-balanced across participants whether arousal or valence, and pleasantness or unpleasantness, ratings were asked for at the top of each page. The term “dimension order” refers to the factor describing which rating dimension was shown at the top of each page. Approximately one half of participants rated arousal or pleasantness first at the top of each page (odd trial numbers), and accordingly valence or unpleasantness first at the bottom of each page (even trial numbers). The reverse was true for the other participants. Table 1 shows the distribution of dimension order. Rating booklets were printed in German as well as in English. The distribution of booklet versions in each language was approximately equal for all scale and dimension order versions (see Table 1). The layout of one example booklet in English is shown in Appendix B.

Table 1. Number of different booklet versions used in each language as well as by men and women.

                                            Language             Gender¹
Scale order      Dimension order            English   German     Men    Women
First bipolar    First arousal/pleasant     32        27         14     45
                 First valence/unpleasant   27        32         14     45
First unipolar   First arousal/pleasant     33        28         18     43
                 First valence/unpleasant   33        30         18     43

Note. ¹ Gender information was not provided by two participants.
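The counter-balancing scheme amounts to a 2 × 2 crossing of scale order and dimension order. As a small illustrative sketch (the labels and the dictionary layout are assumptions made for this example), the four booklet versions can be enumerated and the language counts from Table 1 summed as follows:

```python
from itertools import product

scale_orders = ("first bipolar", "first unipolar")
dimension_orders = ("first arousal/pleasant", "first valence/unpleasant")

# The four booklet versions result from crossing both factors.
versions = list(product(scale_orders, dimension_orders))

# Counts per version as reported in Table 1: (English, German).
booklets = {
    ("first bipolar", "first arousal/pleasant"): (32, 27),
    ("first bipolar", "first valence/unpleasant"): (27, 32),
    ("first unipolar", "first arousal/pleasant"): (33, 28),
    ("first unipolar", "first valence/unpleasant"): (33, 30),
}

english_total = sum(english for english, _ in booklets.values())
german_total = sum(german for _, german in booklets.values())
print(len(versions), english_total, german_total, english_total + german_total)
```

The column sums correspond to the booklet language totals listed in Table 2.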

For the use of the bipolar valence and arousal rating scales written and oral instructions were adapted from the original IAPS rating procedure (Lang et al., 1997). Only minimal changes to the instructions were made, mostly regarding the omission of dominance ratings. Instructions for the unipolar pleasantness and unpleasantness rating scales were adapted from the bipolar rating scales’ instructions, ensuring that all participants received equally detailed instruction for each rating scale. English instructions were then translated into German. Each participant received both instructions, one for bipolar, one for unipolar rating scales, just before he or she began to use this scale type. The complete study procedure is illustrated in Figure 1 C. At the beginning of the experiment two example pictures were provided. After the first half of pictures there was a short break during which the experimenter explained the change to the other rating scale type. Participants were given ample time to read through their instructions and the opportunity to ask questions before each half of the experiment.

Pictures were presented by projection in regular lecture halls. Trials followed the IAPS protocol (Lang et al., 1997), with the exception that rating time was shortened to 10 s, as participants had to rate only two of the three original dimensions (see Figure 1 B for an illustration of the trial sequence). The order of picture presentation was pre-randomized with the constraint that neither stimulus category (no-/need of help, child/adult, child/bird) nor situation was repeated directly one after another. All participants saw the pictures in the same pre-randomized order, listed in Appendix A.
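How the constrained pre-randomization was generated is not described here. One plausible way to obtain such an order is a greedy random construction with restarts. The Python sketch below illustrates that idea only; it is not the procedure actually used, and the item list with four category labels per situation is a hypothetical toy example.

```python
import random

def constrained_order(items, rng, max_restarts=10_000):
    """Order (situation, category) items randomly so that direct
    neighbours share neither situation nor category."""
    for _ in range(max_restarts):
        remaining = list(items)
        order = []
        while remaining:
            if order:
                last_situation, last_category = order[-1]
                candidates = [
                    item for item in remaining
                    if item[0] != last_situation and item[1] != last_category
                ]
            else:
                candidates = remaining
            if not candidates:
                break  # dead end, start a new attempt
            choice = rng.choice(candidates)
            remaining.remove(choice)
            order.append(choice)
        else:
            return order  # all items placed without violating a constraint
    raise RuntimeError("no valid order found; relax the constraints")

# Hypothetical toy set: 10 situations crossed with 4 category labels.
categories = ["child alone", "adult alone", "social passive", "social helping"]
items = [(situation, category)
         for situation in range(1, 11) for category in categories]
order = constrained_order(items, random.Random(2014))
print(order[:5])
```

A plain shuffle followed by rejection of invalid sequences would rarely succeed for a set of this size, which is why the sketch builds the sequence item by item and restarts on dead ends.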

2.4 Participants

A total of 278 undergraduate students of the Rhine-Waal University of Kleve took part in this study. Testing language was either English or German, depending on the language of the degree program of the participating students. Only data of participants with normal or corrected-to-normal vision were included in the analyses, leading to the exclusion of nine participants due to uncorrected vision impairments and of 26 due to missing information regarding vision impairments. Data of one participant who accidentally received a booklet not corresponding to the testing language were also excluded from analyses. Thus, data of 242 participants (M_age = 21.35, SD = 3.38) were analyzed. Table 2 lists the complete population characteristics, including age, gender, degree program, and language of study.

Mean age was similar for male and female participants, M = 21.25 years, 95% CI [20.74, 21.75], for women, and M = 21.68 years, [20.83, 22.54], for men. The proportion of scale and dimension orders did not differ for men and women, as proportions’ CIs largely overlapped, spanning from [.65, .86] to [.57, .79]. More women than men participated (73.33%). Gender distribution varied according to the field of studies: relatively fewer women took part in international business, .58, [.42, .72], and international relations courses, .56, [.27, .81], than in the education course, .88, [.80, .94]. The proportion of male participants was higher for the group receiving booklets in English, with 39%, [.31, .48], men in the English booklet sample and 13%, [.08, .20], in the German booklet sample. The discrepancy in gender distribution between the German and English booklet samples likely resulted from the fact that booklets were given to participants in the course language and that courses were specific to participants’ field of study, for which systematic differences in gender distribution were present. The majority of participants were German native speakers (91% of those providing information about native language). The proportion of German native speakers was slightly higher for the courses instructed in German, .97, [.92, .99], compared to English, .85, [.92, .99], but the proportion of English native speakers was not, .03, [.01, .08], for German, .08, [.04, .15], for English.

Table 2. Population characteristics.

                                                       N     % male¹
Total                                                  242   26.67
Handedness            Right                            208   27.18
                      Left                              21   19.05
                      Both                               7   42.86
                      Missing                            6   50.00
Citizenship           German                           197   22.34
                      Other                             31   37.50
                      Missing                           14   57.14
Native language(s)²   German                           192   23.40
                      English                            7   28.57
                      Only other                        29   41.38
                      Missing                           14   28.57
Field of study³       Alternative tourism (G)           31   16.67
                      Education (G)                     81   11.11
                      International relations (E)       78   35.90
                      International business (E)        40   42.50
                      Other (6 E, 3 G)                   9   44.44
                      Missing (1 E, 2 G)                 3   33.33
Booklet language      English                          125   39.20
                      German                           117   13.04
Scale order           First bipolar                    118   23.73
                      First unipolar                   124   29.51
Dimension order       First arousal/pleasantness       120   26.67
                      First valence/unpleasantness     122   26.67

Note. ¹ Gender information was not provided by two participants. ² Participants could indicate more than one native language. ³ Letters in brackets indicate booklet language, E = English, G = German.

2.5 Data analyses

Pen-and-paper ratings for each participant were entered into an Excel table manually according to the scoring scheme illustrated in Figure 2. All data structuring and analyses were conducted using the GNU software R (version 3.0.2). Arousal, pleasantness and unpleasantness ratings were coded from 0 (calm, no (un)pleasant feelings) to 8 (excited, strong (un)pleasant feelings), bipolar valence ratings from -4 (sad) to 4 (happy). The intensity of mixed feelings for each picture was scored as the minimum rating given on pleasantness and unpleasantness scales for each picture as suggested by Schimmack (2001). All pictures received a minimum of 106 valid ratings on each dimension (see Appendix A for exact ns). Mean ratings and scores of mixed feelings were calculated across participants.
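The scoring of the unipolar ratings can be sketched in a few lines of Python. The example values are those shown in Figure 2 (pleasantness = 2, unpleasantness = 6); the function name, the dictionary keys, and the small list of ratings for one picture are illustrative assumptions, not part of the original analysis scripts, which were written in R.

```python
from statistics import mean

def aggregate_unipolar(pleasantness, unpleasantness):
    """Derive aggregated scores from one pair of unipolar ratings (0 to 8)."""
    for rating in (pleasantness, unpleasantness):
        if not 0 <= rating <= 8:
            raise ValueError("unipolar ratings are coded from 0 to 8")
    return {
        # Mixed feelings: minimum of both ratings (Schimmack, 2001).
        "mixed_feelings": min(pleasantness, unpleasantness),
        # Sum and difference as illustrated in Figure 2.
        "sum": pleasantness + unpleasantness,
        "difference": pleasantness - unpleasantness,
    }

example = aggregate_unipolar(pleasantness=2, unpleasantness=6)
print(example)

# Made-up (pleasantness, unpleasantness) ratings of three participants
# for one picture; scores are averaged across participants per picture.
ratings_for_one_picture = [(2, 6), (0, 7), (3, 3)]
mean_mixed = mean([min(p, u) for p, u in ratings_for_one_picture])
print(round(mean_mixed, 2))
```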

All analyses were based on effect sizes and estimation of confidence intervals. As the use of null-hypothesis significance testing has been shown to yield arbitrary and potentially misleading results (see e.g. Cumming, 2014; Kline, 2004), traditional p-values were not calculated. Interpretation of CIs can be related to classical p-values in the following way: non-overlapping CIs of means indicate a “significant” difference, and CIs of effect sizes not overlapping 0 indicate a “significant” effect.
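The two interpretation rules can be illustrated with a minimal Python sketch. It assumes a normal approximation for the CI of a mean, which may differ from the method used for the reported intervals, and the rating values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(values, level=0.95):
    """Mean with a normal-approximation confidence interval (assumption)."""
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    centre = mean(values)
    half_width = critical * stdev(values) / sqrt(len(values))
    return centre, centre - half_width, centre + half_width

def intervals_overlap(first, second):
    """True if two (lower, upper) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]

def excludes_zero(interval):
    """True if a (lower, upper) interval does not contain 0."""
    return interval[0] > 0 or interval[1] < 0

# Made-up mean valence ratings for two picture categories.
need_of_help = [-2.1, -1.8, -2.5, -1.9, -2.2]
no_need_of_help = [1.9, 2.3, 2.0, 1.7, 2.4]

_, low_a, high_a = mean_ci(need_of_help)
_, low_b, high_b = mean_ci(no_need_of_help)
print(intervals_overlap((low_a, high_a), (low_b, high_b)))
print(excludes_zero((-0.11, 1.11)))
```

For the made-up data the two CIs of means do not overlap, whereas an effect size interval such as [−0.11, 1.11] includes zero.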

The relation between average ratings on the different scales and dimensions was assessed using generalized linear models. Additionally, Pearson correlations were calculated for linear relations. To assess the effects of picture content, mean ratings and surrounding 95% CIs were calculated. Cohen’s d was used as an estimate of effect size. The variance of d was computed using the conversion formula v_d = ((n1 + n2)/(n1 × n2) + 0.5 × d²/df) × ((n1 + n2)/df), where n1 and n2 are the two group sizes (Cooper et al., 2009). In order to correct for biased population estimation of Cohen’s d, Hedges’ correction was used (Hedges and Olkin, 1985). The magnitude of Cohen’s d was judged based on suggestions by Cohen (1969).
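The effect size computations can be sketched as follows. The variance function implements the conversion formula quoted above. The pooled standard deviation, df = n1 + n2 − 2, and the common approximation 1 − 3/(4 df − 1) for Hedges’ correction factor are assumptions, since these details are not spelled out in the text; all function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation (assumption)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df)
    return (mean(x) - mean(y)) / pooled_sd

def d_variance(d, n1, n2):
    """Variance of d following the conversion formula in the text."""
    df = n1 + n2 - 2  # assumed definition of df
    return ((n1 + n2) / (n1 * n2) + 0.5 * d ** 2 / df) * ((n1 + n2) / df)

def hedges_correction(d, n1, n2):
    """Small-sample correction using the approximation 1 - 3 / (4 df - 1)."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))

def d_ci(d, n1, n2, critical=1.96):
    """Approximate 95% CI of d based on its variance."""
    half_width = critical * sqrt(d_variance(d, n1, n2))
    return d - half_width, d + half_width

# Illustrative call with made-up group sizes.
d = 0.5
print(round(d_variance(d, 14, 14), 4), round(hedges_correction(d, 14, 14), 4))
print(d_ci(d, 14, 14))
```

The group sizes in the example call are arbitrary and are not meant to reproduce any interval reported in this thesis.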

Participants’ gender was considered as a factor in the main analyses as men and women have been shown to differ regarding various aspects of emotional experience (Bradley et al., 2001; Brody and Hall, 2008; Fischer, 2000). The effects of language on mean ratings were also assessed for men and women separately, as gender distribution was different for samples receiving German and English instructions. Influences of scale order were only considered regarding the relation between mean ratings within bipolar and unipolar scale types, as participants rated only half of the pictures on either scale type. Further influences of dimension order on relation between mean ratings per picture and content effects were assessed across gender, language and scale order, as distribution of these variables was equal across dimension orders (see Table 1).

[Figure 2: example rating scales and scoring. Bipolar valence and arousal scales: arousal (calm to excited), example rating arousal = 7; bipolar valence (happy to sad), example rating bipolar valence = −1. Unipolar pleasantness and unpleasantness scales (no feelings to strong feelings): example ratings pleasantness = 2 and unpleasantness = 6. Aggregated unipolar scores: minimum = 2 (mixed feelings = 2), sum = 8 (relates to arousal), difference = −4 (relates to bipolar valence).]

Figure 2. Example rating scales and scoring procedure for all rating dimensions. Light blue lines refer to ratings on bipolar valence and arousal scales, dark blue ones to pleasantness and unpleasantness scales. Gray lines refer to scores obtained by combining pleasantness and unpleasantness ratings. Raw rating scores are framed with rectangles, scores derived from pleasantness and unpleasantness ratings with ellipses. Solid lines indicate direct use of raw scores, dashed ones inferences. Note that participants only used either bipolar valence and arousal scales or pleasantness and unpleasantness scales for each picture at a time.

Pre-analyses showed that the pictures’ sequence position did not influence ratings on any of the four rating dimensions, all |r|(82) ≤ 0.19. Also, the gender of the depicted child did not affect ratings on any dimension, as even the largest difference’s CI overlapped zero, d = 0.50, 95% CI [−0.11, 1.11]. No interaction of the depicted child’s gender and participants’ gender (own-gender effect) emerged; the largest difference between boy and girl depictions was present in male participants, but this difference between ratings for boy and girl pictures could still plausibly point in either direction, d = 0.53, [−0.37, 1.43]. Thus, further analyses were conducted without consideration of presentation sequence or gender of the depicted child.

3 Results

3.1 Relation between ratings on different scales and dimensions

3.1.1 Strong relations between ratings within and across scale types

The relation between mean arousal and valence ratings per picture across the stimulus set is illustrated in Figure 3. A quadratic relation between bipolar valence and arousal ratings, as often reported in the classic literature, was evident, r(82) = .60, 95% CI [.44, .72], R²_adjusted = .35. However, the linear relation between those scales was just as strong, r(82) = −.66, [−.77, −.52], and accounted for 43% of the variance.
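How the CIs of the correlations were computed is not stated in this section. A common choice is the Fisher z transformation; the sketch below assumes this method and n = 84 pictures (df = 82). Under these assumptions the interval for r = .60 rounds to [.44, .72], in line with the value reported above. The adjusted R² helper uses the standard adjustment for one predictor.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """CI of a Pearson correlation via the Fisher z transformation."""
    z = atanh(r)
    standard_error = 1 / sqrt(n - 3)
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - critical * standard_error), tanh(z + critical * standard_error)

def adjusted_r2(r2, n, predictors=1):
    """Adjusted R squared for a model with the given number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)

low, high = pearson_ci(0.60, 84)
print(round(low, 2), round(high, 2))
print(round(adjusted_r2(0.60 ** 2, 84), 2))
print(round(adjusted_r2(0.66 ** 2, 84), 2))
```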

[Figure 3: scatter plot of mean arousal (y-axis, 0 to 8) against mean bipolar valence (x-axis, −4 to 4) per picture, with fitted curve y = 2.37 + 0.37 × x², R²_adjusted = .35, and linear correlation r = −.66.]

Figure 3. Relation between bipolar valence and arousal ratings. Each data point corresponds to mean bipolar valence and arousal ratings for one picture. The regression line was created using MatLab LSD.

Mean ratings of pleasantness and unpleasantness per picture showed a strong negative linear correlation (see Figure 4), r(82) = −.87, 95% CI [−.92, −.81], R²_adjusted = .76, illustrating that participants understood the antagonistic relation between both rating dimensions.

All participants used bipolar valence and arousal scales for one half of the stimuli and pleasantness and unpleasantness scales for the other half. Thus, means on both scale types can be directly related to one another, as they stem from the same population.

[Figure 4: scatter plot of mean unpleasantness against mean pleasantness per picture (both axes 0 to 8), with regression line, r = −.87.]

Figure 4. Relation between pleasantness and unpleasantness ratings. Each data point corresponds to mean pleasantness and unpleasantness ratings for one picture. The regression line was created using MatLab LSD.

First, the extent to which variance in bipolar valence ratings can be inferred from the difference between pleasantness and unpleasantness ratings was investigated. A nearly perfect linear correlation (see Figure 5 B), r(82) = .96, 95% CI [.94, .98], accounted for 92% of the variance in bipolar valence ratings. It can therefore be assumed that participants employed the two types of scales similarly and that the information inherent in bipolar valence ratings can be inferred almost completely by calculating the difference between unipolar pleasantness and unpleasantness ratings.

Second, the extent to which the overall intensity of pleasant and unpleasant feelings could account for arousal ratings was assessed. A high proportion of variance in arousal ratings, i.e. 56%, could be explained by the sum of pleasantness and unpleasantness ratings by means of a strong linear correlation (see Figure 5 A), r(82) = .75, 95% CI [.64, .83]. In sum, these findings suggest an overall strong association between participants’ rating behavior on bipolar and unipolar scales.
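The two comparisons across scale types can be sketched as follows: per picture, the difference and the sum of mean pleasantness and unpleasantness are computed and then correlated with mean bipolar valence and mean arousal, respectively. The per-picture means below are made up for illustration, and the helper function is a plain implementation of Pearson’s r rather than the R code used for the reported analyses.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up mean ratings for four pictures.
pleasantness = [6.0, 4.5, 1.5, 3.0]
unpleasantness = [0.5, 1.5, 5.5, 3.0]
valence = [2.8, 1.4, -2.1, 0.1]
arousal = [3.0, 2.5, 4.5, 2.8]

difference = [p - u for p, u in zip(pleasantness, unpleasantness)]
total = [p + u for p, u in zip(pleasantness, unpleasantness)]

r_valence = pearson_r(difference, valence)
r_arousal = pearson_r(total, arousal)
print(round(r_valence, 2), round(r_arousal, 2))
```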

3.1.2 Modulation of rating dimensions’ relations by language and participants’ gender

As gender distribution differed between samples receiving booklets in English and German, the potential impact of language and gender on the relations between mean ratings on different scale dimensions was assessed jointly. The detailed results are displayed in Table 3. The pattern of differences in scale relations according to language was different for men and women. For women, the negative linear relation between arousal and bipolar valence ratings was stronger if German rather than English was used throughout the study. For men, the antagonistic relation between pleasantness and unpleasantness ratings was stronger if the study was conducted in German rather than English.

[Figure 5: panel A shows mean arousal (0 to 8) against the sum of mean pleasantness and unpleasantness (Pleasant + Unpleasant, 3 to 8) per picture, r = .75; panel B shows mean bipolar valence (−4 to 4) against the difference of mean pleasantness and unpleasantness (Pleasant − Unpleasant, −6 to 6) per picture, r = .96.]

Figure 5. Relation between aggregated unipolar ratings and arousal (A) as well as bipolar valence (B). Each data point corresponds to mean ratings for one picture. Regression lines were created using MatLab LSD.

Furthermore, differences between men and women in the relations between ratings were evident independent of study language. Pleasantness and unpleasantness ratings were related to each other more strongly for women than for men. This was true for the direct correlation between pleasantness and unpleasantness as well as for the extent to which bipolar valence ratings could be inferred from the difference between pleasantness and unpleasantness ratings (see bold values in Table 3). For those participants receiving English instructions and material, inference of arousal ratings from the sum of pleasantness and unpleasantness ratings was also considerably diminished for men, i.e. nearly absent, compared to women. A similar trend was observed in the sample receiving German instructions and materials, but, possibly due to the small number of men in this sample (N = 15), the correlations’ CIs still overlapped, albeit only slightly. In sum, gender differences point towards women reporting pleasantness and unpleasantness as being more antagonistic and showing stronger equivalence between ratings on unipolar and bipolar rating scales.

Table 3. Strength of relation between valence, arousal, pleasantness (P) and unpleasantness (U) for women (top) and men (bottom) participating in the English (left) and German (right) study version.

                               English                                German
Gender   Relation              r      95% CI         R²_adjusted      r      95% CI         R²_adjusted
Women    Valence ∼ arousal     −.40   [−.57, −.21]   .15              −.79   [−.86, −.69]   .62
         Valence² ∼ arousal    .53    [.36, .67]     .27              .54    [.37, .68]     .29
         P ∼ U                 −.84   [−.89, −.76]   .70              −.92   [−.94, −.87]   .84
         P + U ∼ arousal       .74    [.62, .82]     .54              .70    [.57, .79]     .48
         P − U ∼ valence       .93    [.90, .96]     .87              .96    [.94, .97]     .92
Men      Valence ∼ arousal     −.46   [−.61, −.27]   .20              −.62   [−.74, −.47]   .38
         Valence² ∼ arousal    .55    [.38, .68]     .30              .55    [.38, .68]     .30
         P ∼ U                 −.39   [−.56, −.19]   .14              −.74   [−.83, −.63]   .55
         P + U ∼ arousal       .20    [−.01, .40]    .03              .43    [.23, .59]     .17
         P − U ∼ valence       .56    [.66, .84]     .59              .52    [.61, .81]     .52

Note. Effect sizes whose CIs indicate a difference in relation strength between languages are highlighted with gray shading. Effect sizes whose CIs indicate a difference in relation strength between men and women are highlighted in bold.

3.1.3 Independence of rating dimensions’ relations from scale and dimension order

The relation between mean ratings on the dimensions of one scale type, i.e. between arousal and bipolar valence as well as between pleasantness and unpleasantness ratings, could be assessed independently for each of the four booklet versions (see Table 1). Correlations between mean arousal and valence ratings, between arousal and squared valence ratings, and between mean pleasantness and unpleasantness ratings for any booklet version differed neither from each other nor from the overall results described above. The largest deviation from the relations reported across booklet versions was found for the quadratic relation between valence and arousal ratings, with r = .41, 95% CI [.13, .64], whose CI still largely overlapped that of the overall result.

All participants rated stimuli in the same pre-randomized order, and the rating scale type was changed after half of the experiment. Accordingly, relations between mean ratings on aggregated unipolar and bipolar rating scales can only be assessed by combining data of participants who used bipolar scales first with data of those who used unipolar scales first. As for the relations within one scale type, relations between scale types did not differ according to which rating dimension was shown at the top of each page. The correlation most dissimilar to the overall results was the relation between the sum of pleasantness and unpleasantness ratings and arousal for participants who first rated valence or unpleasantness at the top of each page, r = .69, 95% CI [.56, .79].

3.2 Mixed feelings

3.2.1 Unipolar ratings reveal presence of mixed feelings

As an indicator of the intensity of mixed feelings, the minimum of the pleasantness and unpleasantness ratings for each picture was used (Schimmack, 2001, see Figure 2). Scores of mixed feelings were then averaged per picture. Across all pictures, the mean intensity of mixed feelings was about one rating step greater than zero, M = 1.08, 95% CI [1.03, 1.14], with a narrow CI indicating little uncertainty about this finding. Thus, participants reported having both pleasant and unpleasant feelings when viewing a single picture. An alternative explanation for this finding would be that participants simply avoided the extreme ends of the rating scale (Guilford, 1954).

To estimate the plausibility of this explanation, the distribution of minimal pleasantness and unpleasantness ratings was inspected. More than two thirds (6767 out of the 9818 ratings with valid pleasantness and unpleasantness values) were zero or one. Hence, participants did not show an aversion to using the extreme ends of the unipolar rating scales.

There was a strong linear correlation between the difference between pleasantness and unpleasantness ratings and bipolar valence (see Figure 5 B). This implies that bipolar valence can be regarded as a subtotal of participants’ pleasant and unpleasant feelings. If this assumption is correct, mixed feelings should predominantly emerge when bipolar valence is close to zero.

Indeed, the intensity of mixed feelings showed a quadratic association with bipolar valence ratings (see Figure 6), r(82) = −.69, 95% CI [−.79, −.55], R²_adjusted = .47, with a peak around bipolar valence ratings of zero. This finding supports the interpretation that neutral bipolar valence ratings can arise because of co-occurring pleasant and unpleasant feelings. Moreover, this systematic relation renders the possibility that mixed feelings arose from a bias away from the rating scales’ extremes very unlikely. Taken together, these results replicate previous findings (Kron et al., 2013). They also document the existence of an emotional state, i.e. mixed feelings, that cannot be assessed on bipolar valence scales.
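A quadratic association of this kind corresponds to a least-squares fit of mixed feelings on squared bipolar valence, y = a + b × x², without a linear term. The following Python sketch fits such a model to made-up data that were generated from the equation shown in Figure 6 (y = 1.24 − 0.11 × x²); it only illustrates the model form and does not reproduce the actual analysis.

```python
def fit_squared_term(x, y):
    """Least-squares fit of y = a + b * x**2 (no linear term)."""
    squared = [value ** 2 for value in x]
    n = len(squared)
    mean_s, mean_y = sum(squared) / n, sum(y) / n
    slope = (
        sum((s - mean_s) * (v - mean_y) for s, v in zip(squared, y))
        / sum((s - mean_s) ** 2 for s in squared)
    )
    intercept = mean_y - slope * mean_s
    return intercept, slope

# Made-up picture means lying exactly on the curve from Figure 6.
valence = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
mixed = [1.24 - 0.11 * v ** 2 for v in valence]

intercept, slope = fit_squared_term(valence, mixed)
print(round(intercept, 2), round(slope, 2))
```

A negative coefficient for the squared term means that predicted mixed feelings are highest at a bipolar valence of zero and decrease towards both ends of the scale.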

[Figure 6: scatter plot of mean intensity of mixed feelings (y-axis, 0 to 2) against mean bipolar valence (x-axis, −4 to 4) per picture, with fitted curve y = 1.24 − 0.11 × x², R²_adjusted = .47.]

Figure 6. Relation between bipolar valence and intensity of mixed feelings. Each data point corresponds to the mean intensity of mixed feelings and mean bipolar valence ratings for one picture. The horizontal line represents the mean intensity of mixed feelings (1.08), the shaded area surrounding it its 95% CI.

3.2.2 Mixed feelings’ relation to bipolar valence is sensitive to language but not dimension order

As shown in detail in Table 4, for those participants doing the study in German, women’s but not men’s squared bipolar valence ratings were related to the intensity of mixed feelings. Moreover, the relation between squared bipolar valence ratings and mixed feelings was stronger for women completing the study in German rather than English.

Dimension order had no such impact on the relation between squared bipolar valence ratings and the intensity of mixed feelings. Data of participants who first rated valence or unpleasantness at the top of each page, r = −.58, 95% CI [−.71, −.42], as well as of those rating arousal or pleasantness first, r = −.64, [−.75, −.50], yielded results comparable to the main analyses.

3.3 “Need of help” stimulus subset

3.3.1 Need of help is rated sadder and more arousing

The picture pairs in the “need of help” stimulus subset allow the assessment of need-of-help-specific changes in self-reported emotion, as ratings of need of help depictions can be compared to highly similar control pictures (see Figure 1 for an example situation and Appendix A for a complete list of stimuli). Picture pairs showing a bird instead of a child in identical (human-like) situations added a second picture category, decreasing potential expectation effects and partially allowing generalization of need of help effects to non-humans. Effect sizes and their 95% CIs for all differences regarding the different picture categories are listed in Table 5.

Table 4. Correlations between square bipolar valence and intensity of mixed feelings according to gender and study language.

          English                                German
Gender    r      95% CI         R²_adjusted      r      95% CI         R²_adjusted
Women     −.41   [−.58, −.22]   .16              −.70   [−.80, −.57]   .49
Men       −.33   [−.51, −.12]   .10              −.12   [−.33, .09]    .00

Note. Effect sizes whose CIs indicate a difference in relation strength between languages are highlighted with gray shading. Effect sizes whose CIs indicate a difference in relation strength between men and women are highlighted in bold.

Need of help picture content had a large effect on bipolar valence ratings. Size and presence of this effect were not affected by whether a child or a bird was depicted. However, the need of help effect was medium for men and large for women, with non-overlapping CIs indicating that the effect size meaningfully differed for men and women (see Figure 7 A and compare row 4 to row 5 in Table 5). In addition, pictures of birds were rated higher in valence than pictures of children (see Figure 7 A and column 1 of Table 5), meaning that birds were given a mean valence rating closer to zero. Note that the wide confidence interval surrounding this effect suggests that the magnitude of the bird depiction effect is less certain than that of the need of help effect. Moreover, the difference between child and bird pictures was considerably smaller than that between need and no need of help depictions. All remaining d-values’ CIs clearly overlapped zero, indicating that the remaining differences cannot be considered meaningful.

Regarding arousal ratings, need of help depictions were rated as more arousing than no need of help depictions (see Figure 7 B). Just as for valence ratings, this effect was large with a confidence well above 95%. As opposed to valence ratings, however, the CIs of Cohen’s d for the need of help effect overlapped to a large extent for men and women. Hence, arousal ratings were similarly heightened for need of help depictions in men and women. Participants’ gender influenced arousal ratings depending on need of help content; only for need of help pictures,
