
The German boundary tones: Categorical Perception, Perceptual Magnets, and the Perceptual Reference Space


Academic year: 2021


The German boundary tones:

Categorical Perception, Perceptual Magnets,

and the Perceptual Reference Space

Dissertation approved by the Faculty of Philosophy and History of the University of Stuttgart for the degree of Doctor of Philosophy (Dr. phil.)

Submitted by

Katrin Schneider

from Potsdam

Main examiner: Prof. Dr. Grzegorz Dogil

First co-examiner: Prof. Dr. Bernd Möbius

Second co-examiner: Prof. Dr. David House

Date of oral examination: 18 June 2012

Institut für Maschinelle Sprachverarbeitung (Institute for Natural Language Processing), University of Stuttgart, 2012


Acknowledgments

I would like to take this opportunity to thank the people who supported me so selflessly while I was writing this thesis.

My first thanks go to Prof. Grzegorz Dogil, who enabled me to gain a foothold at the IMS through my work on projects and to start my research in the field of speech perception. It was also he who urged me early on to publish these experiments as a dissertation.

Special thanks go to Prof. Bernd Möbius, not only for his excellent academic advice but also for his unshakable calm and patience. He always had time for my questions, his comments and suggestions helped me to structure my thoughts better, and he gave me the time for writing that was so important.

Further thanks go to my external examiner Prof. David House, who gave me valuable tips and advice on the analysis of my data and discussed my ideas with me.

I would also like to thank my colleagues in the Experimental Phonetics group at the IMS for the open and friendly working atmosphere and their helpfulness, and especially my office mate Antje Schweitzer. She always lent an ear to my questions, whether about academic or technical problems, and she was also a moral support whenever the work became overwhelming.

Great thanks of course also go to my family, above all to my husband and my children, who forgave me for working on this thesis on various evenings, weekends, and even during vacations. I would also like to thank my parents: my mother for the quick and very careful proofreading of this thesis despite her own obligations, and my father for quietly but very firmly reminding me, again and again, to finally finish this thesis.

The experiments in this thesis were funded by the DFG as part of the project “Prosodieproduktion” (prosody production) and of SPP 1234 “Sprachlautliche Kompetenz: Zwischen Grammatik, Signalverarbeitung und neuronaler Aktivität” (phonological and phonetic competence: between grammar, signal processing, and neural activity). Without this support, this thesis would not have been possible in its present form.


Contents

List of Abbreviations
English Summary
German Summary
1 Introduction
2 Theoretical Background
2.1 Categorization
2.1.1 Speech categories
2.1.2 Prosodic Categories
2.1.3 Perceptual mechanisms
2.2 A Model of a Working Memory
2.3 Intonation models
2.3.1 Phonological models
2.3.2 Acoustic-phonetic models
2.3.3 Perception-based models
2.4 Exemplar Theory
2.4.1 Stimulus perception
2.4.2 Stimulus production
2.4.3 Categories and their properties according to Exemplar Theory
2.4.4 Adopting Exemplar Theory for prosody research
2.5 Boundary tone categories in German
3 Experimental Designs
3.1 Categorical Perception Paradigm
3.1.1 The general CP test design
3.1.2 Modifications of the CP test design
3.1.3 The Inter-stimulus Interval (ISI)
3.1.4 The Order-of-Presentation Effect
3.1.5 Adapting the CP paradigm to prosody
3.1.6 Controversial Categorical Perception
3.1.7 Alternative idea: imitation task to test for CP
3.2 Perceptual Magnet Effect
3.2.1 The PME test design
3.2.2 Arguments against PME
3.2.3 Alternative interpretation of PME
3.3 Interpreting Reaction time
3.3.1 Reaction time, CP and PME
3.3.2 Reaction time and continuous perception
3.4 Signal Detection Theory
3.5 Experimental stimuli
3.5.1 Stimulus selection
3.5.2 The ERB scale
3.5.3 Stimulus preparation
4 Perception of Boundary Tones in German without contextual information
4.1 CP of German boundary tones
4.1.1 Stimulus selection and preparation
4.1.2 Participants and experimental procedure
4.1.3 Identification results
4.1.4 Discrimination results
4.1.5 Discussion
4.2 PME in German boundary tones
4.2.1 Preparation of the stimulus continuum
4.2.2 Participants and experimental procedure
4.2.3 Identification results
4.2.4 Results of the Goodness rating tasks
4.2.5 Discrimination results
4.2.6 Discussion
4.3 CP and PME using controlled stimuli
4.3.1 Creating the stimulus continuum
4.3.2 Participants and experimental procedures
4.3.3 Reaction time measures
4.3.4 Identification results
4.3.5 Results of the goodness rating
4.3.6 Discrimination results
4.3.7 Discussion
4.3.8 Perceptual reference space
5 Influence of contextual information on the perception of Boundary Tones in German
5.1 Stimulus preparation
5.1.1 A pretest for context stimuli
5.1.2 Context conditions and stimulus continua
5.2 Participants and experimental procedures
5.2.1 Identification test
5.2.2 Goodness rating task
5.2.3 CP discrimination test
5.2.4 PME discrimination test
5.3 Results of the Identification
5.3.1 General results
5.3.2 Context-dependent results
5.3.3 Discussion
5.4 Results of the Goodness Rating
5.4.1 Rating results in the low boundary tone category
5.4.2 Rating results in the high boundary tone category
5.5 Results of the Discrimination
5.5.1 Analyzing CP discrimination data
5.5.2 Analyzing PME discrimination data
5.6 Creating a Perceptual Reference Space
5.7 Summary
6 Influence of the speaker’s sex
6.1 Stimulus preparation
6.2 Participants and experimental procedures
6.3 Results of the Identification
6.4 Results of the Goodness rating
6.4.1 Extracting a prototype and a non-prototype in the low boundary tone category
6.4.2 Extracting a prototype and a non-prototype in the high boundary tone category
6.4.3 Summary of the goodness rating tasks
6.5 Results of the Discrimination
6.5.1 Analyzing CP discrimination data
6.5.2 Analyzing PME discrimination data
6.6 Perceptual Reference Space
6.7 Summary

List of Abbreviations

CP Categorical Perception
ERB Equivalent Rectangular Bandwidth
F0 Fundamental Frequency
NP Non-Prototype
NP_Q Non-Prototype of the question category
NP_S Non-Prototype of the statement category
P Prototype
P_Q Prototype of the question category
P_S Prototype of the statement category
PME Perceptual Magnet Effect
PP Prepositional Phrase
PSOLA Pitch Synchronous Overlap Add Method
RT(s) Reaction Time(s)
SDT Signal Detection Theory

List of Figures

2.1 Stylization of new spectral information in a CVC syllable
2.2 F0 contour perception in the model of optimal tonal perception
2.3 Continuous Perception of two hypothetical categories
2.4 Categorical Perception of two hypothetical categories
2.5 Schematic representation of a Perceptual Magnet Effect
2.6 Illustration of the actual acoustic locations of stimuli and their perceptions by American English and Japanese listeners
2.7 Model of the Working Memory
2.8 Illustration of entrenchment vs. no entrenchment
3.1 Comparison of Bark scale vs. ERB scale
3.2 Manipulation of the German target phrase for a CP test
3.3 Manipulation of the German target phrase for a PME test
4.1 Average identification results
4.2 Individual identification functions
4.3 Average discrimination results
4.4 Individual crossovers vs. discrimination peaks
4.5 Crossovers vs. discrimination peaks (gender results)
4.6 Influence of phonetic training on CP
4.7 Average Identification
4.8 Perfect identification
4.9 Gender-specific rating differences (H% category)
4.10 Gender-specific ratings (H% category, exclusion of 4 subjects)
4.11 Hit rates: P_S vs. NP_S
4.12 λCenter values: P_S vs. NP_S
4.13 Hit rates: P_Q vs. NP_Q
4.14 λCenter values: P_Q vs. NP_Q
4.15 Stimulus manipulation for PME test
4.16 Average identification
4.17 Average identification & RT values
4.18 Gender differences in L% identification
4.19 Gender-specific RTs in identification
4.20 Ratings & RTs in the L% category
4.21 RTs in the L% category (males vs. females)
4.22 Ratings & RTs in the H% category
4.23 RTs in the H% category (males vs. females)
4.24 AB vs. BA vs. average discrimination
4.25 λCenter values vs. RT
4.26 Discrimination (“high” performers)
4.27 Discrimination (“low” performers)
4.28 Discrimination (gender differences)
4.29 λCenter values (gender differences)
4.30 Identification vs. discrimination (L% category)
4.31 Identification vs. discrimination (neither-nor)
4.32 Crossover vs. λCenter (“high” vs. “low” performers)
4.33 RTs vs. λCenter values (gender-specific)
4.34 λCenter values: P_S vs. NP_S surrounding
4.35 RTs: P_S vs. NP_S surrounding
4.36 λCenter values: P_S vs. NP_S (“high” performers)
4.37 λCenter values: P_S vs. NP_S (“low” performers)
4.38 P_Q vs. NP_Q discrimination
4.39 RTs: P_Q vs. NP_Q discrimination
4.40 P_Q vs. NP_Q discrimination (“high” performers)
4.41 P_Q vs. NP_Q discrimination (“low” performers)
4.42 P_S: Perceptual reference space
4.43 NP_S: Perceptual reference space
4.44 P_Q: Perceptual reference space
4.45 NP_Q: Perceptual reference space
5.1 Average identification (all contexts)
5.2 Gender-specific RTs (all contexts)
5.3 Context-specific identification
5.4 Gender-specific identification (H% context)
5.5 Ratings vs. RTs (low boundary tone category)
5.6 Gender-specific RTs (low boundary tone ratings)
5.7 Context-specific ratings (low boundary tone category)
5.8 Context-specific RTs (low boundary tone category)
5.9 Ratings vs. RTs (high boundary tone category)
5.10 Gender-specific RTs (high boundary tone ratings)
5.11 Context-specific ratings (high boundary tone category)
5.12 Context-specific RTs (high boundary tone category)
5.14 d’ values of the discrimination results
5.15 Discrimination (“high” vs. “low” performers)
5.16 Gender-specific discrimination (“high” performers)
5.17 Context-specific discrimination
5.18 Context-specific RTs in discrimination
5.19 Gender-specific discrimination (Wh L% context)
5.20 Discrimination (“high” vs. “low” performers, L% context)
5.21 Predicted vs. obtained discrimination (all contexts)
5.22 Predicted vs. obtained discrimination (H% context)
5.23 Crossovers vs. discrimination peaks (all contexts)
5.24 Crossovers vs. discrimination peaks (context-specific)
5.25 Gender-specific crossovers vs. discr. peaks (context-specific)
5.26 Hit rates: P_S vs. NP_S (all contexts)
5.27 λCenter values: P_S vs. NP_S (all contexts)
5.28 λCenter values: P_S vs. NP_S (“high” performers, all contexts)
5.29 λCenter values: P_S vs. NP_S (“low” performers, all contexts)
5.30 λCenter values: gender differences around NP_S (all contexts)
5.31 λCenter values: gender differences around P_S (all contexts)
5.32 λCenter values: P_S surrounding (context-specific)
5.33 λCenter values: NP_S surrounding (context-specific)
5.34 λCenter values: P_S surrounding (L% context; high vs. low performers)
5.35 λCenter values: NP_S surrounding (Wh L% context; high vs. low performers)
5.36 Hit rates: P_Q vs. NP_Q (all contexts)
5.37 λCenter values: P_Q vs. NP_Q (all contexts)
5.38 P_Q hit rates: gender differences (all contexts)
5.39 NP_Q hit rates: gender differences (all contexts)
5.40 P_Q vs. NP_Q discrimination (“low” performers, all contexts)
5.41 P_Q vs. NP_Q discrimination (“high” performers, all contexts)
5.42 P_Q vs. NP_Q discrimination (contexts separated)
5.43 P_Q vs. NP_Q discrimination (“high” performers, Wh L% context)
5.44 P_Q vs. NP_Q discrimination (“low” performers, L% context)
5.45 P vs. NP discrimination (both boundary tones)
5.46 P vs. NP λCenter values (both boundary tones)
5.47 Perceptual reference space (low boundary tone category)
5.48 Perceptual reference space (high boundary tone category)
6.1 L% identification and RTs
6.2 Gender differences in L% identification
6.3 L% ratings vs. RTs
6.5 H% ratings vs. RTs
6.6 Gender-specific RTs in H% ratings
6.7 Discrimination performance
6.8 Discrimination performance (excluding “low” performers)
6.9 Gender-specific RTs (excluding “low” performers)
6.10 Gender-specific discrimination (excluding “low” performers)
6.11 Crossover vs. discrimination peak (one-peak subjects)
6.12 Crossover vs. discrimination peak (one-peak “high” performers)
6.13 λCenter values: P_S vs. NP_S
6.14 P_S: gender-specific discrimination
6.15 λCenter values: P_S vs. NP_S (“high” performers)
6.16 λCenter values: P_S vs. NP_S (“low” performers)
6.17 Discrimination: P_Q vs. NP_Q
6.18 Discrimination: P_Q vs. NP_Q (male subjects)
6.19 L% category: Perceptual reference space

List of Tables

4.1 ERB vs. Hertz values for CP manipulation
4.2 ERB vs. Hertz values for PME manipulation
4.3 Boundary tone distribution in the SmartWeb corpus
4.4 Mean RTs in identification
5.1 RTs for each context separately
5.2 Context-specific RTs in the high boundary tone rating
5.3 Gender differences in discrimination (all contexts)

English Summary

This thesis experimentally analyzes the perception of prosodic categories in German, using the two boundary tones L% and H% postulated by German phonology. These two boundary tone categories were selected because they constitute the least disputed tonal contrast. In many languages, including German, the contrast between the low (L%) and the high (H%) boundary tone corresponds to a contrast in sentence mode: the low boundary tone is interpreted as a statement and the high boundary tone as a question. For all experiments presented in this thesis, it is hypothesized that the different perception of L% and H% as statement versus question, respectively, can be attributed to a contrast between two prosodic categories, i.e. to Categorical Perception. This hypothesis is based on the observation that the sentence mode of a syntactically ambiguous utterance can only be determined by the height of its boundary tone.

Prosody has linguistic as well as paralinguistic functions (Clark, Yallop, and Fletcher, 2007), and fundamental frequency, intensity, and duration are regarded as the main correlates of prosody. However, since in natural speech prosodic elements are always uttered simultaneously with the segmental information, investigating the purely linguistic functions of prosody is problematic. Furthermore, the acoustic correlates of prosody may be used to convey paralinguistic as well as linguistic information. Nevertheless, the existence of prosodic categories has already been confirmed (Bruce, 1977; Clements and Ford, 1979), and prosodic categories are postulated to motivate and explain tonal events in utterances (Pierrehumbert, 1980).

Assuming the existence of the two proposed boundary tone categories, two experimental designs are presented that can be used to confirm categories and to reveal perceptual differences within a category or between categories. These two designs are the test for Categorical Perception (CP) (Repp, 1984) and the test for the Perceptual Magnet Effect (PME) (Kuhl, 1991). Both designs were originally developed to examine perceptual differences in the segmental domain, especially for the evaluation of phoneme categories. Categorical Perception is confirmed when the boundary between two categories coincides with the point at which the discrimination performance between two adjacent stimuli is best. If the Categorical Perception test is successful for two speech events, these two events are confirmed as categories of the respective language. The CP test has already been successfully adapted to prosodic categories (cf. e.g. Kohler (1987)). However, so far the test for a Perceptual Magnet Effect has only been used for segmental categories. A Perceptual Magnet Effect involves a warping of the perceptual space towards a prototype of the respective category. Such a warping does not occur towards a non-prototype of the same category (Kuhl, 1991; Kuhl and Iverson, 1995; Iverson and Kuhl, 1995). The warping results in a significantly lower discrimination performance around the prototype, i.e. the prototype is difficult or impossible to discriminate from an adjacent stimulus. Such a warping is not found around a non-prototype, although the acoustic difference between a stimulus and the non-prototype is comparable to the acoustic difference between a stimulus and the prototype.
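The CP criterion described above can be checked roughly as follows. This is a minimal sketch under stated assumptions, not the analysis procedure of the thesis: the identification function is given as the proportion of H% (question) responses per continuum step, the category boundary (crossover) is taken as the 50% point found by linear interpolation, and the discrimination peak is the midpoint of the adjacent-step pair with the highest discrimination score. The function names, the example values, and the one-step `tolerance` are hypothetical.

```python
from typing import Sequence


def crossover(steps: Sequence[float], p_high: Sequence[float]) -> float:
    """Continuum position where the proportion of H% (question) responses
    crosses 50%, found by linear interpolation between adjacent steps."""
    for i in range(len(steps) - 1):
        p0, p1 = p_high[i], p_high[i + 1]
        # A sign change (or touching 0.5) between two steps brackets the boundary.
        if p0 != p1 and (p0 - 0.5) * (p1 - 0.5) <= 0:
            x0, x1 = steps[i], steps[i + 1]
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("identification function never crosses 50%")


def discrimination_peak(pair_midpoints: Sequence[float], scores: Sequence[float]) -> float:
    """Midpoint of the stimulus pair with the highest discrimination score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return pair_midpoints[best]


def consistent_with_cp(steps, p_high, pair_midpoints, scores, tolerance=1.0) -> bool:
    """True if crossover and discrimination peak lie within `tolerance` steps.
    The tolerance is an illustrative choice, not a criterion from the thesis."""
    return abs(crossover(steps, p_high) - discrimination_peak(pair_midpoints, scores)) <= tolerance


if __name__ == "__main__":
    # Hypothetical 7-step continuum and adjacent-pair discrimination scores.
    steps = [1, 2, 3, 4, 5, 6, 7]
    p_high = [0.0, 0.05, 0.1, 0.3, 0.7, 0.95, 1.0]
    midpoints = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
    scores = [0.50, 0.52, 0.60, 0.90, 0.65, 0.50]
    print(crossover(steps, p_high))  # approximately 4.5
    print(consistent_with_cp(steps, p_high, midpoints, scores))
```

The functions operate on a single set of identification and discrimination values, so they can be applied either to data averaged over listeners or to each listener's data separately.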

For the analysis and interpretation of the experimental results, Signal Detection Theory (SDT) and Exemplar Theory are used. Signal Detection Theory (Wickens, 2002) postulates that, despite similar auditory abilities, subjects may differ in their perceptual results because of their individual response criteria. With respect to speech perception and production, Exemplar Theory (cf. e.g. Johnson (1997), Pierrehumbert (2001b; 2001a), Goldinger (1996)) is one of the most successful theories of the last decade. Exemplar Theory proposes that listeners store the instances of speech events they perceive in exemplar clouds located in their perceptual space, and that these instances are stored with much phonetic detail. During speech production, the speaker uses these clouds of similar exemplars to produce an instance of a speech event. Thus, speech perception and production are inseparably connected. The more exemplars are stored, the more stable a speech category becomes. Only stable categories can develop a category center and a Perceptual Magnet Effect (Lacerda, 1995).
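The role of the response criterion in Signal Detection Theory can be illustrated with the standard yes/no indices: sensitivity d' = z(H) - z(F) and criterion c = -0.5 * (z(H) + z(F)), where H is the hit rate, F the false-alarm rate, and z the inverse of the standard normal distribution function. The sketch below rests on assumptions of its own: it counts "different" responses to different pairs as hits and "different" responses to same pairs as false alarms, which ignores the dedicated same-different models in the SDT literature, and it applies a log-linear correction so that rates of 0 or 1 stay finite. It is not claimed to reproduce the d' or λCenter values reported in the thesis.

```python
from statistics import NormalDist


def sdt_indices(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Return (d', c) for a yes/no-style analysis of "different" responses.

    A log-linear correction (add 0.5 to each count, 1 to each total) keeps
    rates of 0 or 1 from producing infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)       # sensitivity
    c = -0.5 * (z(hit_rate) + z(fa_rate))    # criterion: < 0 liberal, > 0 conservative
    return d_prime, c


if __name__ == "__main__":
    # Two hypothetical listeners, 50 different-pair and 50 same-pair trials each.
    print(sdt_indices(40, 10, 10, 40))  # neutral criterion (c close to 0)
    print(sdt_indices(45, 5, 20, 30))   # liberal criterion (c < 0)
```

The second listener answers "different" more readily, which raises both hits and false alarms; c captures this bias separately from d', which is the distinction the thesis draws on when explaining individual differences.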

The experiments presented in this thesis are based on the confirmation of the Categorical Perception of the low and the high boundary tone in Dutch (Remijsen and van Heuven, 1999). These two boundary tones correspond to the sentence mode interpretations statement and question, respectively. As this tonal contrast exists in German as well, I adapted the experimental design of Remijsen and van Heuven (1999) to German stimuli. A verb-first phrase was extracted from the recordings of a male native speaker of German to serve as the test stimulus. The boundary tone height of this stimulus was manipulated stepwise, resulting in a stimulus continuum that models a continuous change from a low to a high boundary tone. This boundary tone continuum was presented to listeners in a Categorical Perception test design. The results show clear evidence for the Categorical Perception of the two underlying boundary tones and therefore support the category status of these boundary tones in German. However, the results differ slightly from those in the segmental domain (Liberman et al., 1957). Whereas for segmental stimuli the general category boundary, i.e. the boundary averaged over all participants, correlates with the discrimination maximum, the prosodic stimuli reveal a correlation between the individual category boundaries and the individual discrimination maxima. Thus, prosodic categories seem to be more variable than segmental ones. However, it must be noted that in German a verb-first phrase might have produced a bias towards the question interpretation. Although such a bias did not appear in the data, it cannot be completely excluded.
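The stepwise manipulation of boundary tone height can be sketched as generating target F0 values spaced equally on an ERB-rate scale. This sketch assumes the Glasberg and Moore (1990) ERB-rate formula, E = 21.4 log10(0.00437 f + 1); the thesis discusses the ERB scale but may use a different variant, and the endpoints (100 Hz and 200 Hz) and the number of steps in the example are hypothetical. The resynthesis itself (for instance with PSOLA) is not shown.

```python
import math
from typing import List


def hz_to_erb_rate(f_hz: float) -> float:
    """ERB-rate (number of ERBs below f), Glasberg & Moore (1990) formula."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erb_rate_to_hz(e: float) -> float:
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_continuum(f_low_hz: float, f_high_hz: float, n_steps: int) -> List[float]:
    """Target boundary-tone F0 values (Hz) spaced equally on the ERB-rate scale."""
    if n_steps < 2:
        raise ValueError("need at least two steps")
    e_low, e_high = hz_to_erb_rate(f_low_hz), hz_to_erb_rate(f_high_hz)
    step = (e_high - e_low) / (n_steps - 1)
    return [erb_rate_to_hz(e_low + i * step) for i in range(n_steps)]


if __name__ == "__main__":
    # Hypothetical endpoints for a male voice; not the values used in the thesis.
    for i, f in enumerate(erb_continuum(100.0, 200.0, 11), start=1):
        print(f"step {i:2d}: {f:6.1f} Hz")
```

Because the ERB-rate scale is compressive, equal ERB steps correspond to Hz steps that grow slightly toward the high end of the continuum.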

With regard to this possible bias, the CP experiment was repeated using a German prepositional phrase (PP). In contrast to a German sentence, a PP does not include a verb and therefore lacks any syntactic cue to the sentence mode. Furthermore, in German a PP can be used as either a statement or a question, and its sentence mode can only be determined via the boundary tone height of the PP. A new stimulus continuum was created by manipulating the boundary tone height of the PP in equal steps from L% to H%. The stimuli were presented to the participants in the Categorical Perception test design and additionally in the design for a Perceptual Magnet Effect test. The PME design was added because it was observed that in German most questions can be identified via their syntactic features. Thus, it is not necessary to produce a high boundary tone in order to produce a clearly identifiable question, which, in terms of Exemplar Theory, results in fewer instances being stored in the H% category, and this might influence the existence of a PME. The experimental results again clearly support the existence of the two hypothesized boundary tone categories in German. However, a Perceptual Magnet Effect is confirmed only for the low boundary tone. This seems to support the hypothesis that the category of the high boundary tone is less stable: because of its lower frequency of occurrence in speech, fewer exemplars are stored, and because the category center is missing or only sparsely developed, no Perceptual Magnet Effect emerges. First analyses of speech corpora indeed document the lower frequency of occurrence of H% in German. On the other hand, it is also possible that the high boundary tone is ambiguous between the interpretations of question and continuation. This ambiguity might result in a better discrimination performance inside the H% category and thereby in a smaller or no warping of the perceptual space around the prototype of the H% category.
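The PME prediction, poorer discrimination around the prototype than around a non-prototype despite comparable acoustic distances, can be summarized per listener as follows. This is a minimal sketch: it assumes that a sensitivity value (for example d') is available for each pair consisting of P or NP and one of its neighbouring stimuli, and it reports only a descriptive mean difference and a count of listeners; the thesis itself reports hit rates and λCenter values, and no significance test is included here. The data in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def pme_summary(subjects: Dict[str, Tuple[Sequence[float], Sequence[float]]]):
    """Compare discrimination around a prototype (P) and a non-prototype (NP).

    `subjects` maps a subject ID to (d' values for P-vs-neighbour pairs,
    d' values for NP-vs-neighbour pairs). A positive difference means poorer
    discrimination around P than around NP, the pattern a PME predicts.
    Returns (mean difference, number of subjects showing the pattern, N).
    """
    diffs = [mean(np_pairs) - mean(p_pairs) for p_pairs, np_pairs in subjects.values()]
    n_pattern = sum(1 for d in diffs if d > 0)
    return mean(diffs), n_pattern, len(diffs)


if __name__ == "__main__":
    # Hypothetical d' values for three listeners.
    data = {
        "s01": ([0.4, 0.6, 0.5], [1.2, 1.0, 1.1]),
        "s02": ([0.8, 0.7, 0.9], [0.9, 1.1, 1.0]),
        "s03": ([1.0, 1.1, 0.9], [0.9, 1.0, 0.8]),
    }
    print(pme_summary(data))
```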

To exclude the continuation interpretation, context information was provided immediately preceding each stimulus. The results for these context stimuli reveal a Perceptual Magnet Effect in both the low and the high boundary tone category. However, the overall discrimination performance was relatively low, possibly because of the longer stimuli required for the additional presentation of the context information. Thus, the Categorical Perception of the low versus the high boundary tone in German was less clear than for the out-of-the-blue stimuli. The individual differences in discrimination performance may result either from differences in individual short-term memory performance or from differences in the individual response criterion.

In various studies (Pisoni and Tash, 1974; Batliner and Schiefer, 1987; Chen, 2003), reaction times were found to be a reliable indicator of the simplicity of a perceptual decision. Therefore, in the experiments presented in this thesis, reaction times were measured for each individual decision. The results support the known correlation: the simpler a perceptual decision, the shorter the reaction time. Including reaction times in the analysis of perceptual results can help to determine which subjects had problems with which stimuli, thereby confirming the reliability of the perceptual results in general. In the experiments discussed in this thesis, the reaction times reflect the results of the Categorical Perception and Perceptual Magnet Effect tests: inside a boundary tone category, the identification of a stimulus works well and is carried out quickly, while identifying a stimulus at or near the category boundary between L% and H% is difficult and takes much more time. Discrimination at the category boundary is good, and the decision is made quickly. Inside either boundary tone category, shorter reaction times during discrimination around the category prototype than around the non-prototype support the existence of the Perceptual Magnet Effect. Interestingly, in all experimental parts female subjects were clearly faster in their decisions than male subjects, although the females' discrimination performance was worse than the males'. These results do not depend on whether the stimuli were spoken by a male or a female voice. However, no reason for these gender-specific differences could be found in the data.
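The reaction-time argument can be checked descriptively by correlating the mean RT per continuum step with how unanimous the identification responses were at that step. In this sketch, "certainty" is a hypothetical measure defined as 2 * |p - 0.5|, where p is the proportion of H% responses, and the example values are invented for illustration; a negative correlation corresponds to the reported pattern of slow decisions near the category boundary and fast decisions inside a category.

```python
import math
from typing import List, Sequence


def certainty(p_high: Sequence[float]) -> List[float]:
    """0 = responses split 50/50 (hardest decision), 1 = unanimous response."""
    return [abs(p - 0.5) * 2 for p in p_high]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-step identification proportions and mean RTs (ms).
    p_high = [0.0, 0.05, 0.3, 0.5, 0.7, 0.95, 1.0]
    mean_rt = [600, 620, 800, 950, 790, 630, 610]
    print(pearson(certainty(p_high), mean_rt))  # strongly negative
```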

To summarize, the results discussed in this thesis support the existence of prosodic categories in general, and specifically of the high and the low boundary tone categories in German. These two prosodic categories are used to differentiate between the sentence modes statement and question, but only in the case of syntactically ambiguous phrases. Furthermore, the results support the use of Exemplar Theory for speech data. The category of the low boundary tone seems to contain many more exemplars than the category of the high boundary tone, as the latter is produced, and thus perceived, less often than the former. This results in a clear Perceptual Magnet Effect for the L% category, where enough exemplars are stored to support the development of a category center; the PME can only occur in the center of a category. For most listeners, the H% category contains only a few exemplars, which in turn inhibits the development of a Perceptual Magnet Effect there. The logged reaction times support the perceptual findings and the hypothesis that reaction times correlate with the simplicity of a perceptual decision.
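The exemplar-theoretic argument, that a densely populated category develops a stronger center than a sparsely populated one, can be illustrated with a generic summed-similarity exemplar model in the spirit of Nosofsky's Generalized Context Model. This is a toy sketch, not the model used in the thesis: each stored exemplar is a (value, label) pair on a one-dimensional stimulus scale (for example boundary tone height in ERB), similarity decays exponentially with distance, and the `sensitivity` parameter and the exemplar clouds are hypothetical. It shows only that summed activation near the center of a large L% cloud exceeds that near the center of a small H% cloud; it does not model the PME itself.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Exemplar = Tuple[float, str]  # (stored value, category label)


def activation(x: float, exemplars: List[Exemplar], sensitivity: float = 2.0) -> Dict[str, float]:
    """Summed similarity of input x to every stored exemplar, per category.
    Similarity decays exponentially with distance on the stimulus scale."""
    act: Dict[str, float] = defaultdict(float)
    for value, label in exemplars:
        act[label] += math.exp(-sensitivity * abs(x - value))
    return dict(act)


def p_category(x: float, exemplars: List[Exemplar], label: str, sensitivity: float = 2.0) -> float:
    """Probability of responding `label`, proportional to its summed similarity."""
    act = activation(x, exemplars, sensitivity)
    return act.get(label, 0.0) / sum(act.values())


if __name__ == "__main__":
    # Hypothetical clouds: many L% exemplars, few H% exemplars.
    cloud = [(3.0 + 0.1 * i, "L%") for i in range(8)] + [(5.5, "H%"), (6.0, "H%")]
    print(activation(3.35, cloud)["L%"], activation(5.75, cloud)["H%"])
    print(p_category(3.3, cloud, "L%"), p_category(5.8, cloud, "H%"))
```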


German Summary

This thesis experimentally investigates the perception of prosodic categories in German, using the example of the boundary tones L% and H% postulated in German phonology. These boundary tone categories were chosen because the distinction between them is considered the least disputed tonal contrast, which in many languages, including German, corresponds to a perceptual contrast in sentence mode, namely that between statement and question. The experiments presented in this thesis are based on the assumption that the different perception of L% and H% as statement and question, respectively, can be attributed to the existence of two prosodic categories, since in a syntactically ambiguous utterance the sentence mode is identified via the height of the phrase-final tone.

Prosody has both linguistic and paralinguistic functions. Fundamental frequency, intensity, and duration are considered the main acoustic correlates of prosody (Clark, Yallop, and Fletcher, 2007). However, since prosodic events in natural speech always occur simultaneously with segmental information, investigating the purely linguistic functions of prosody is difficult. Furthermore, the acoustic correlates of prosody can also serve to convey paralinguistic information. Nevertheless, the existence of prosodic categories has been substantiated (Bruce, 1977; Clements and Ford, 1979), and prosodic categories are frequently postulated in order to motivate and explain tonal events in utterances linguistically (Pierrehumbert, 1980).

Assuming the existence of these two prosodic categories, this thesis presents and applies two experimental designs for demonstrating categories and perceptual differences within and between categories: the test for Categorical Perception (CP) (Repp, 1984) and the test for a Perceptual Magnet Effect (PME) (Kuhl, 1991). Both designs were originally used to investigate perceptual contrasts in the segmental domain, specifically to validate phoneme categories. Categorical Perception is said to occur when the boundary between two categories correlates with the best discrimination performance for two stimuli in the stimulus continuum. If the Categorical Perception of two speech events can be demonstrated, this simultaneously demonstrates the existence of these events as categories of the respective language. The CP test has already been successfully adapted for demonstrating prosodic categories (cf. e.g. Kohler (1987)). The test for a Perceptual Magnet Effect, however, has so far only been carried out in the segmental domain. In a Perceptual Magnet Effect, a perceptual warping occurs exclusively around the prototype of the respective category (Kuhl, 1991; Kuhl and Iverson, 1995; Iverson and Kuhl, 1995). This perceptual warping manifests itself in a clearly reduced discrimination performance, i.e. the prototype can be discriminated from its surrounding stimuli only with great difficulty or not at all. Such weak discrimination performance is not observed around a non-prototype, despite acoustically comparable distances between the surrounding stimuli and the prototype or the non-prototype, respectively.

The assumptions of Signal Detection Theory (SDT) and Exemplar Theory are incorporated into the analysis and interpretation of the experimental results. Signal Detection Theory (Wickens, 2002) postulates that, despite equal auditory abilities, subjects have an individual response bias that can cause their perceptual results to differ considerably from one another. Exemplar Theory (cf. e.g. Johnson (1997), Pierrehumbert (2001b; 2001a), Goldinger (1996)) is, with regard to speech production and speech perception, one of the most successful theories of the last decade. It postulates that language users store their perceptions with all their details in exemplar clouds in the perceptual space. During speech production, these clouds of similar exemplars are drawn upon again, i.e. speech production and speech perception are inseparably connected. The more exemplars are stored, the more stable a linguistic category is. Only then can it develop a category center and possibly a Perceptual Magnet Effect (Lacerda, 1995).

The experiments presented in this thesis build on the demonstration of Categorical Perception of the low and the high boundary tone in Dutch (Remijsen and van Heuven, 1999), which correspond to the sentence modes statement and question, respectively. Since this tonal contrast also exists in German, I adopted the design of Remijsen and van Heuven (1999) to test for Categorical Perception of these boundary tones in German. A verb-first phrase, extracted from recordings of a male native speaker of German, was manipulated stepwise in its boundary tone height. This yielded a stimulus continuum forming a gradual transition from a low to a high boundary tone. The continuum was presented to the listeners following the Categorical Perception test paradigm. The results show clear evidence for Categorical Perception of the two boundary tones investigated in German and thus demonstrate their categorical status.
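Stepwise manipulation of boundary tone height amounts to generating a series of F0 targets between a low and a high endpoint. The Python sketch below assumes equal steps on the semitone scale. It uses hypothetical endpoints (80 and 160 Hz) and a hypothetical number of steps. Neither the actual step sizes nor the resynthesis method used for the stimuli are specified here, and the function names are illustrative.

```python
import math


def semitones(f_hz: float, ref_hz: float) -> float:
    """Interval between ref_hz and f_hz in semitones (12 semitones per octave)."""
    return 12 * math.log2(f_hz / ref_hz)


def boundary_tone_continuum(low_hz: float, high_hz: float, n_steps: int) -> list[float]:
    """Return n_steps F0 targets from low_hz to high_hz, equally spaced in semitones."""
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    span = semitones(high_hz, low_hz)
    return [low_hz * 2 ** (span * i / (n_steps - 1) / 12) for i in range(n_steps)]


# Hypothetical endpoints for a male voice: one octave in 13 steps (1 semitone each).
continuum = boundary_tone_continuum(80.0, 160.0, 13)
print([round(f, 1) for f in continuum])
```

Equal semitone steps are one common choice, because pitch perception is roughly logarithmic in frequency. Equal steps in Hz would be an alternative design. The stimulus preparation actually used is described in Chapter 3.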

However, the results differ slightly from those for Categorical Perception in the segmental domain (Liberman et al., 1957). For segmental stimuli, the overall category crossover in identification (i.e. averaged over all listeners) correlates with the overall maximum of the discrimination curve. For the prosodic stimuli, no such overall correlation was found; instead, the individual category crossovers correlated with the individual discrimination maxima. This suggests that prosodic categories are more variable than segmental ones. It must be considered, however, that a verb-first sentence in German may have induced a bias towards the question interpretation. Such a bias does not show up in the results, but it cannot be ruled out completely.
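The individual-level analysis described above can be sketched in three steps. First, estimate each listener's category crossover as the point where the proportion of H% (question) responses crosses 0.5. Second, locate that listener's discrimination peak. Third, correlate crossovers and peaks across listeners. This Python sketch is hypothetical and is not the analysis code of this thesis. It assumes linear interpolation for the crossover and one-step discrimination pairs (k, k + 1) centred at k + 0.5. The response proportions for three listeners are made up, and three listeners are far too few for a real correlation.

```python
from math import sqrt


def crossover(identification: list[float]) -> float:
    """Stimulus position where the proportion of H% responses first reaches 0.5.

    Uses linear interpolation between the two adjacent continuum steps.
    """
    for i in range(len(identification) - 1):
        a, b = identification[i], identification[i + 1]
        if a < 0.5 <= b:
            return i + (0.5 - a) / (b - a)
    raise ValueError("identification function never crosses 0.5")


def discrimination_peak(discrimination: list[float]) -> float:
    """Centre of the best-discriminated pair, assuming pair k compares steps k and k + 1."""
    best = max(range(len(discrimination)), key=discrimination.__getitem__)
    return best + 0.5


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: per listener, proportion of H% responses on a 7-step
# continuum and proportion correct for the 6 one-step discrimination pairs.
listeners = {
    "L1": ([0.0, 0.05, 0.1, 0.3, 0.8, 0.95, 1.0], [0.5, 0.55, 0.6, 0.9, 0.6, 0.5]),
    "L2": ([0.0, 0.1, 0.6, 0.9, 1.0, 1.0, 1.0], [0.55, 0.85, 0.6, 0.5, 0.5, 0.5]),
    "L3": ([0.0, 0.0, 0.05, 0.1, 0.2, 0.7, 1.0], [0.5, 0.5, 0.5, 0.55, 0.9, 0.6]),
}
crossovers = [crossover(ident) for ident, _ in listeners.values()]
peaks = [discrimination_peak(discr) for _, discr in listeners.values()]
r = pearson(crossovers, peaks)
print([round(c, 2) for c in crossovers], peaks, f"r = {r:.2f}")
```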

Because of this possible bias, the experiment was repeated with a prepositional phrase (PP). Unlike a complete sentence, a PP contains no verb, and in German it is the verb that allows inferences about the syntactically intended sentence mode. A German PP can function both as a statement and as a question, so the sentence mode of such a phrase can only be determined from the height of the boundary tone. This new stimulus continuum was tested both for Categorical Perception and for a Perceptual Magnet Effect of the boundary tones. The PME test was motivated by the observation that German questions are identified mainly by their syntactic features. As a result, many questions can be produced with a low boundary tone without causing any misunderstanding of the sentence mode. According to Exemplar Theory, this should lead to considerably fewer stored items in the H% category, which may affect whether a PME exists in this category. The results of this experiment clearly support the existence of the two boundary tone categories in German, although a Perceptual Magnet Effect occurs only for the low boundary tone. The results thus support the hypothesis that the high boundary tone category is less stable: it is used less often and therefore contains fewer exemplars than the L% category. Lacking a category centre, or having only a minimal one, it cannot develop a Perceptual Magnet Effect. Initial corpus analyses confirm that the high boundary tone is used less frequently in German. On the other hand, the high boundary tone may also be ambiguous without context between a question and a continuation interpretation. This ambiguity would improve discrimination within the high boundary tone category and thus reduce the clear warping of the perceptual space around the question prototype.

To rule out the continuation interpretation, a further experiment placed a context sentence before the stimulus. The results now indeed show a Perceptual Magnet Effect in both the low and the high boundary tone category. However, the considerably greater length of the context stimuli lowers discrimination performance overall. Consequently, the evidence for the two boundary tone categories L% and H% is weaker for the context stimuli than for the stimuli without context. The individual differences in the listeners' discrimination performance may reflect differences in short-term memory capacity, or they may be explained by different individual response criteria.

Several studies (Pisoni and Tash, 1974; Batliner and Schiefer, 1987; Chen, 2003) have shown reaction times to be a good indicator of how easy a perceptual decision is. Reaction times were therefore also measured for every individual decision in the experiments presented here. The results confirm the relationship observed previously: the easier a perceptual decision, the faster it is made. Including reaction times in the evaluation of perception experiments can therefore show which stimuli were problematic for which listeners, and thus help to validate the reliability of the results. The reaction times measured here also support the results for Categorical Perception and for the Perceptual Magnet Effect. Within a boundary tone category, a stimulus is identified well and quickly, whereas a stimulus at the category boundary between L% and H% is difficult to identify and takes much more time. At the category boundary, discrimination is good and the reaction time it requires is short. Within a category, discrimination reaction times are shorter around the prototype than around the non-prototype, which supports the existence of a Perceptual Magnet Effect. Interestingly, in all sub-experiments the female listeners made their decisions considerably faster than the male listeners but discriminated less accurately. These differences occur regardless of whether the test stimuli were produced by a female or a male voice. Their cause can only be speculated about in this thesis, since the experiments themselves provided no clues for interpreting these results.
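The link between decision ease and reaction time can be illustrated by correlating, per stimulus, an index of identification certainty with the mean RT. In this Python sketch, the certainty index |2p - 1| (where p is the proportion of question responses), the function names and all data values are hypothetical assumptions chosen for illustration. A negative correlation means that more clearly categorized stimuli are identified faster.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def certainty(p_question: float) -> float:
    """Hypothetical certainty index: 0 at p = 0.5 (maximally ambiguous), 1 at p = 0 or 1."""
    return abs(2 * p_question - 1)


# Hypothetical per-stimulus data along a 7-step L%-H% continuum.
p_question = [0.0, 0.05, 0.2, 0.5, 0.8, 0.95, 1.0]  # proportion of question responses
mean_rt_ms = [620, 640, 760, 980, 780, 650, 610]     # mean identification RT in ms

r = pearson([certainty(p) for p in p_question], mean_rt_ms)
print(f"r = {r:.2f}")
```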

In summary, the results of the experiments discussed in this thesis generally support the existence of prosodic categories and demonstrate the existence of the boundary tone categories L% and H% in German. These two boundary tone categories serve to distinguish the sentence modes statement and question. Prosody is used for this distinction, however, only when no syntactic cues to sentence mode are available. Furthermore, all results support the ideas of Exemplar Theory. The low boundary tone category should contain considerably more exemplars than the high boundary tone category, because the latter is produced, and hence also perceived, less often. As a consequence, a clear Perceptual Magnet Effect can develop for the low boundary tone: enough exemplars are available for a category centre to form, and a PME can only emerge there. In the high boundary tone category, by contrast, many listeners have stored only a few exemplars, which hinders the emergence of a Perceptual Magnet Effect. The reaction times investigated support the perception results and confirm the hypothesis that reaction times correlate with the ease of a perceptual decision.

Chapter 1

Introduction

Classifying things or events into categories, i.e. structuring the world around us, is a general human means of handling the variability of things and events. Such classifications can be found in various areas of human perception, e.g. in colour perception, in identifying different breeds of dogs or cats, or in describing the specific features of a table or a chair. Because humans order things into categories, it was hypothesized at the beginning of the 1930s that a similar ordering mechanism might also underlie speech perception and production. Trubetzkoy (1939) proposed that two things, i.e. two phonemes, can only be distinguished if they stand in opposition to each other in at least one feature. This idea was the basis for the model of distinctive features (Jakobson, Fant, and Halle, 1952), which underlined the existence of phonological and therefore linguistic categories in the mind of the listener. However, these theoretical constructs need experimental verification. The phonetic grounding of these hypothesized linguistic categories has received increasing attention during the last decades (e.g. Liberman et al. (1957), Repp (1984), Pierrehumbert (2000), Jessen (2000), Kingston (2003)). This covers phonemes, certain features such as voiced versus voiceless, and phonological rules or constraints. Furthermore, the phonetic grounding of phonological assumptions has come to include perception, memory, and motor control in addition to the classical areas of articulation and acoustics. As Pierrehumbert (2000, p. 8) states, “[...] there is no substance-free part of phonology.” However, phonetic grounding was mostly restricted to segmental phenomena such as the verification of phonemes or of distinctive features. This thesis investigates the phonetic grounding of possible prosodic categories from the perspective of speech perception, using different experimental designs.
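Trubetzkoy's criterion can be stated operationally: two segments count as distinct only if their feature specifications are opposed in at least one feature. The following Python sketch is a toy illustration. It assumes a hypothetical, heavily simplified binary feature table (`FEATURES`) that is not taken from any particular feature theory, and it requires every segment to be specified for the same features.

```python
# Hypothetical, heavily simplified binary feature specifications (illustration only).
FEATURES = {
    "p": {"voiced": False, "labial": True, "coronal": False, "nasal": False},
    "b": {"voiced": True, "labial": True, "coronal": False, "nasal": False},
    "t": {"voiced": False, "labial": False, "coronal": True, "nasal": False},
    "m": {"voiced": True, "labial": True, "coronal": False, "nasal": True},
}


def oppositions(a: str, b: str, features: dict = FEATURES) -> list[str]:
    """Features in which segments a and b stand in opposition (differ in value)."""
    fa, fb = features[a], features[b]
    if fa.keys() != fb.keys():
        raise ValueError("both segments must be specified for the same features")
    return sorted(f for f in fa if fa[f] != fb[f])


def distinguishable(a: str, b: str, features: dict = FEATURES) -> bool:
    """Trubetzkoy-style criterion: distinct only if opposed in at least one feature."""
    return len(oppositions(a, b, features)) > 0


print(oppositions("p", "b"), oppositions("p", "t"), distinguishable("p", "p"))
```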

Prosody is a subdiscipline of linguistics. The term prosody is often used interchangeably with the terms suprasegmental or intonation. In contrast to features that apply to single segments in speech, e.g. to phonemes, the term prosody describes events that span more than one segment. Prosodic events are therefore attached to higher linguistic units such as syllables, words, phrases, utterances, or whole discourses. Prosody comprises linguistic as well as paralinguistic functions. Linguistic functions are associated with linguistic aspects such as syntax, semantics, the lexicon, morphology, phonetics, and phonology. Prosody is, for example, involved in segmenting words from the continuous speech input, in syntactic phrasing, in assigning word and sentence stress, and in accentuation (e.g. Pierrehumbert (1980), Silverman et al. (1992), Dogil (2003), Hirst (2004)). Paralinguistic functions, on the other hand, comprise e.g. the identity of the speaker, his or her attitude, mood, and age, and several other aspects of the specific speech act performed, which do not affect the linguistic information of the utterance itself. Prosody is associated with three main acoustic correlates:

1. intonation or pitch, i.e. the tonal domain which comprises the linguistically relevant functions of the fundamental frequency (F0) at the syllable, the word, the utterance, and the discourse level,

2. duration, i.e. the temporal domain which comprises the linguistically relevant functions of absolute and of relative duration of certain units (phonemes, syllables, etc.), and

3. intensity or stress, which comprises the linguistically relevant functions of energy features.

However, each of these acoustic parameters is known to support not only linguistic but also paralinguistic functions in spoken language, so it is necessary to distinguish carefully between these two functions of each parameter. Prosody has an integrating function in the organization and production of speech, since it provides the frame that holds all the information spoken language can convey. This may affect the syntax, i.e. the phrasing of an utterance, its morphological structure, its segmental structure, or even its semantic content or intonational meaning. The information is packed into syllables, metrical feet, phonological words or intonational phrases (Levelt, 1989; Dogil, 2003). Prosody helps to integrate phonetic features into phonological representations in memory, because it can be used to categorize the continuously varying phonetic properties produced by the speaker. The prosody of an utterance may even change the syntactic structure or the semantic interpretation of that utterance. As Hirst (2004, p. 163) puts it, “[...] when there is a discrepancy between the prosody of an utterance and its overt semantic content we usually trust the prosody rather than the semantics.”

The main aim of this thesis is to phonetically ground hypothesized prosodic categories. One aspect of this phonetic grounding concerns whether there are factors that favour the development of certain categorizations over others.

The phonetic space in the hearer's mind is known to be used non-uniformly, an effect that has been studied in great detail for the vowel space (Stevens, 1989). Stevens (1989) argued that the relationships between articulatory parameters and their acoustic and auditory responses are nonlinear. An important question is therefore whether such nonlinearities exist in the prosodic domain as well. A second aspect of the phonetic grounding concerns the mapping of prosodic categories onto continuous acoustic parameters of the speech signal. Following Pierrehumbert (2003), probability distributions based on frequencies of occurrence are expected to play an important role in a model of phonetic grounding in the prosodic domain. Exemplar Theory (Lacerda, 1995; Goldinger, 1997; Johnson, 1997; Pierrehumbert, 2001a; Pierrehumbert, 2003) assumes that all perceived speech events are stored in the listener's memory as “exemplars”. The more exemplars are stored in a specific category, the more stable this category will be. This also influences the production of the category, since production draws on the exemplars stored in memory as the basis for the element to be produced.
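The exemplar idea can be sketched as a summed-similarity classifier, in the spirit of exemplar models such as Johnson (1997). A stimulus is assigned to the category whose stored exemplars are, taken together, most similar to it. This Python sketch is a toy under stated assumptions: a single perceptual dimension (boundary tone height in semitones), a Gaussian similarity function, and hypothetical exemplar clouds in which L% holds more exemplars than H%. The function names are illustrative, and only perception is modelled, not production.

```python
import math


def similarity(x: float, exemplar: float, width: float = 1.0) -> float:
    """Gaussian similarity between a stimulus and one stored exemplar."""
    return math.exp(-((x - exemplar) ** 2) / (2 * width ** 2))


def categorize(x: float, clouds: dict[str, list[float]], width: float = 1.0):
    """Assign x to the category whose exemplar cloud has the highest summed similarity.

    Returns the winning label and the summed activation of every category.
    """
    activation = {
        label: sum(similarity(x, e, width) for e in cloud)
        for label, cloud in clouds.items()
    }
    return max(activation, key=activation.get), activation


# Hypothetical exemplar clouds: boundary tone height in semitones above a
# reference; the L% cloud holds more exemplars than the H% cloud.
clouds = {
    "L%": [-1.0, -0.5, -0.5, 0.0, 0.0, 0.0, 0.5, 0.5, 1.0],
    "H%": [5.0, 6.0, 7.0],
}
for stimulus in (0.0, 3.0, 6.0):
    label, activation = categorize(stimulus, clouds)
    print(stimulus, label, {k: round(v, 3) for k, v in activation.items()})
```

Under these assumptions, a stimulus midway between the two cloud centres is pulled towards the more densely populated L% cloud. This shows how exemplar frequency could make one category more stable than the other.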

A further aim of this thesis is to show that the classical paradigm of Categorical Perception (Repp, 1984) and the concept of the Perceptual Magnet Effect (Kuhl, 1991) can be adapted to verify the existence of prosodic categories as well. Perception experiments are well suited to testing whether listeners can perceive a category or specific phonetic or phonological features, and whether they can distinguish a specific category or feature from other categories or features. Other experimental designs that may be suitable for testing prosodic categories are discussed as well. Additionally, Pisoni and Tash (1974) discussed the possibility that reaction times might indicate the difficulty of a perceptual decision: the longer the reaction time (RT), the more difficult the perceptual decision. Other researchers have taken up this idea as well (Massaro, 1998; Chen, 2005). As reaction times seem to be relevant for interpreting perceptual results, they will also be measured and analysed in the experiments presented in this thesis.

To test for the existence of prosodic categories, I decided to investigate the least disputed prosodic contrast that can be described in terms of different categories: the perception of the boundary tones of intonational phrases. Féry (1993) proposed that there is only one boundary tone, which is specified according to the preceding pitch accent.¹ Batliner (1989) and Grice and Baumann (2002), however, propose two boundary tones, a high and a low one, which are independent² of the preceding pitch accent. The low boundary tone (L%) corresponds to the statement interpretation of the presented phrase and the high boundary tone (H%) to the question interpretation. In this thesis, the categorical status of these two hypothesized boundary tones, as well as possible Perceptual Magnets inside the categories, is tested in five experiments. The experiments differ in the stimulus, the use of context conditions, and the speaker's voice. The results of the subtests of each experiment are then combined into a perceptual reference map. This map represents the locations of the category exemplars within the perceptual space of the average listener, and it further supports the existence of the two prosodic categories postulated by German phonology.

¹ The boundary tone is specified as high when the preceding pitch accent ends on a high tone, and low otherwise.

² They are not totally independent of the preceding pitch accents as there are rules that

To summarize the introduction, this thesis will show that the test designs of Categorical Perception and the Perceptual Magnet Effect can be adapted to the field of prosody. Furthermore, the two hypothesized German boundary tones will be confirmed as prosodic categories that have an internal structure and can therefore develop Perceptual Magnets. However, CP does not necessarily entail a PME, i.e. categories do not need to contain a Perceptual Magnet to qualify as categories. Exemplar Theory appears useful both for explaining the experimental results and for accounting for a perceptually divided memory space in the listener's mind, i.e. a so-called Perceptual Reference Map. The hypothesis that phrases presented in isolation³ might differ in their perception with respect to sentence mode is tested as well. Furthermore, reaction time measurements will be taken into account when interpreting the results of the perception experiments.

The thesis is organized as follows. Chapter 2 introduces the theoretical background. It discusses the categorization process and the perception of categories in speech, with a special focus on prosodic categories. It also presents intonation models and their relation to the perception of prosodic categories, and it places the perception of prosodic categories within the framework of Exemplar Theory. Chapter 3 presents in detail the experimental designs adopted in the perception experiments of this thesis. It also discusses the advantages and disadvantages of the Categorical Perception and Perceptual Magnet Effect tests compared with other possible experimental tasks, and it describes the preparation of the stimuli and the experimental subtests. Chapter 4 presents an experiment looking for Categorical Perception and a second one looking for possible Perceptual Magnet Effects in the categories of the low and the high boundary tone in German. For both designs, an out-of-the-blue phrase spoken by a male professional native speaker of German was used. Parts of these two experiments have been published in Schneider and Lintfert (2003) and in Schneider and Möbius (2005), and a summary of both experiments was published in Schneider et al. (2006). An anonymous reviewer of the manuscript of Schneider et al. (2006) proposed repeating the experiments with better-controlled stimuli. Such stimuli consist mostly of sonorants, which are easier to manipulate because they have a continuous fundamental frequency contour. This suggestion was adopted in the experiment discussed in Chapter 4, Section 4.3, where a controlled out-of-the-blue phrase was presented as the stimulus. The reaction time results of the experiments discussed in Section 4.3 have been published in Schneider et al. (2011). The results of the experiments in Chapter 4 revealed that context might play an important role in the perception of the boundary tone of a phrase, so context information was then provided to the listeners. In the experiment presented in Chapter 5, the stimulus is embedded in three different context utterances. Some of the results of these context experiments have been published in Schneider et al. (2009). Furthermore, gender-specific differences were found in the results of the experiments presented in Chapter 4. Chapter 6 therefore tests whether the speaker's gender influences the perception of the boundary tone of a phrase, by presenting the stimuli spoken by a female speaker. Finally, Chapter 7 discusses the results of this thesis.

Chapter 2

Theoretical Background

2.1 Categorization

According to Repp (1984), categorization occurs when humans focus on one or more important properties that certain objects or events have in common and ignore irrelevant details. Categories often have names, which serve to differentiate between existing categories and to identify them. Such a name is a hypernym, i.e. a word or phrase that covers the main semantic meaning of all the items that fall into the category. For example, the category “red” includes all the different shades of the colour red, such as “scarlet”, “vermilion”, or “crimson”, as long as these colours share the main features ascribed to the colour “red”.

We as humans do not experience the world around us in terms of single objects or percepts that exist independently of each other, but in terms of categories (Harnad, 2003). Categories are the result of a categorization process, i.e. a process that orders the entities of the world in some way, for instance by their main features or characteristics. This categorization process enables and simplifies learning and helps us to understand existing correlations between different objects, events or whole categories. It affects every area in which objects or events have to be identified. Many of the categories we use are natural ones: they describe physical distributions of objects or events in the world, and the assignment of an object or event to a specific category goes unquestioned because it is obvious. Elements of a specific category share some¹ of the major properties ascribed to this category. They may, of course, differ in minor properties, which are more or less irrelevant for identifying category membership.

¹ They do not necessarily need to share all of the main properties of the category, as this would complicate the categorization process, i.e. some slight variation is possible.

However, sometimes the assignment of a member to a specific category is far from transparent and may reflect special knowledge or conventions. For example, zoologists classify seahorses and eels as members of the category fish, but not dolphins or whales. Many people, by contrast, would classify dolphins and whales as fish, but not necessarily seahorses or eels. This shows that categorization itself is not always unambiguous and may also be guided by intuition. In general, which categories exist in the mind of a listener or speaker is influenced by society, social requirements and many other factors.

How categories emerge is a widely disputed and highly topical area of research. Categories² can be learned. This learning process takes time and works by trial and error (Harnad, 2005). We have to learn which properties of certain elements are the major ones and can therefore be used for classification, and which properties are irrelevant for classification. However, the most prominent properties are not always the ones used for the classification or categorization³ of objects or events. Furthermore, one object or event can be classified in different ways, i.e. assigned to different categories, depending on special circumstances or needs. Categorization also does not always have to be learned step by step by trial and error. Sometimes the assignment of elements to a specific category can instead be copied from other members of the community. For instance, there is no need to compute every prime number yourself in order to learn them. You can rely on the calculations of mathematicians and learn, e.g., the lowest 5 to 10 prime numbers by heart.

² And this includes speech categories as well.

Categorization turns out to be fundamental in many areas, e.g. in language and language learning, in (mathematical) prediction and inference, and in decision making. There are many categorization theories and techniques. One is the classical Aristotelian view, which claims that categories are discrete entities characterized by a set of properties shared by their members. Another is the conceptual clustering method, which generates a concept description for each class and, in most cases, hierarchical category structures as well. The latter method is used in machine learning (e.g. Hanson and Bauer (1989)). In addition, some categorization processes have been adapted to the classification of speech objects or events, such as Categorical Perception (CP), Continuous Perception, or the Perceptual Magnet Effect (PME). These are discussed in detail in Section 2.1.3.

2.1.1 Speech categories

In linguistic theory, speech categories play an important role. They arise in part from the language learner's experience, and they are governed by functional principles, which have to be instantiated in the mind and the grammar of the speaker (Kingston, 2003). For example, the phonemes make up the phonological inventory that characterizes a specific language. Speech categories are assumed to behave similarly to other, non-speech categories, so both flexible and inflexible speech categories might exist (Kingston, 2003). Flexible categories are learned as sets of items that share one or more major properties. For a category contrast to form, the distributions of the relevant phonetic property must differ sufficiently between the two phonemes. Only then can listeners detect this difference and attach the correct category label to each phoneme. If the phonetic property values of two phonemes do not allow the listener to differentiate between them, i.e. if the difference between the two phonemes is too small, there is no basis for assigning the two tokens to two different categories. They will then be treated as two instances of the same category. Each member of the speech community has to learn the assignment of different phonemes to different categories. The linguistic experience of a listener therefore plays an important role in the categorization process, as Kingston (2003, p. 286) notes: “What counts as distinct categories depends quite directly on the specific distributions of property values that listeners experience, so those categories should readily shift when they experience different distributions of values.” This explains why listeners with different language backgrounds can perceive more or less identical instances of the same phoneme differently. Listeners compare the incoming speech instance with the members of the speech categories present in their memory. Listeners with different mother tongues have been exposed to different phonemes, so their phoneme categories might differ in the main properties selected for categorization. These listeners might therefore locate the same realization of a phoneme differently in their perceptual space. For this reason, speech categories are said to be language-specific.
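Kingston's point that category distinctness depends on the distributions of property values can be quantified, under a normality assumption, in two ways: by the separation of the two distributions (the difference of means in pooled standard deviations) and by their overlap. The Python sketch below uses `statistics.NormalDist.overlap` from the standard library. The function name `category_separation` and the voice onset time (VOT) samples are hypothetical, and the pooled standard deviation assumes samples of similar size.

```python
from statistics import NormalDist, fmean, stdev


def category_separation(values_a: list[float], values_b: list[float]) -> tuple[float, float]:
    """Return (d_prime, overlap) for two samples of one phonetic property.

    d_prime: difference of means in units of the pooled standard deviation
    (pooling assumes samples of similar size). overlap: shared area of the two
    fitted normal distributions (1.0 = identical, near 0.0 = well separated).
    """
    a = NormalDist(fmean(values_a), stdev(values_a))
    b = NormalDist(fmean(values_b), stdev(values_b))
    pooled_sd = ((a.stdev ** 2 + b.stdev ** 2) / 2) ** 0.5
    return abs(b.mean - a.mean) / pooled_sd, a.overlap(b)


# Hypothetical VOT samples in ms for two stop categories.
well_separated = category_separation([10, 12, 15, 18, 20], [55, 60, 65, 70, 75])
poorly_separated = category_separation([20, 25, 30, 35, 40], [25, 30, 35, 40, 45])
print(f"well separated:   d'={well_separated[0]:.2f}, overlap={well_separated[1]:.3f}")
print(f"poorly separated: d'={poorly_separated[0]:.2f}, overlap={poorly_separated[1]:.3f}")
```

In the second, strongly overlapping case, the sketch reflects Kingston's scenario in which there is little basis for assigning tokens to two different categories.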

On the other hand, inflexible categories may, after some time, become quite stable. Kingston (2003) assumes that increasing experience in their first language makes learners increasingly insensitive to the distributions of phonetic properties in contrasting sounds of a second language they want to learn. For example, Best et al. (2001) expected monolingual listeners to have much more inflexible categories and to perceive foreign categories by assimilating them to categories of their native language. Conversely, if listeners remain sensitive to the distributions of phonetic properties in the contrasting sounds of a second language, Kingston predicts that they will not be able to transfer this sensitivity to reproducing the properties of these sounds. “Phonological specifications often change because listeners misinterpret the continuously variable, phonetic properties of speakers’ utterances categorically.” (Kingston, 2003, p. 285). It is therefore essential to find out how listeners categorize the continuously varying properties produced by speakers and which cues they use in this process (see Section 2.1.3).

Many experimental results show that the categories listeners learn contain specific phonetic details of the instances they heard. For example, Goldinger (1997) found that listeners retain specific details of the pronunciation of words they listened to in experimental tasks. Yet other results show, just as robustly, that listeners lose the ability to attend to such details. Kuhl and Iverson (1995) confirmed that linguistic experience influences speech perception. Young infants before the first-word stage can distinguish between phonemes not present in the language they acquire, but they lose this ability as their linguistic competence increases. Taking these results into account, stable categories might develop Perceptual Magnet Effects (PME). A PME warps the perceptual space towards a category prototype, thereby reducing discrimination sensitivity around it (cf. Section 2.1.3 for a detailed description of the PME). Only listeners who have not developed a Perceptual Magnet for a specific category continue to perceive slight differences between the instances of this category.

2.1.2 Prosodic Categories

Linguistic categories emerge when the expressions that code them form so-called natural classes. According to traditional linguistics, these natural classes are determined by rules: whenever a rule treats linguistic expressions as the same, they form a natural class and automatically acquire categorical status. This classical account of phonological categorization has been extended to prosodic categories. Early work in autosegmental phonology discovered several tune-to-text association rules of clearly categorical status. In his study of Swedish accents, Bruce (1977) established the categorical status of prosodic events that code accent and focus in Swedish. However, this categorical status owes its strength to the text side of the tune-to-text association. Its source lies in the linguistic structure the tune is associated with, namely the information structure. The properties of the tunes are therefore predetermined by the properties of the text. Similarly, Clements and Ford (1979) demonstrated the categorical status of tonal events in Kikuyu. Each morpheme contributes a tone to the tone melody (the tune) of a word, but general association rules generate tunes in which morphemes are not necessarily associated with their underlying tones (Goldsmith, 1990, pp. 11–14).

In her intonation model, Pierrehumbert (1980) also postulated tonal categories to give a linguistic account of intonational events in utterances (for a detailed description see Section 2.3.1). Prosody is known to comprise linguistic as well as paralinguistic functions, and several prosodic parameters have values that must be interpreted categorically in order to be understood correctly. One example of such a linguistic function is the intonational difference between a question and a statement. Of course, further parameters can be used to distinguish between a statement and a question, such as the word order of the utterance or the surrounding context. However, when these cues are missing, the only features the listener can rely on are prosodic ones. The dynamic patterns of pitch (fundamental frequency, F0), duration, and intensity are traditionally defined as the main prosodic correlates (cf. e.g. Clark, Yallop and Fletcher (2007)). These acoustic features should therefore be sufficient⁴ to identify a syntactically ambiguous utterance as either a statement or a question.

⁴ Note that Clark, Yallop and Fletcher (2007, p. 330) remark that there might be further phonetic correlates of prosody in certain languages, such as vowel quality or spectral tilt.

So there is evidence for categorical systems in prosody, but it nearly always involves the properties of the tune-text association. The categorical status of tunes alone is much more controversial. Indeed, over the last decade it has been claimed that prosodic categories are quite different from all other linguistic categories: whereas linguistic categories are compositional, and their meanings can be discerned only by logical analytical reasoning and decomposition, prosodic categories are holistic, gestalt-like, and have immediate iconic reference. House (1990), however, showed that tonal movement was perceived better in vowels than in consonants. His results revealed that the exact location of a tonal fall, i.e. an F0 fall, within a CVC⁵ syllable could be used to predict how the contour was perceived. When the F0 fall occurred in a region with a maximum of new spectral information, i.e. across a phoneme boundary such as CV or VC (Figure 2.1), a level tone was perceived instead of a falling one (contour tone). When the F0 fall started during the first consonant and continued into the vowel (crossing the CV boundary), subjects reported perceiving a low tone, because only the low part of the F0 movement occurred in a region where no new spectral information had to be processed, i.e. in the stable part of the vowel. A high tone was perceived when the F0 fall started during the last part of the vowel and continued into the following consonant, because only the start of the fall lay in the stable part of the vowel, where the amount of new spectral information was low. Only when the complete F0 fall lay within the vowel was it actually perceived as a fall (Figure 2.2), because no phoneme change took place during the vowel and therefore no new spectral information had to be processed. From these observations House (1996) proposed a model of optimal tonal perception, which assumes a general relationship between the complexity of a signal and the listener's sensitivity to pitch. The results of these experiments provide indications of Categorical Perception in the prosodic domain: depending on the exact location of the F0 fall, perception switches between the three tonal categories proposed by Pierrehumbert (1980), i.e. two level tones and one contour or complex tone. The experimental results of House (1990) support the existence of clear category boundaries between the tested contours. However, as only an ABX test was carried out, no correlation could be established between the category crossover and a discrimination peak, which is necessary for the confirmation of Categorical Perception. Therefore, the experiments of House do not verify the Categorical Perception of tonal timing, but they do point to this phenomenon, since a clear category crossover was found.

Figure 2.1: Stylized example of changes in the amount of new spectral information in the course of a CVC syllable (House (1996, p. 2048)).

Figure 2.2: According to the model of optimal tonal perception, a falling contour is perceived as low (L), falling (F), or high (H) depending on its exact location (House (1996, p. 2049)).
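The timing rules described above can be summarized as a small decision procedure. The following Python sketch is a simplification under stated assumptions, not House's own implementation: the syllable is reduced to two hypothetical boundary times (`cv_boundary`, `vc_boundary`), the whole vowel between them is treated as the stable region (formant transitions are ignored), and configurations that the description does not cover return `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CVC:
    """Hypothetical segmentation of a CVC syllable (times in ms)."""
    cv_boundary: float  # end of the first consonant / start of the vowel
    vc_boundary: float  # end of the vowel / start of the final consonant


def perceived_tone(syl: CVC, fall_start: float, fall_end: float) -> Optional[str]:
    """Map the timing of an F0 fall onto a perceived tone label.

    Only the part of the fall inside the vowel (little new spectral
    information) is evaluated, following the qualitative description
    of the model of optimal tonal perception.
    """
    if fall_end <= fall_start:
        raise ValueError("fall_end must be later than fall_start")
    starts_in_vowel = syl.cv_boundary <= fall_start < syl.vc_boundary
    ends_in_vowel = syl.cv_boundary < fall_end <= syl.vc_boundary
    if starts_in_vowel and ends_in_vowel:
        return "F"  # whole fall inside the vowel: perceived as a fall
    if fall_start < syl.cv_boundary and ends_in_vowel:
        return "L"  # only the low end of the fall lies in the vowel
    if starts_in_vowel and fall_end > syl.vc_boundary:
        return "H"  # only the high start of the fall lies in the vowel
    return None  # configuration not covered by the description
```

For a hypothetical vowel from 100 ms to 300 ms, a fall from 150 to 250 ms yields `"F"`, a fall from 50 to 150 ms yields `"L"`, and a fall from 250 to 350 ms yields `"H"`.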

This example suggests that the iconic, non-symbolic aspects of prosody interpretation are overrated (Dogil, 2003). The experimental methodology presented in Section 3 aims to provide evidence that there are elements of the tonal structure which are perceived categorically.
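The criterion that a discrimination peak must coincide with the identification crossover can be made concrete as follows. This is a hedged Python sketch with hypothetical function names and made-up example data: the crossover is estimated by linear interpolation at the 50 % point of the identification function, the discrimination peak is taken as the midpoint of the best-discriminated adjacent pair, and the `tolerance` for calling the two coincident is an arbitrary assumption. Both identification and discrimination data are needed, which is why discrimination results from an ABX test alone cannot establish the coincidence; a real analysis would also rely on statistical tests rather than a fixed threshold.

```python
from typing import Sequence


def crossover(steps: Sequence[float], p_a: Sequence[float]) -> float:
    """Interpolated stimulus value at which identification as
    category A passes through 50 % (falling or rising)."""
    if len(steps) != len(p_a):
        raise ValueError("steps and p_a must have the same length")
    for i in range(len(steps) - 1):
        y0, y1 = p_a[i], p_a[i + 1]
        # Adjacent steps straddle 0.5 (and are not both exactly 0.5).
        if (y0 - 0.5) * (y1 - 0.5) <= 0 and y0 != y1:
            x0, x1 = steps[i], steps[i + 1]
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("identification function never crosses 50 %")


def discrimination_peak(steps: Sequence[float], correct: Sequence[float]) -> float:
    """Midpoint of the adjacent stimulus pair with the highest proportion
    correct; correct[i] belongs to the pair (steps[i], steps[i + 1])."""
    if len(correct) != len(steps) - 1:
        raise ValueError("need one discrimination score per adjacent pair")
    best = max(range(len(correct)), key=lambda i: correct[i])
    return (steps[best] + steps[best + 1]) / 2


def peak_at_boundary(steps: Sequence[float], p_a: Sequence[float],
                     correct: Sequence[float], tolerance: float = 0.5) -> bool:
    """True if the discrimination peak lies within `tolerance`
    stimulus steps of the identification crossover."""
    return abs(crossover(steps, p_a) - discrimination_peak(steps, correct)) <= tolerance


# Hypothetical 7-step continuum (not data from this thesis).
steps = [1, 2, 3, 4, 5, 6, 7]
p_a = [1.0, 0.95, 0.9, 0.6, 0.1, 0.05, 0.0]
correct = [0.55, 0.6, 0.7, 0.9, 0.65, 0.55]
print(crossover(steps, p_a))                  # about 4.2
print(discrimination_peak(steps, correct))    # 4.5
print(peak_at_boundary(steps, p_a, correct))  # True
```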

2.1.3 Perceptual mechanisms

The perception of speech categories is an integral part of every theory of speech perception. However, approaches differ as to the perceptual mechanisms involved. Two main mechanisms are widely accepted, i.e. the perception of speech categories is often said to be either continuous or categorical.⁶ Theories of speech perception (and speech production) such as the Motor Theory (MT; Liberman, 1996) clearly distinguish between these two modes of perception, i.e. a sound or a phonetic feature is perceived either categorically or continuously. In this view, the mode of perception depends directly on articulation. When two sound categories are differentiated by discrete articulatory gestures or movements, as for plosives, which can differ in voicing⁷ or in place of articulation, the perception of a stimulus continuum spanning these two categories will be categorical. If, on the other hand, continuous articulatory variation between the two sound categories is possible, perception will be continuous, as Fry and colleagues (1962) found for vowels. However, there are theories that allow a mixture of these two perceptual modes, e.g. the dual process model (Fujisaki and Kawashima, 1971). This model assumes that the two modes of perception can be active simultaneously or follow each other directly, thereby allowing for different degrees of continuous or Categorical Perception. Additionally, during the last two decades the theory of the Perceptual Magnet Effect (Kuhl, 1991) has received increasing attention. According to this theory, perception might be guided by an ideal member of a category, a so-called prototype. Discrimination performance around a prototype is much worse than around a member of the same category that is located near the periphery, i.e. around a non-prototype. These different processes that might be involved in categorizing speech events will be discussed in detail in the following paragraphs.

⁶ In the literature, the term “continuous” is used interchangeably with “gradient”, and “categorical” with “discrete”.

⁷ Voicing can be perceptually manipulated by varying the voice onset time (VOT) from a
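The claim that discrimination is worse around a prototype than around a non-prototype can be illustrated with a toy model in which perceptual space is "warped" toward the prototype. The Python sketch below is a hypothetical illustration, not Kuhl's (1991) model: the Gaussian-shaped pull, the `strength` and `width` parameters, and the use of warped distances as a stand-in for discriminability are all assumptions made for demonstration. For `0 <= strength < 1` the mapping stays monotonic, so the order of the stimuli is preserved.

```python
import math
from typing import List, Sequence


def warp(x: float, prototype: float, strength: float = 0.6, width: float = 1.0) -> float:
    """Toy perceptual warping: pull x toward the prototype.

    At the prototype itself, local distances shrink by the factor
    (1 - strength); the pull fades out with a Gaussian profile of the
    given width, so stimuli far from the prototype are barely affected.
    """
    if not 0.0 <= strength < 1.0:
        raise ValueError("strength must be in [0, 1) to keep the mapping monotonic")
    d = x - prototype
    return x - strength * d * math.exp(-(d * d) / (2.0 * width * width))


def perceived_steps(stimuli: Sequence[float], prototype: float, **kwargs: float) -> List[float]:
    """Warped distance between each pair of adjacent stimuli
    (a hypothetical proxy for how easily the pair is discriminated)."""
    warped = [warp(s, prototype, **kwargs) for s in stimuli]
    return [b - a for a, b in zip(warped, warped[1:])]


# Equally spaced continuum with a hypothetical prototype at stimulus 3.
steps = perceived_steps(list(range(7)), prototype=3)
print([round(s, 2) for s in steps])  # [1.14, 1.2, 0.64, 0.64, 1.2, 1.14]
```

The physically equal steps next to the prototype shrink to about 0.64, while steps further away stay near or slightly above 1, mirroring the reduced discrimination predicted around a prototype.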

Continuous Perception

Perception is defined as continuous when the crossover, i.e. the change from one category to another, proceeds gradually. This implies that there is no clearly identifiable category boundary, and each stimulus can be identified as more or less similar to one category or the other. The perception of emotions and the perception of loudness are examples of Continuous Perception. When listeners are presented with a stimulus continuum ranging from quiet to loud sounds, they can discriminate very reliably between two adjacent sounds of the continuum, i.e. it is relatively easy to determine which sound is the louder one. However, identifying a sound as quiet or loud without any comparison to another sound is very difficult, because this decision depends on individual experience and listening preferences. Consequently, a sound may be perceived as loud by one listener but as relatively quiet by another. An ideal case of Continuous Perception is illustrated in Figure 2.3. Stimulus 1 clearly belongs to one category, while stimulus 11 is definitely an instance of the other category. Between stimulus 1 and 11, the assignment of each stimulus gradually changes from the first to the second category without any clearly visible category switch, as illustrated by the solid line. The discrimination performance between any two adjacent stimuli is always very high. These features are the main characteristics of Continuous Perception, i.e. the gradual change in the identification performance (assignment

values are language specific. For German it is postulated that a plosive having a VOT below 30 ms will be perceived as voiced while a plosive having a VOT above 40 ms will be perceived as voiceless.
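The German VOT values given above can be read as a simple labeling rule. The Python sketch below encodes them; the function name is hypothetical, and labeling the zone between 30 and 40 ms as "ambiguous" is an assumption, since the text only specifies the two ends. Note that this is a deterministic rule, whereas real listeners produce identification proportions that change across the continuum.

```python
def label_plosive_by_vot(vot_ms: float,
                         voiced_below: float = 30.0,
                         voiceless_above: float = 40.0) -> str:
    """Label a German plosive as voiced or voiceless from its VOT in ms.

    Thresholds follow the values given in the text; stimuli between them
    are labeled "ambiguous" (an assumption, not a claim from the source).
    """
    if voiced_below > voiceless_above:
        raise ValueError("voiced_below must not exceed voiceless_above")
    if vot_ms < voiced_below:
        return "voiced"
    if vot_ms > voiceless_above:
        return "voiceless"
    return "ambiguous"


# A hypothetical VOT continuum in 10 ms steps.
continuum = range(0, 81, 10)
print([label_plosive_by_vot(v) for v in continuum])
# ['voiced', 'voiced', 'voiced', 'ambiguous', 'ambiguous',
#  'voiceless', 'voiceless', 'voiceless', 'voiceless']
```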
