
3 Obstruents in speech production and perception

4.3 Methodology

This section describes the main experiment, which was designed on the basis of the experience gained in the pilot study presented in the previous section.

4.3.1 Materials

Six obstruents, [pf], [ts], [f], [z] or [s] (depending on the word position; cf. the last paragraph of this section), [p] and [t], are investigated in the context of seven different vowels: [ɪ], [ɛ], [a], [ɔ], [ʊ], [ʏ] and [œ]. In some cases, the short (lax) vowel had to be replaced by the corresponding long (tense) vowel because no word with the required short vowel-obstruent combination could be found. The IPA symbols for the short vowels are nevertheless used throughout, although some words with long vowels occur in the speech data. The German labial obstruents can be divided into labiodentals [pf, f] and bilabials [p], but they are referred to collectively as “labials” in this investigation.

Each target sound was recorded in two word positions, word-initially and word-medially (6 × 7 × 2 = 84 combinations). For each combination, three words were chosen (84 × 3 = 252). For example, [pf] followed by the vowel [a] in word-initial position was represented by Pfarrer [pfa] ‘priest’, Pfanne [pfanə] ‘pan’ and Pfaffe [pfafə] ‘cleric’. For the same combination in word-medial position (where the relevant vowel precedes the consonant), the words were Apfel [apfəl] ‘apple’, Stapfen [ʃtapfən] ‘footprint’ and Zapfer [tsapfɐ] ‘tapster’. All test items apart from Sand [zant] ‘sand’ are bisyllabic. As mentioned above, the neighbouring vowel was intended always to be short, but especially in word-initial position it was sometimes impossible to find three example words containing the desired consonant-vowel combination. The word-medial affricates were always preceded by a short vowel. A CV combination, as used for the initial position, could not be used in word-medial position because only very few such combinations exist in German. The test words were again recorded in two different tasks (252 × 2 = 504). Ten speakers contributed to the data collection, yielding a total of 5040 items (504 × 10 = 5040).
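The factorial design and the resulting item counts can be checked with a short enumeration. The following is a minimal sketch, not the author's tooling: the list names and the `item_counts` helper are hypothetical, and the vowel list assumes that the seven vowels are the German short (lax) vowels. The alveolar fricative is entered once as "s" and realized as [z] in word-initial position, as described in the last paragraph of this section.

```python
from itertools import product

# Hypothetical encoding of the design; "s" stands for the alveolar fricative,
# which is realized as voiced [z] word-initially.
OBSTRUENTS = ["pf", "ts", "f", "s", "p", "t"]
# Assumption: the seven vowels are the German short (lax) vowels.
VOWELS = ["ɪ", "ɛ", "a", "ɔ", "ʊ", "ʏ", "œ"]
POSITIONS = ["initial", "medial"]
WORDS_PER_CELL = 3
TASKS = ["reading", "sentence-composing"]
SPEAKERS = 10


def design_cells():
    """Return all obstruent x vowel x position combinations."""
    cells = []
    for obstruent, vowel, position in product(OBSTRUENTS, VOWELS, POSITIONS):
        # Voiceless [s] does not occur word-initially in native words,
        # so voiced [z] is used in that position instead.
        symbol = "z" if (obstruent == "s" and position == "initial") else obstruent
        cells.append((symbol, vowel, position))
    return cells


def item_counts():
    """Return (cells, words, items per speaker, total items)."""
    n_cells = len(design_cells())        # 6 x 7 x 2
    n_words = n_cells * WORDS_PER_CELL   # three words per cell
    per_speaker = n_words * len(TASKS)   # every word recorded in both tasks
    return n_cells, n_words, per_speaker, per_speaker * SPEAKERS
```

Calling `item_counts()` reproduces the chain of multiplications given in the text, from 84 combinations up to 5040 recorded items.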

Some test words contained the ending “-el” after the voiceless stop consonant, as in Zettel [tsɛtəl] ‘slip of paper’. While the sound files were being labelled, it was checked that the subjects actually produced the schwa rather than deleting it and producing only a syllabic [l], as in [tsɛtl̩]. In the rare cases in which the schwa had been deleted, the sound files were excluded from the present investigation.
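The exclusion criterion can be expressed as a simple filter over labelled tokens. This is a sketch under assumptions: the `Token` structure, its phone-label list and the label strings "ə", "l" and "l̩" are hypothetical and do not describe the annotation format actually used in this study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    """One labelled recording (hypothetical representation)."""
    word: str          # orthographic form, e.g. "Zettel"
    phones: list[str]  # phone labels, e.g. ["ts", "ɛ", "t", "ə", "l"]


def schwa_deleted(token: Token) -> bool:
    """True if an '-el' word was realized as a syllabic [l] without schwa."""
    if not token.word.endswith("el") or not token.phones:
        return False
    # Accept either a plain or an explicitly syllabic [l] as the final label.
    if token.phones[-1] not in ("l", "l̩"):
        return False
    return len(token.phones) < 2 or token.phones[-2] != "ə"


def keep_tokens(tokens: list[Token]) -> list[Token]:
    """Drop tokens with schwa deletion, mirroring the exclusion criterion."""
    return [t for t in tokens if not schwa_deleted(t)]
```

A token labelled `["ts", "ɛ", "t", "ə", "l"]` is kept, whereas `["ts", "ɛ", "t", "l"]` is excluded. Words without the “-el” ending are never affected.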

In Standard High German, the voiced fricative [z] occurs word-initially and word-medially, whereas the voiceless fricative [s] does not occur word-initially apart from in some loan words (cf. the pilot investigation, section 4.2.1). Consequently, the voiced alveolar fricative [z] was used word-initially, whereas the voiceless [s] was used word-medially. Since this investigation aims to analyze voiceless obstruents, it was assumed that [z] and [s] differ primarily in the feature voicing. The possible influence of this choice on the results is discussed in section 4.1 on the research questions and hypotheses. The complete set of materials is listed in Appendices A and B.

4.3.2 Participants

Ten speakers of Standard High German with no noticeable dialectal differences were recorded. The five female and five male participants reported no speech impairments. All subjects had an academic but non-linguistic background and were not aware of the purpose of the investigation. The participants’ ages ranged from 25 to 60 years.

4.3.3 Method

Two different recording tasks, (1) a reading task and (2) a sentence-composing task, were used in order to avoid results that are specific to a single task: the reading task is expected to induce a more formal pronunciation than the more spontaneous sentence-composing task.

For the reading task, the participants were asked to read sentences that appeared on a laptop screen at intervals of 1 s. The word containing the target sound was always part of the sentence-initial noun phrase, preceded by the definite article die [di:] ‘the’ (FEM) or der [deɐ] ‘the’ (MASC); the German neuter definite article das [das] ‘the’ was not used. Each sentence thus consisted of a definite article and the target noun as subject, followed by a verb and an object. The sentence context was different for each word; for example, the frame for Pfaffe ‘cleric’ was der Pfaffe hat Hunger ‘the cleric is hungry’, and the frame for Zapfer ‘tapster’ was der Zapfer hat Spaß ‘the tapster is having fun’.

For the sentence-composing task, the subjects were asked to compose sentences containing two randomly combined target words that appeared simultaneously on the laptop screen. The words were the same as in the reading task. It was emphasized that the sentences did not need to be meaningful. The word pairs were presented at intervals of 4 s in order to elicit a near-natural speaking style. Before the session began, the subjects were shown an example sentence that made no sense. For instance, if the words Apfel ‘apple’ and Katze ‘cat’ were presented, a possible response was der Apfel liegt auf der Katze ‘the apple lies on the cat’. Naturally, the responses varied greatly between subjects; the sound or word preceding the target was therefore not predictable, as it depended on each subject’s creativity.
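A random pairing with fixed presentation intervals could be generated as sketched below. The section does not state how the pairs were formed, so the scheme here is an assumption: each word is shuffled once and appears in exactly one pair, with a seeded generator so that a presentation order can be reproduced. The functions `make_pairs` and `schedule` are hypothetical helpers, not part of the original set-up, which used Microsoft PowerPoint.

```python
import random


def make_pairs(words, seed):
    """Shuffle the target words and group them into pairs (assumed scheme)."""
    if len(words) % 2:
        raise ValueError("an even number of words is needed to form pairs")
    order = list(words)
    random.Random(seed).shuffle(order)  # seeded, hence reproducible
    return [(order[i], order[i + 1]) for i in range(0, len(order), 2)]


def schedule(slides, interval_s):
    """Assign each slide an onset time, one slide every `interval_s` seconds."""
    return [(round(i * interval_s, 3), slide) for i, slide in enumerate(slides)]


# Example: four of the test words, presented as pairs at 4 s intervals.
demo = schedule(make_pairs(["Apfel", "Katze", "Pfanne", "Zettel"], seed=7), 4.0)
```

Under this scheme, the 252 test words would yield 126 pairs. The same `schedule` helper with an interval of 1 s would cover the reading task.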

The stimuli were presented to the subjects via Microsoft PowerPoint. Within each recording task, the items were presented in random order to avoid priming effects and to conceal the goal of the investigation. The data was recorded on a Sony TCD-D100 DAT recorder using a Sony ECM-MS 957 condenser microphone. The microphone was mounted on a stand to the left of the laptop, about 30 cm from the speaker, and pointed towards the speaker’s mouth at an angle of approximately 45 degrees, so that the airflow did not hit the microphone membrane directly.

The recordings were made in various furnished rooms with little reverberation. Background noise, for example from other people, traffic or doors, occurred naturally during the recordings. This was accepted because the aim was to collect near-natural speech data accompanied by the kind of noise that an average automatic speech recognizer also has to cope with. A further advantage was that the subjects were more relaxed in their familiar environments.

A sampling rate of 44.1 kHz and a resolution of 16 bits were chosen. The speech data, stored on hard disk, was then downsampled to 22.05 kHz using an anti-aliasing filter.
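Downsampling from 44.1 kHz to 22.05 kHz halves the sampling rate and therefore lowers the Nyquist frequency from 22.05 kHz to 11.025 kHz. Any energy above 11.025 kHz must be removed before every second sample is discarded. Otherwise it folds back into the audible band; a 15 kHz component, for example, would reappear at 22.05 − 15 = 7.05 kHz. The filter actually used for the recordings is not specified, so the following is only a generic sketch: a windowed-sinc (Hamming) low-pass FIR filter followed by decimation by 2. The cutoff of 10 kHz and the 101 filter taps are assumptions chosen for illustration.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc (Hamming) low-pass FIR taps, normalized to unity DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric, linear-phase filter")
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response: sin(2*pi*fc*k) / (pi*k), with 2*fc at k = 0.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def downsample_by_2(samples, fs_hz=44100, cutoff_hz=10000, num_taps=101):
    """Anti-alias filter, then keep every second sample; returns (samples, new rate)."""
    taps = lowpass_fir(cutoff_hz, fs_hz, num_taps)
    mid = (num_taps - 1) // 2
    out = []
    # The filter output is only computed at the samples that are kept.
    for i in range(0, len(samples), 2):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + mid - j  # centred convolution, so no time shift is introduced
            if 0 <= idx < len(samples):
                acc += h * samples[idx]
        out.append(acc)
    return out, fs_hz // 2
```

In a quick check with synthetic sine tones, a 1 kHz tone passes almost unchanged, while a 15 kHz tone is strongly attenuated instead of aliasing to about 7 kHz. In practice, a library routine such as a polyphase resampler would usually be preferred over this pure-Python loop.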