
2.2 Experiment

2.2.2 Results

Results of F0

Sumimasen: The inter-rater reliability score for the utterances produced by the L1 speakers had a Kappa of 1.00 (SE = 0), and that for the utterances produced by the L2 speakers a Kappa of 0.59 (SE = 0.08, 95% confidence interval, henceforth CI = [0.44, 0.74]). The former Kappa value signals an extraordinarily high level of agreement, while the latter shows a moderate level (Landis and Koch, 1977). In case of a disagreement between the two coders, the results from the Japanese L1 coder are reported as the main source. Disagreements concerned the perception of downsteps and upsteps, or the perception of two different pitch accents or boundary tones.
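The agreement measure reported above can be illustrated with a minimal sketch. It assumes that the reported statistic is Cohen's kappa for two coders (the text does not name the kappa variant), and the contour labels below are invented for illustration; they are not the study's data. The function name `cohens_kappa` is likewise hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items (assumed variant)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items that received identical labels.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions,
    # summed over all labels.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_chance == 1.0:
        # Both coders used one and the same label throughout.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical contour codings, invented for illustration only.
coder_1 = ["LHHHL", "HHHHL", "HHHHH", "HHLLL", "LHHHL", "HHHHH"]
coder_2 = ["LHHHL", "HHHHL", "HHHHL", "HHLLL", "LHHHL", "HHHHH"]
print(round(cohens_kappa(coder_1, coder_2), 2))
```

A kappa of 1.00, as reported for the L1 utterances, arises when the two coders assign identical labels to every item.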

All L1 speakers’ utterances were coded as HHHHL or LHHHL with an initial low (Vance, 1987), a typical form of Japanese utterances. L1 speakers changed neither pitch accents nor boundary tones across repetitions. In the L2 speakers’ data, twenty utterances were coded as HHHHL or LHHHL, the forms also produced by the L1 speakers. A further twelve utterances were coded as HHHHH, i.e. flat contours. Then, there were utterances in which the lexical pitch fall occurred before the mora se, and which ended with a falling boundary tone: six utterances were coded as HHLLL, three as HHHLL and one as HLLLL. Other utterances had a rising end: one utterance was coded as HHHLH and two utterances as LLLLH, see Figure 2.5. Contrary to the Japanese L1 speakers’ data, the German L2 speakers showed variation both in pitch accents and in final boundary tones. Notably, rising contours were found only in the first or second attempts.

With regard to changes of a pitch accent or a final boundary tone made by the same speaker across repetitions, twelve L2 speakers varied a pitch accent (seven of them) or a boundary tone (twelve of them).

Coordinating lexical and paralinguistic use of F0 in L2 production

Figure 2.5 Number of occurrences of the contours for the word sumimasen in each attempt produced by the L2 speakers. The legend (1, 2, 3) refers to the attempt number.

Konnichiwa: The inter-rater reliability score for the utterances produced by the L1 speakers had a Kappa of 1.00 (SE = 0), and that for the utterances produced by the L2 speakers a Kappa of 0.64 (SE = 0.09, 95% CI = [0.46, 0.82]). As was the case for sumimasen, the former Kappa value shows an extraordinarily high level of agreement, while the latter indicates a moderate level (Landis and Koch, 1977).

Figure 2.6 Number of occurrences of the contours for the word konnichiwa in each attempt produced by the L2 speakers. The legend (1, 2, 3) refers to the attempt number.

All L1 speakers’ utterances were coded as either HHHHH or LHHHH with an initial low. As was the case for sumimasen, L1 speakers changed neither pitch accents nor boundary tones across repetitions. In the L2 speakers’ data, sixteen utterances were coded as LHHHH, i.e. the form produced by the L1 speakers. Utterances with an incorrect pitch fall and a falling final boundary tone such as HHHHL (N=10), HHHLL (N=3), HHLLL (N=4) or HHHHL (N=4) were detected as deviant forms. Furthermore, eight utterances showed a rising contour; HHHHH, see Figure 2.6. Note that rising contours were found only in the first or second attempts. Both the standard form and the deviant forms occurred in all attempts.

Entschuldigung: The inter-rater reliability score showed a Kappa of 0.86 (SE = 0.06, 95% CI = [0.73, 0.97]). Thirty-five utterances were coded as HL-%. The other contours had rising final boundary tones; six with L and seven with H, see Figure 2.7. Rising contours occurred more frequently in the first attempt, followed by the second, and never occurred in the third attempt.

Figure 2.7 Number of occurrences of the contours for the word Entschuldigung in each attempt produced by the L1 speakers. The legend (1, 2, 3) refers to the attempt number.

Results of the total durations

Owing to the small number of samples, I will report descriptive mean values and 95% CI error bars instead of running statistical analyses that require a larger number of samples.

Additionally, I will report Cohen’s d as a measure of effect size (Cohen, 1969, 1992), since the amount of data was small and an effect size helps to interpret such data. By convention, an effect size is small when d = 0.2, medium when d = 0.5 and large when d ≥ 0.8. The plot with inferential error bars includes essentially all the information provided by a hypothesis-testing procedure plus a graphic signal of how much uncertainty is in the data. Regarding a between-subject variable, the CIs of two groups that do not overlap or just touch indicate a population difference, and p is approximately less than .01. If the CIs overlap by no more than half of the length of one whisker of the CI, there is a degree of evidence of a difference, and p is approximately less than .05. However, the data may be interpreted without invoking p-values (Cumming, 2011, 13).
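The effect size convention above can be made concrete with a short sketch. It assumes the common pooled-standard-deviation form of Cohen's d for two independent groups; the text does not state which variant or which group order was used, so the function names, the "negligible" label and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups with a pooled SD (assumed variant).

    The sign depends on the order of the groups: positive if group_a has
    the larger mean.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def label_effect(d):
    """Map |d| onto the conventional labels; 'negligible' is an added label."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


# Hypothetical total durations in seconds, invented for illustration only.
durations_a = [0.80, 0.85, 0.90, 0.95]
durations_b = [0.78, 0.82, 0.86, 0.90]
d = cohens_d(durations_a, durations_b)
print(round(d, 2), label_effect(d))
```

Because the labels are applied to the absolute value, a negative d is read by its magnitude; the sign only reflects which group was entered first.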

Figure 2.8 Mean total durations for sumimasen with 95% CI bars for each attempt and language group (left) and mean total duration differences between the 1st and 3rd attempts for the word sumimasen with 95% CI bars for each language group (right).

Sumimasen: In order to provide an overview of the data, the left plot in Figure 2.8 shows mean total durations and 95% CI bars for each attempt in each group. The result shows that there was no difference between the L1 and L2 speakers’ utterance durations across the attempts (M = 0.88 s, 95% CI [0.82, 0.94] for the L1 speakers and M = 0.84 s, 95% CI [0.77, 0.92] for the L2 speakers, d = -0.17). At first sight, the plot seems to show a tendency for only the L1 speakers to produce longer durations in the repetitions, so that an interaction between language group and number of attempt could be expected. Note that the overlap of the separate CIs is irrelevant with a repeated measure, so that the CI on the difference needs to be calculated in order to interpret the difference (Cumming, 2011). To this end, the differences in total durations between the 1st and 3rd attempt (3rd - 1st) were calculated, see the right plot in Figure 2.8. The plot shows that only the L1 speakers produced longer durations in the repetitions, while the L2 speakers did not: the 95% CI for the L1 speakers does not include 0, whereas that for the L2 speakers does. Moreover, there was a tendency for an interaction between language group and number of attempt (1st vs. 3rd) (M = 0.17 s, 95% CI [0.05, 0.30] for the L1 speakers and M = 0.03 s, 95% CI [-0.05, 0.12] for the L2 speakers, d = 0.73). The relatively large effect size also supports the view that the two groups shown in the plot tended to differ.
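The CI on the within-speaker difference described above can be sketched as follows. This assumes a standard t-based interval on the per-speaker differences (3rd - 1st); the text does not spell out the exact computation. The function name and the durations are hypothetical, and the critical value is passed in by hand to keep the sketch free of external libraries.

```python
from math import sqrt
from statistics import mean, stdev


def paired_difference_ci(first, third, t_crit):
    """Mean of per-speaker differences (3rd - 1st) and its CI bounds.

    t_crit is the two-sided critical value of Student's t for
    len(first) - 1 degrees of freedom, supplied by the caller.
    """
    if len(first) != len(third) or len(first) < 2:
        raise ValueError("need paired measurements from at least two speakers")
    diffs = [later - earlier for earlier, later in zip(first, third)]
    m = mean(diffs)
    # Standard error of the mean difference, scaled by the critical value.
    margin = t_crit * stdev(diffs) / sqrt(len(diffs))
    return m, m - margin, m + margin


# Hypothetical total durations in seconds for five speakers (illustration only).
first_attempt = [0.80, 0.85, 0.78, 0.90, 0.82]
third_attempt = [0.95, 0.99, 0.90, 1.10, 0.93]
T_CRIT_DF4 = 2.776  # two-sided 95% critical value of Student's t, df = 4

m, lower, upper = paired_difference_ci(first_attempt, third_attempt, T_CRIT_DF4)
# If the interval excludes 0, the durations changed between the attempts.
print(round(m, 3), lower > 0)
```

Working on the differences rather than on the two separate CIs is what makes the interval interpretable for a repeated measure: each speaker serves as their own baseline.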

Konnichiwa: In the same way as for the analyses of sumimasen, the left plot in Figure 2.9 provides an overview of the data showing mean total durations and 95% CI bars for each attempt in each group. The plot indicates that only the L1 speakers produced longer durations in the repetitions, so that an interaction between language group and number of attempt is expected.

Analogously to sumimasen, the differences in the total durations between the 1st and 3rd attempt (3rd - 1st) were calculated, see the right plot in Figure 2.9. The plot shows that only the CI of the durational difference for the L1 speakers does not cross 0. This means that only the L1 speakers produced longer durations in the repetitions. Moreover, there was an interaction between language group and number of attempt (1st vs. 3rd), because about half of the length of the whisker of the CIs overlapped (M = 0.28 s, 95% CI [0.08, 0.48] for the L1 speakers and M = 0.10 s, 95% CI [-0.03, 0.21] for the L2 speakers, d = 0.73). The large effect size also shows that the two groups shown in the plot differed.

Figure 2.9 Mean total durations for konnichiwa with 95% CI bars for each attempt and language group (left; x-axis: number of attempt, y-axis: total durations in seconds) and mean total duration differences between the 1st and 3rd attempts for konnichiwa with 95% CI bars for each language group (right; x-axis: language group, y-axis: total duration difference in seconds).

Entschuldigung: The data set for Entschuldigung contained only the German participants’ data. In order to analyse whether they produced longer utterances in the repetitions, the differences between the 1st and 3rd attempt were calculated. It was found that the differences were consistently larger than 0, suggesting that the German participants produced longer utterances in the 3rd attempt than in the 1st attempt (M = 0.14 s, 95% CI [0.03, 0.24]).

Results of the production of a nonnative geminate

In order to investigate whether the L2 speakers produced the nasal geminate in the target word konnichiwa as long as the L1 speakers did, the relative segmental durations of <ko>, <n:i>², <chi> and <wa> with respect to the total durations were analysed. The measurement of relative durations is more suitable than that of absolute durations in this study, because the utterances were lengthened in the repetitions and only relative durations can provide rate-independent information (Idemaru and Guion-Anderson, 2010). Mean relative durations of <ko>, <n:i>, <chi> and <wa> with 95% CI bars for each language group are shown in Figure 2.10.

Figure 2.10 Mean relative durations of <ko>, <n:i>, <chi> and <wa> with 95% CI bars for each language group.

² The analysed segment <n:i> consists of two morae, <n> and <ni>, in Japanese. While the Japanese L1 speakers produced the boundary between <n> and <ni> clearly, the German L2 speakers did not. In order to keep the number of annotation boundaries consistent, I decided not to put a boundary between <n> and <ni>.
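The rate-independence argument for relative durations can be illustrated with a minimal sketch. It assumes that a relative duration is the segment's duration divided by the sum of the four annotated segments; the text only says "with respect to the total durations", so the exact denominator is an assumption, and the durations below are invented.

```python
def relative_durations(segment_durations):
    """Express each segment's duration as a proportion of the summed durations."""
    total = sum(segment_durations.values())
    if total <= 0:
        raise ValueError("total duration must be positive")
    return {segment: duration / total for segment, duration in segment_durations.items()}


# Hypothetical absolute durations in seconds: the same word spoken slowly
# and at twice the rate (illustration only).
slow = {"ko": 0.20, "n:i": 0.24, "chi": 0.16, "wa": 0.40}
fast = {"ko": 0.10, "n:i": 0.12, "chi": 0.08, "wa": 0.20}

slow_relative = relative_durations(slow)
fast_relative = relative_durations(fast)
# The proportions coincide although the absolute durations differ.
for segment in slow:
    print(segment, round(slow_relative[segment], 2), round(fast_relative[segment], 2))
```

Uniform lengthening of an utterance, as observed in the repetitions, leaves these proportions unchanged, which is why they allow comparison across attempts and speakers.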

The plot shows that the L2 speakers produced the preceding mora <ko> (i.e. the mora before the nasal geminate) shorter than the L1 speakers (M = 0.17 s, 95% CI [0.16, 0.18] for the L1 speakers and M = 0.13 s, 95% CI [0.12, 0.14] for the L2 speakers, d = -1.26). On the contrary, the following mora <chi> (i.e. the mora after the geminate) was produced longer by the L2 speakers than by the L1 speakers (M = 0.13 s, 95% CI [0.12, 0.14] for the L1 speakers and M = 0.19 s, 95% CI [0.18, 0.20] for the L2 speakers, d = 2.08). Both the CI bars and d confirm a great difference between the L1 and L2 speakers’ durations. It can be assumed that the L2 speakers produced “penultimate stress”, as they are used to doing in their L1, German. Notably, the relative durations of the nasal geminate itself, <n:i>, did not differ between the two language groups (M = 0.21 s, 95% CI [0.19, 0.22] for the L1 speakers and M = 0.19 s, 95% CI [0.18, 0.21] for the L2 speakers, d = -0.28). The relative durations of <wa> did not differ between the two language groups either (M = 0.44 s, 95% CI [0.42, 0.47] for the L1 speakers and M = 0.43 s, 95% CI [0.40, 0.47] for the L2 speakers, d = -0.14). It is remarkable that the relative durations of the preceding and of the following mora differed between the L1 and L2 speakers’ groups, but those of the segment containing the nasal geminate itself did not.