
4.1.2 Visualization of Beat Timing in Time Delay Plots

The next question is how variability can be quantified and visualized in a straightforward way. Many of the metrics discussed in Chapter 3 were criticized mainly because they only show a global tendency and are difficult to interpret. It is also unclear whether variability ought to be captured locally, as in the PVI metric, or globally, as in metrics based on dispersion. Metrics concentrating on vocalic variability have proved to give the most reliable results. However, in Section 2.2.3 it was argued that the domain of a beat in speech is likely to be the syllable. A description concentrating on vocalic intervals only disregards any syllabic consonants, which may constitute a substantial number of beats in many languages such as English, German or Czech. Thus, an adequate description of beats in speech ought to search for syllables (or possibly morae, cf. Section 2.2.4) rather than vocalic intervals. Of course, the measurement of syllable duration does not provide us with an adequate correlate of the duration of a rhythmically relevant event, since the perceptual onset of a syllable is not identical to the beginning of the syllable itself. In order to estimate this perceptual onset, the p-center of the syllable needs to be determined. The p-center depends on the duration of the onset and is roughly estimated with the equation proposed by Marcus (1981) (cf. Section 2.2.3). Since rhythm has to do with timing, it would be practical if we could assign the beat a duration as a rough estimate of its strength or prominence. A common practice is to regard the interval between subsequent perceptual centers as equivalent to the temporal extension of a beat. If this practice is transferred to speech, though, the onset of a subsequent syllable would enter the perceived strength of the previous beat. While I do not want to rule out the possibility that such an approach is adequate, I prefer not to include the onset of a subsequent syllable in the beat duration of the previous one. At least for experienced listeners of a language, I assume for now that a syllable ending is also perceptually interpreted as the end of a beat. The coda consonants, however, ought to be included in the estimate of the beats' durations, since we know from phonological analyses (cf. Section 2.2.4) that the phonological weight, i.e. the perceptual strength of a syllable, is determined by the entire syllable rhyme. The syllable coda might therefore play an important role in the perception of a rhythm and should not be left out of rhythmic analyses. Accordingly, in the method presented below, “beats” or “beat durations” refer to the duration between a syllable's p-center (determined according to Marcus (1981)) and the end of the syllable's rhyme.
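The beat definition above can be illustrated with a minimal sketch. The equation of Marcus (1981) is not reproduced in this section, so the two coefficients below are an assumption (the values commonly cited for that model, with its arbitrary constant omitted) and should be checked against Section 2.2.3. The names `Syllable`, `p_center_offset` and `beat_duration` are hypothetical and serve illustration only.

```python
from dataclasses import dataclass

# ASSUMPTION: weights commonly cited for the Marcus (1981) p-center model.
# They are not stated in this section; verify them before use.
ONSET_WEIGHT = 0.65
RHYME_WEIGHT = 0.25


@dataclass
class Syllable:
    onset_dur: float  # seconds from syllable start to the start of the rhyme
    rhyme_dur: float  # seconds covered by nucleus plus coda


def p_center_offset(syl, onset_weight=ONSET_WEIGHT, rhyme_weight=RHYME_WEIGHT):
    """Estimated p-center position in seconds after the syllable start."""
    return onset_weight * syl.onset_dur + rhyme_weight * syl.rhyme_dur


def beat_duration(syl, onset_weight=ONSET_WEIGHT, rhyme_weight=RHYME_WEIGHT):
    """Duration from the estimated p-center to the end of the syllable rhyme."""
    syllable_dur = syl.onset_dur + syl.rhyme_dur
    return syllable_dur - p_center_offset(syl, onset_weight, rhyme_weight)
```

Under these assumed weights, a longer onset moves the estimated p-center later, while the coda always counts towards the beat because the interval ends at the end of the rhyme and never includes the following onset.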

Rhythm metrics such as the PVI and those based on dispersion (cf. Section 3.1) search for local and global variability in speech, both of which could successfully be linked to typological rhythm classes. It would be an advantage if both types of temporal variability could be expressed simultaneously. A useful visualization tool for such an approach can be found in methods used in Nonlinear Time Series Analysis. These methods have become increasingly popular for the description of deterministic, nonlinear data, especially economic and biomechanical data, e.g. in the detection of pathological vocal fold vibration (Little (2006)). The relative difficulty of obtaining significant autocorrelation at lags > 1 (cf. Sections 2.4.1 and 3.2.1) further indicates that linear techniques may be only partly useful in the detection of underlying regularity in speech rhythm data. For the time being, Nonlinear Time Series Analysis will only be used in a descriptive fashion. More precisely, time delay visualization will be used in order to detect possible hidden regularities in rhythmic speech data. The idea is simple and has been used in physics to describe the behaviour of nonlinear, deterministic event series such as water dripping (e.g. Baier (2001)): by plotting the duration of a fundamental rhythmic event i on the x-axis against that of the subsequent fundamental rhythmic event i+1 on the y-axis in a so-called time-delay cross-plot (cf. Figure 4.1), the relative timing of two subsequent rhythmic events at lag 1 is expressed (cf. also Wagner (2006, 2007)). In order to compare durations across different speakers, speaking rates or speaking styles, all beat durations have been z-normalized², resulting in values where “0” indicates the mean duration and “−1” and “1” indicate one standard deviation below and above the mean of the data series. Obviously, such a normalization factors out potentially important tempo-related effects of rhythm, such as the hypothesis that rate correlates negatively with perceived variability (cf. Section 2.3.5.1). Thus, such a visualization cannot be sufficient for a full exploration of speech rhythm. The following properties of a rhythmical event sequence can be identified in the proposed cross-plot:

• A tendency towards global isochrony across the data set, when the data points cluster around the {0,0}-coordinate, i.e. the point where two subsequent mean durations are indicated (cf. Figure 4.1).

• A tendency towards local isochrony as measured by the PVI: two subsequent similar beat durations are indicated in the plot whenever beatdur_i ≈ beatdur_{i+1}, thus creating a point along the diagonal (cf. Figure 4.1).

• Typical relative durations across the data set clustering in particular areas of the two-dimensional space indicating rhythmically relevant transitions in the rhythmic pattern, e.g. they may indicate the beginnings or endings of groups, due to a local increase in time.

• A tendency for strict alternation in the rhythmic pattern. A tendency towards a regular pattern of long-short-long-short events is indicated by one cluster in the upper left (short-long) quadrant of the plot and another cluster in the lower right (long-short) quadrant (cf. Figure 4.2).

• Subsequent very long or subsequent very short events, indicating pronounced lengthening or reduction phenomena across two consecutive syllables (cf. Figure 4.2).

² The z-normalized value $z_i$ for any value $x_i$ contained in a data series $S = x_1, x_2, x_3, \ldots$ is calculated as $z_i = \frac{x_i - \bar{S}}{\sigma(S)}$, with $\bar{S}$ expressing the mean of the data series and $\sigma(S)$ expressing the standard deviation of the data series. A problem is that the z-normalization is only useful if applied to normally distributed data. Beat durations, however, always show a significant amount of kurtosis, since they certainly have a minimal duration but can become very long. Therefore, the following calculations have also been repeated based on log-transformed data showing a normal distribution, as suggested by Dellwo (2008a). The outcome of these calculations did not differ categorically from the ones presented here.
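The construction of the cross-plot coordinates and the regions listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the figures: the population standard deviation is assumed (the text does not say which estimator was used), the tolerance `tol` is a hypothetical threshold, and all function names are chosen for illustration only.

```python
import math
from statistics import mean, pstdev


def z_normalize(durations, log_transform=False):
    """Return z-scores: 0 is the mean, +/-1 is one standard deviation."""
    # Optional log transform, mirroring the repeated calculations in footnote 2.
    values = [math.log(d) for d in durations] if log_transform else list(durations)
    mu = mean(values)
    sigma = pstdev(values)  # ASSUMPTION: population standard deviation
    if sigma == 0:
        raise ValueError("all durations are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


def time_delay_pairs(z):
    """Pair each beat i (x-axis) with its successor i+1 (y-axis), i.e. lag 1."""
    return list(zip(z[:-1], z[1:]))


def quadrant(x, y, tol=0.25):
    """Coarse region label for one point; tol is a hypothetical threshold."""
    if abs(x) <= tol and abs(y) <= tol:
        return "near-mean"  # a cluster here suggests global isochrony
    first = "long" if x > 0 else "short"
    second = "long" if y > 0 else "short"
    return f"{first}-{second}"


def near_diagonal(x, y, tol=0.25):
    """True when two subsequent beats are similar in duration (local isochrony)."""
    return abs(x - y) <= tol
```

For a strictly alternating series, the pairs returned by `time_delay_pairs` fall alternately into the "short-long" and "long-short" regions, which corresponds to the two clusters described for strict alternation.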

Figure 4.1: The time-delay plot representing the duration of a rhythmical event i on the x-axis and a subsequent rhythmical event i+1 on the y-axis. Subsequent similar or equal durations as measured by the PVI are plotted around the diagonal. Global isochrony ought to show as a cluster concentrating around the {0,0}-coordinate.

Relative durations also indicate the kind of grouping that is expressed by the increase or decrease in duration: the investigation of temporal patterns distinguishing iambic and trochaic grouping by Bröggelwirth (2007) and our own pilot study reported in Section 2.3.1 clearly indicated that a strong relative increase in duration tends to be interpreted as final lengthening, i.e. the ending of a group, while a moderate relative increase in duration is interpreted as the beginning of a group. Thus, the plot should also indicate whether the transition between beats signals the end or the beginning of a local or global rhythmic pattern (cf. Figure 4.3).

Another possibly important timing feature that is visualized in time delay plots concerns general tendencies of deceleration, which have been found to be important in earlier studies (cf. Section 3.2.2). Each local decelerating transition will appear above the diagonal, while accelerating transitions will appear below it (cf. Figure 4.4).
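The position of a transition relative to the diagonal can be checked with a few lines of code. The sketch below assumes the points are already available as (beat i, beat i+1) pairs of normalized durations; the function names and the optional tolerance `tol` are hypothetical.

```python
from collections import Counter


def transition_trend(x, y, tol=0.0):
    """Classify one transition by its position relative to the diagonal y = x."""
    diff = y - x
    if diff > tol:
        return "deceleration"  # next beat is longer: point lies above the diagonal
    if diff < -tol:
        return "acceleration"  # next beat is shorter: point lies below the diagonal
    return "steady"            # point lies on (or within tol of) the diagonal


def trend_counts(pairs, tol=0.0):
    """Count decelerating, accelerating and steady transitions in a series."""
    return dict(Counter(transition_trend(x, y, tol) for x, y in pairs))
```

A clear surplus of "deceleration" over "acceleration" counts would correspond to a point cloud concentrated above the diagonal.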

Figure 4.2: In time-delay plots, strict alternations show as two clusters, one in the lower right and one in the upper left quadrant, while sequences of short, reduced events show in the lower left and sequences of long events show in the upper right of the diagram.

Figure 4.3: Final lengthening indicating the end of a group (= iambic lengthening) should plot in higher regions than lengthening effects indicating beginnings (= trochaic lengthening).

Figure 4.4: The location of deceleration and acceleration tendencies in time delay plots.

Of course, it is relatively uninteresting to get a picture of the various temporal relations between successive beats without being able to relate these to their communicative function. In the visualization method proposed here, different colors are chosen in order to mark functionally different transitions between beats.

Three types of functionally different transitions were chosen:

• Transitions to phrase final beats are highlighted in red. Transitions across phrase boundaries are ignored.

• Transitions to lexically stressed beats are highlighted in blue. (This includes transitions from unstressed to stressed as well as between stressed beats.)

• Transitions to unstressed beats are highlighted in green. (This includes transitions from stressed to unstressed as well as between unstressed beats.)

The following plots in Figure 4.5 show the differences for our two example languages, English and French. The language material is taken from the BonnTempo database (Dellwo et al. (2004)). For each language, material from three speakers reading the same text passage at five different speech rates (approximately 450 syllables per speaker) is plotted.
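The assignment of the three transition types to colors can be sketched as follows. The `Beat` record and the function name are hypothetical, and the precedence of "phrase final" over "lexically stressed" for a beat that is both is an assumption, since the text does not state how such cases were resolved.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    z_dur: float        # z-normalized beat duration
    stressed: bool      # beat carries lexical stress
    phrase_final: bool  # beat is the last one in its phrase


def colored_transitions(beats):
    """Return (x, y, color) points for all transitions within phrases."""
    points = []
    for prev, nxt in zip(beats, beats[1:]):
        if prev.phrase_final:
            continue  # transition crosses a phrase boundary and is ignored
        if nxt.phrase_final:
            color = "red"    # ASSUMPTION: phrase-final takes precedence over stress
        elif nxt.stressed:
            color = "blue"   # transition to a lexically stressed beat
        else:
            color = "green"  # transition to an unstressed beat
        points.append((prev.z_dur, nxt.z_dur, color))
    return points
```

The resulting triples can be passed to any scatter-plot routine, with the first value on the x-axis and the second on the y-axis.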

The plots show clear differences between the prototypically stress-timed English and syllable-timed French. For English, the plot shows a clear separation between lexically stressed and unstressed syllables, which also indicates a high degree of rhythmical alternation. Final lengthening is in most cases more pronounced than lengthening of lexically stressed syllables. Final lengthening is not always confined to the last syllable; in some cases the penultimate syllable is lengthened as well, probably due to co-occurrence with lexical stress. In French, the overlap between transitions to stressed and unstressed syllables is striking: there is hardly any difference. Only final lengthening is clearly apparent in French, underlining its status as a language that does not clearly mark lexical stress by a local increase in duration. However, a highlighting of the end of phrases or stress groups is evident. Also, French relative durations cluster markedly around the {0,0}-coordinate, showing a lack of durational variation in comparison with English. In total, we see that the predictions made by phonological theory could be confirmed in time delay plots. French lacks durational variation globally and hardly ever marks lexical stress. In comparison to the plots produced by the popular global rhythm measures described in Chapter 3, time-delay plots have the advantage of being directly interpretable along rhythm-related dimensions similar to those detected in typological analyses. They also have the advantage over phonological analyses that they are not limited to categorical classification. Instead, they are able to show very fine-grained timing differences on a continuous scale.

Figure 4.5: The graphs show relative beat durations in English (left) and French (right). Transitions to lexically stressed beats are marked in blue (+), transitions to unstressed ones in green (o), transitions to phrase final beats in red (x).

Our exploratory study can go further, however. The lack of acoustic marking of lexical boundaries by an increase in duration in French is well known. This lack certainly stands in sharp contrast to the common notion of French being “iambic”, since iambic lengthening would usually require a pronounced lengthening effect.

Still, it is possible that word boundaries are marked in French and that our graph simply did not highlight the correct type of transition. Therefore, we alter the plot: instead of highlighting transitions to word-final syllables, those from word-final syllables to presumably unstressed syllables are plotted. The resulting graph is shown in Figure 4.6. Here, we clearly see that a lengthening effect does take place, albeit relative to the following very short beat in the upcoming foot. Thus, the intuitive impression that French has final lengthening at word boundaries and the empirical finding that it hardly shows word-final lengthening are both true. A pronounced lengthening does not take place within the foot or group but relative to the upcoming one. It is interesting that the remaining transitions are concentrated left of the diagonal. This indicates a general tendency of lengthening in French that propagates throughout the foot and is limited by lexical boundaries.