

In the document Automatic Detection of Prosodic Cues (pages 33-36)


3.1.1 The Dutch School of Intonation

The so-called “Dutch School of intonation” is an attempt, begun in the early sixties at the “Instituut voor Perceptie Onderzoek” (IPO) and summarized in ’t Hart et al. (1990), to describe intonation from a perceptual point of view. The researchers started with a description of Dutch intonation, and their model has also been applied to other languages (Russian: Keijsper 1983; German: Adriaens 1991). The basic assumption underlying their research is as follows:

“[...] that only those F0 changes would be regarded as possible candidates for a descriptive model of pitch for which a link could be established with commands to the vocal-cord mechanism, which as such are under the speaker’s control” (’t Hart et al., 1990, p. 186).

Although this is an articulation-based assumption and the authors state explicitly that physiological measurements should be made, they also state that “such a method has a number of unattractive aspects” (ibid., p. 39). The number of speakers would be restricted to those willing to volunteer for such experiments, and the authors also doubt that spontaneous speech could be recorded under these experimental circumstances. However, the articulation-based assumption is said to have a consequence, namely “that the involuntary fluctuations do not make an essential contribution to the perception of the speech melody: their omission [...] should not bring about any substantial change in the perceived speech melody” (’t Hart et al. 1990, p. 40).

¹Autosegmental phonology was originally proposed by Goldsmith (1976) and contrasts with strictly segmental theories of phonology. In traditional segmental phonology, a representation consists of a linear arrangement of segments, whereas in autosegmental phonology a representation consists of several ‘tiers’, each containing a linear arrangement of elements; elements on different tiers are linked to each other by association lines. See e.g. Goldsmith (1976), Gussenhoven & Jacobs (1998).

This view expresses the observation that not all details of the F0 contour are relevant to perception. The central question of the IPO theory was therefore: which pitch movements are perceptually relevant? The strategy for finding these relevant pitch movements consists of two steps: stylization and standardization. In stylization, straight lines are drawn over an F0 contour so that they fit the original contour (“close-copy”; see figure 3.1). The stylized contour has to be perceptually equal to the original one, which is tested by resynthesizing the stylized contour and comparing it with the original. However, since speech synthesis was considerably less advanced at that time than it is today, this does not seem to be a convincing procedure. Standardization (see figure 3.2) adapts the stylized contour to a grid of three continuously decreasing lines (L(ow), M(id), H(igh)), which represent the declination effect², under the criterion of perceptual equivalence to the close-copy representation.

Figure 3.1: Illustration of “close-copy” stylization (from ’t Hart et al. 1990, p. 43). The F0 values are depicted on a logarithmic scale.
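In the IPO work, close-copy stylization was carried out by hand and validated perceptually through resynthesis. The sketch below is not that procedure: it substitutes an automatic line-simplification algorithm (a variant of Ramer-Douglas-Peucker that measures vertical error) simply to show what fitting straight lines to an F0 contour involves. It assumes F0 has already been extracted as (time, Hz) pairs for one voiced stretch, converts the values to semitones (a logarithmic scale, as in figure 3.1) relative to a hypothetical 100 Hz reference, and uses a hypothetical tolerance of 1 semitone in place of a perceptual equality test. The function names are illustrative only.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 in Hz to semitones relative to ref_hz (a logarithmic scale)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def vertical_error(p, a, b):
    """Vertical distance (semitones) of point p from the straight line through a and b."""
    (t, y), (t0, y0), (t1, y1) = p, a, b
    if t1 == t0:
        return abs(y - y0)
    y_line = y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    return abs(y - y_line)


def stylize(points, tolerance_st=1.0):
    """Piecewise-linear stylization of (time, semitone) points.

    Recursively keeps the point farthest from the current straight line until
    every original point lies within tolerance_st semitones of the line
    segment that spans it. Returns the retained breakpoints in time order.
    """
    if len(points) <= 2:
        return list(points)
    first, last = points[0], points[-1]
    # Locate the interior point that deviates most from the chord first-last.
    idx = max(range(1, len(points) - 1),
              key=lambda i: vertical_error(points[i], first, last))
    if vertical_error(points[idx], first, last) <= tolerance_st:
        return [first, last]
    # Split at that point and stylize both halves; keep the shared point once.
    left = stylize(points[: idx + 1], tolerance_st)
    right = stylize(points[idx:], tolerance_st)
    return left[:-1] + right


if __name__ == "__main__":
    # Hypothetical F0 samples (seconds, Hz) for one voiced stretch: a rise, then a fall.
    samples = [(0.00, 110), (0.05, 112), (0.10, 118), (0.15, 130), (0.20, 145),
               (0.25, 138), (0.30, 125), (0.35, 112), (0.40, 104)]
    contour = [(t, hz_to_semitones(f)) for t, f in samples]
    for t, st in stylize(contour, tolerance_st=1.0):
        print(f"{t:.2f} s  {st:6.2f} st")
```

With these inputs the rise-fall peak at 0.20 s is kept as a breakpoint. Lowering `tolerance_st` retains more breakpoints and follows the contour more closely; treating a numeric tolerance as a stand-in for perceptual equality is itself an assumption of this sketch.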

This approach describes intonation contours as series of straight lines running between the three declination lines. Pitch accents are represented by rises and falls between these declination lines. The procedure involves a clear reduction of information at each stage, from the continuously varying F0 contour up to the standardized intonation patterns.
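Standardization can be sketched in the same spirit. The snippet below assumes the stylized contour is given as (time, semitone) breakpoints and models the three declination lines as parallel straight lines in semitones with a common negative slope; the onset values and the slope are hypothetical placeholders, not values from ’t Hart et al. (1990). Each breakpoint is simply snapped to the nearest line, which stands in for the perceptual-equivalence criterion used by the IPO researchers. The names `DeclinationGrid` and `standardize` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclinationGrid:
    """Three parallel declination lines in semitones (hypothetical parameters).

    Each line starts at its onset value at t = 0 s and changes by
    slope_st_per_s semitones per second (negative = declining).
    """
    low_onset: float = 0.0
    mid_onset: float = 4.0
    high_onset: float = 10.0
    slope_st_per_s: float = -2.0

    def levels_at(self, t):
        """Return the L, M and H line values (semitones) at time t (seconds)."""
        drop = self.slope_st_per_s * t
        return {"L": self.low_onset + drop,
                "M": self.mid_onset + drop,
                "H": self.high_onset + drop}


def standardize(breakpoints, grid):
    """Snap each (time, semitone) breakpoint to the nearest declination line.

    Returns (time, label, standardized_semitones) triples, where label is
    "L", "M" or "H".
    """
    result = []
    for t, st in breakpoints:
        levels = grid.levels_at(t)
        label = min(levels, key=lambda name: abs(levels[name] - st))
        result.append((t, label, levels[label]))
    return result


if __name__ == "__main__":
    # Hypothetical stylized breakpoints (seconds, semitones): low start, peak, low end.
    stylized = [(0.0, 0.5), (0.2, 7.0), (0.4, -0.5)]
    for t, label, st in standardize(stylized, DeclinationGrid()):
        print(f"{t:.1f} s  {label}  {st:5.2f} st")
```

Here the peak at 0.2 s lands on the H line and both ends on the L line. The snapping step makes the information loss explicit: only the label and the value of the chosen line survive, not the original distance from that line.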

²Declination is usually understood as the slight, gradual fall of pitch from the beginning to the end of an intonation phrase. It is, however, far from uncontroversial: the question under dispute is whether it is an actively controlled process or a by-product of other processes (see e.g. Hirst & Di Christo 1998, p. 21).


Figure 3.2: Illustration of “standardization” in the IPO approach (from ’t Hart et al. 1990, p. 49). The F0 values are depicted on a logarithmic scale.

In the book published in 1990 (’t Hart, Collier and Cohen), the authors repeatedly criticize the ‘levels’ approach (under which they subsume the approaches of Ladd 1983, Pierrehumbert 1980 and others). Their main criticism is that they reject the view that “the speaker primarily intends to hit a particular pitch level and that the resulting movements are only the physiologically unavoidable transitions between any two basic levels” (’t Hart et al., 1990, p. 75). They “believe that the use of ‘levels’ in a phonetic analysis of intonation is an oversimplification. And even though it may be a commendable attempt at phonological data reduction, its application on the phonetic level runs counter to the phonetic facts of pitch-change production and perception” (ibid., p. 77).

Taylor (1994) criticized the IPO-approach as follows:

“The Dutch system uses three rigidly defined levels, and therefore has problems dealing with any sort of downstep [see explanation on page 57, NB]. This strict three level distinction also poses problems with changing the pitch range or describing accent prominence [...].

The phonetic, intermediate level is incapable of expressing all the necessary distinctions between downstepping and non-downstepping contours. [...] Thus the F0-intermediate and intermediate-F0 mapping are not the analysis and synthesis equivalents of each other [...].

The fault in this case lies with “forcing” the F0 contour to be analyzed in terms of the three line declination system. If there is a large discrepancy between the behavior of real F0 contours and what the model proposes, then the model will run into severe difficulties. [...]

The model will have difficulty analyzing any contour that is not within its own legal set” (Taylor, 1994, p. 27).
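Taylor's point about downstep can be made concrete with a toy example. The sketch below uses hypothetical declination lines and contour values and is not taken from Taylor (1994). It labels each breakpoint with its nearest of three lines (the F0-to-intermediate mapping) and then regenerates F0 by placing each breakpoint exactly on its labelled line (the intermediate-to-F0 mapping). Two contours that differ only in whether the second peak is downstepped receive the same label sequence and are therefore resynthesized identically. All names are illustrative.

```python
def three_levels(t, onsets=(0.0, 4.0, 10.0), slope=-2.0):
    """Hypothetical L/M/H declination lines (semitones) at time t (seconds)."""
    return {name: onset + slope * t for name, onset in zip("LMH", onsets)}


def analyze(breakpoints):
    """F0 -> intermediate level: label each (time, semitone) point with its nearest line."""
    labels = []
    for t, st in breakpoints:
        levels = three_levels(t)
        labels.append(min(levels, key=lambda name: abs(levels[name] - st)))
    return labels


def synthesize(times, labels):
    """Intermediate level -> F0: put each breakpoint exactly on its labelled line."""
    return [(t, three_levels(t)[label]) for t, label in zip(times, labels)]


# Two hypothetical two-peak contours (seconds, semitones); only the second peak differs.
plain = [(0.0, 0.5), (0.3, 9.5), (0.6, 0.0), (0.9, 9.0)]
downstepped = [(0.0, 0.5), (0.3, 9.5), (0.6, 0.0), (0.9, 6.5)]
times = [t for t, _ in plain]

if __name__ == "__main__":
    print(analyze(plain), analyze(downstepped))
    # The label sequences are identical, so the resynthesized contours coincide.
    print(synthesize(times, analyze(plain)) == synthesize(times, analyze(downstepped)))
```

Keeping the two contours apart would require an intermediate representation that records, for example, how far a peak lies below the preceding one, which is the kind of distinction Taylor argues the three-line system cannot express.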

To what extent is the IPO approach useful for the automatic detection of prosodic events? The approach is a possible technique for mapping the F0 level onto more abstract intonational entities. However, Taylor's objections are crucial, and before one attempts to implement the model as a fully automatic procedure, a number of other questions remain to be solved: for instance, how straight lines can be fitted to the F0 curve automatically with the same reliability that human labelers achieve, and how the stylized contour can then be processed automatically into a standardized contour.

Furthermore, stylization with straight lines does not provide the level of abstraction that a phonological model does, and it can be criticized in this respect. Stylization is more a form of data reduction, whereas a phonological model enables one to structure acoustic observations and systematically explore patterns within them, that is, to abstract from the particular acoustic realization.
