Structure of the Thesis - Automatic Detection of Prosodic Cues

[Figure 1.2 (image not reproduced). Panels: Input, Processing, Output. Example utterance: “blühende Blumen” (‘blooming flowers’); ToBI labels: H* H* L-L%; axis: F0 [Hz]; time scale bar: 0.1 s.]

Figure 1.2: Illustration of input and output of the computer program presented in this thesis. The input is a speech signal and the output is a label file with information about the type of prosodic event (pitch accents and boundary tones according to the ToBI model, see section 3.2.1) and where it appears in time.

entities. The bottom-up procedure is the search for acoustic cues in the course of F0 and RMS amplitude;⁷ the top-down procedure is represented by the logic that maps abstract labels to acoustic cues.
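As an illustration of one of the two acoustic parameters named above, the following minimal sketch computes a frame-wise RMS amplitude curve. It is not taken from the thesis: the function name `frame_rms`, the frame length, the hop size, and the synthetic test tone are all assumptions chosen for the example.

```python
import math


def frame_rms(samples, frame_len, hop):
    """Return the RMS amplitude of each full frame of a sampled signal.

    frame_len and hop are given in samples (hypothetical parameter names).
    """
    rms = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Root of the mean of the squared sample values in this frame.
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


# Synthetic input (an assumption, not thesis data): 0.1 s of a 200 Hz sine
# with amplitude 1.0, sampled at 16 kHz.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 10)]

# 20 ms frames (320 samples) with a 10 ms hop (160 samples).
rms_curve = frame_rms(tone, frame_len=320, hop=160)
print(rms_curve[:3])
```

For a sine wave of amplitude 1.0 the RMS value is close to 1/√2, so the curve is flat here; in real speech it rises and falls with the loudness of the signal.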

1.2 Structure of the Thesis

The basic structure of this thesis consists of a presentation of some intonational phenomena, a literature review, a description of the method chosen for automatic detection of prosodic events, an evaluation of the method and finally a discussion.

In more detail: In chapter 2 some intonational phenomena like typical contours or text-to-tune alignment are laid out. Chapter 3 presents a literature review of the most influential theories about intonational structure or F0 modeling: the IPO theory of intonational structure (’t Hart et al. 1990), Fujisaki’s F0 model (Fujisaki & Hirose 1982; Fujisaki 1983), the Kieler Intonation Model (KIM; Kohler 1991), Taylor’s RFC model (Rise/Fall/Connection; Taylor 1994), and Pierrehumbert’s theory (Pierrehumbert 1980; Beckman & Pierrehumbert 1986). A sketch of the basic principles of the “autosegmental-metrical theory” (see Ladd 1996 and section 3.1.6) is given afterwards. The chapter also compares the presented models and presents the phonological modeling of intonation in German in more detail. Furthermore, two sets of labeling instructions are presented in this chapter, and finally some approaches to the automatic detection of prosodic events are introduced and discussed.

7 Duration, as mentioned on page 14, is not measured directly but influences the steepness of the F0 and RMS curves that make up the ‘course of F0 and RMS amplitude’.

In chapter 4 the outline of the automatic prosodic aligner (ProsAlign) is introduced. Chapter 5 describes the implementation of the model in a computer program and chapter 6 its evaluation. Finally, chapter 7 summarizes the findings of this work and discusses future directions. Terminological questions are dealt with in footnotes where the term in question first appears.

Chapter 2

Examples of Intonational Phenomena

It is common knowledge that the way something is said can be just as important in conveying a message as the words used to say it. To give some examples of this and an insight into the field of work, this chapter presents some intonational phenomena in German, together with a sketch of typological aspects of intonation, since other languages might use different contours for the same type of sentence. According to Helfrich (1985), intonation has three functions that modify an utterance’s meaning: (i) marking of sentence type, (ii) focussing, and (iii) disambiguation.¹ For the first function, examples of offering contours, calling contours, and surprise contours, as well as typical contours of declarative and interrogative sentences, are presented. Some of these examples show the effect of overlaying one and the same intonation contour on different text material. Other examples, in turn, show how one and the same text material is aligned with different intonation contours. Focussing is illustrated by a question-answer example. Phrasing is illustrated by an example in which the same text takes on two entirely different meanings depending on how it is subdivided into prosodically coherent units. Finally, some language-universal aspects of intonation are addressed. This chapter is meant to lay the ground for deciding what sort of abstract information should be extracted by the algorithm proposed in chapter 4.

The following illustrations show waveforms from speech files and time-aligned F0 contours as extracted by the ESPS/waves pitch tracker get_f0 (version 1.14).² The procedure for extracting the F0 contour from a given waveform can be roughly characterized as the detection of the more or less periodically repeating glottal pulses in its voiced segments. However, this is only one method of extracting F0; there are

1 Crystal (1995, p. 249) describes six functions of intonation: emotional, grammatical, informational, textual, psychological and indexical; see also the definitions of intonation in Rossi (2000, Section 2.4.3).

2 For more information on the get_f0 program, see page 99.

also articulatory-based procedures that measure vocal fold vibration with a laryngograph by attaching electrodes to the neck of a speaker, and also auditory perception, which was more often applied in former years. F0 is defined as the number of glottal pulses per second (= Hz). Human speech signals are known not to be perfectly periodic; they are therefore called ‘quasi-periodic’ (see e.g. Hess 1983). Problematic aspects of automatic F0 extraction will be addressed in section 5.1.³
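The definition of F0 as glottal pulses per second can be made concrete with a small sketch. It assumes that the pulse instants have already been detected; the pulse times below are invented for illustration, and the function names `f0_from_pulses` and `mean_f0` are hypothetical. This is not the algorithm used by get_f0.

```python
def f0_from_pulses(pulse_times):
    """Local F0 estimates in Hz from consecutive glottal pulse times in seconds.

    Each estimate is the reciprocal of one glottal period.
    """
    return [1.0 / (t1 - t0)
            for t0, t1 in zip(pulse_times, pulse_times[1:])
            if t1 > t0]


def mean_f0(pulse_times):
    """Average number of glottal periods per second over the whole stretch."""
    if len(pulse_times) < 2:
        raise ValueError("at least two pulses are needed")
    span = pulse_times[-1] - pulse_times[0]
    return (len(pulse_times) - 1) / span


# Invented pulse instants (seconds). The periods differ slightly from one
# another, which is what 'quasi-periodic' refers to.
pulses = [0.000, 0.010, 0.020, 0.0295, 0.0385]

f0 = f0_from_pulses(pulses)
print(f0)
print(mean_f0(pulses))
```

The local values vary from period to period, whereas the mean value smooths over this variation.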

2.1 Offering Contour

The so-called offering contour is typically used when somebody wants to offer something to somebody else, as in the questions given below.

(a) Willst du Kaffee? Do you want coffee?

(b) Willst du noch mehr Vanilleeis mit Schlagsahne? Do you want some more vanilla ice cream with whipped cream?

The offering contour is realized intonationally as a fairly level stretch followed by a rise at the end. The F0 contours of the offering contour examples (see figure 2.1) show that text material of different lengths is aligned with what counts phonologically as one and the same contour.
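The description of the contour as a level stretch followed by a final rise, independent of the length of the text, can be illustrated by a rough heuristic. The sketch below is only an illustration under stated assumptions and is not the detection method of this thesis: the function name, the three thresholds, the convention that 0 marks an unvoiced frame, and the F0 values themselves are all invented.

```python
def looks_like_offering_contour(f0, final_fraction=0.25,
                                max_spread=0.10, min_rise=0.20):
    """Heuristic check: fairly level F0 followed by a rise at the end.

    f0 is a list of F0 values in Hz; 0 is treated as 'unvoiced' (assumption).
    All thresholds are hypothetical and relative to the level of the body.
    """
    voiced = [v for v in f0 if v > 0]
    if len(voiced) < 4:
        return False
    # Split into a body and a final stretch, proportionally to the length,
    # so that short and long utterances are treated alike.
    split = int(len(voiced) * (1 - final_fraction))
    body, tail = voiced[:split], voiced[split:]
    level = sum(body) / len(body)
    is_level = (max(body) - min(body)) / level <= max_spread
    rises = (tail[-1] - level) / level >= min_rise
    return is_level and rises


# Invented F0 tracks (Hz): a short and a long utterance with the same shape,
# and a falling contour for contrast.
short_offer = [200, 202, 199, 201, 200, 203, 240, 280]
long_offer = [200] * 15 + [205, 220, 240, 260, 285]
falling = [220, 215, 210, 200, 190, 180, 170, 160]

for name, track in [("short", short_offer), ("long", long_offer),
                    ("falling", falling)]:
    print(name, looks_like_offering_contour(track))
```

Because the split point scales with the number of frames, the short and the long example are judged by the same shape criterion, which mirrors the observation that one contour can span text material of different lengths.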
