
Speech and speaker normalization (in vowel normalization)


Academic year: 2021


(1)

Speech and speaker normalization (in vowel normalization)

Venice International University

Phonetic and technological aspects of speaker characteristics Prof. Dr. J. Harrington

Presented by Clara Tillmanns

(2)

Contents

1. Speech and speaker normalization in vowel normalization: definition

2. Influencing parameters and instruments for vowel normalization

3. Theories

(3)

Definition

Normalization.

We know there is extensive variation in speech.

How is it that listeners agree in their perception of vowels?

(4)
(5)

Definition

Normalization.

What information influences this decision?

(6)

Definition

Normalization.

(7)

Contents

1. Speech and speaker normalization: definition

2. Influencing parameters and instruments for vowel normalization

- Context

- Formant ratio

- F0

- Visual information

- Auditory gestalts

3. Theories

4. Studies: Johnson 1990 and 1999

(8)

Influencing parameters and instruments for vowel normalization

Diagram labels: Extrinsic (Context); Intrinsic (Formant ratio, F0)

(9)

Influencing parameters and instruments for vowel normalization

Diagram labels: Extrinsic (Context); Intrinsic (Formant ratio, F0); Visual information; Syllable external; Syllable internal; Auditory gestalts

(10)

Influencing parameters and instruments for vowel normalization

Diagram labels: Extrinsic (Context); Intrinsic (Formant ratio, F0); Syllable external; Syllable internal; Vocalic; Prosodic

(11)

Influencing parameters and instruments for vowel normalization

Context:

Perceived vowel quality is influenced

- by the formant frequencies of context vowels (Ladefoged & Broadbent 1957)

- by the F0 range of the carrier phrase (Johnson 1990)

Tones: The pitch range of a context utterance influences the perception of Mandarin Chinese tones (Leather 1983)

(12)

Influencing parameters and instruments for vowel normalization

Diagram labels: Extrinsic (Context); Intrinsic (Formant ratio, F0); Syllable external; Syllable internal; Vocalic; Prosodic; Gender; Relative patterns

(13)

Influencing parameters and instruments for vowel normalization

Formant ratio

Vowels are relative patterns, not absolute frequencies
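
The idea of vowels as relative patterns can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that two talkers differ by a uniform scaling of all formants. The formant values, the 20% scale factor and the function name `formant_log_ratios` are hypothetical and are not taken from the cited studies.

```python
import math


def formant_log_ratios(f1, f2, f3):
    """Return (log F2/F1, log F3/F2) for one vowel token (frequencies in Hz)."""
    return (math.log(f2 / f1), math.log(f3 / f2))


# Hypothetical formant values (Hz) for one vowel from a reference talker.
reference = (300.0, 2300.0, 3000.0)

# Hypothetical second talker: every formant is 20% higher (assumed uniform scaling).
scale = 1.2
scaled = tuple(f * scale for f in reference)

ratios_reference = formant_log_ratios(*reference)
ratios_scaled = formant_log_ratios(*scaled)

# The absolute frequencies differ, but the ratio pattern is (numerically) the same.
same_pattern = all(
    math.isclose(a, b) for a, b in zip(ratios_reference, ratios_scaled)
)
print(reference, scaled, same_pattern)
```

Under this assumption the absolute frequencies differ while the ratios stay the same. Real talkers rarely differ by an exactly uniform scale factor, so the sketch is a simplification.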

(14)

Influencing parameters and instruments for vowel normalization

Formant ratio

(15)

Influencing parameters and instruments for vowel normalization

F0

Miller 1953:

Doubled F0 and found a vowel category shift for most American English vowels

Fujisaki & Kawashima 1968:

Found F1 boundary shifts of 100 Hz to 200 Hz for F0 shifts of 200 Hz

(16)

Influencing parameters and instruments for vowel normalization

Diagram labels: Extrinsic (Context); Intrinsic (Formant ratio, F0); Syllable external; Syllable internal; Vocalic; Prosodic; Gender; Relative patterns

(17)

Influencing parameters and instruments for vowel normalization

Visual information

- Gender: boundary shift much like the F0 shift (Strand & Johnson 1996)

- Age

- Vowel quality: boundary shift through differing visual phonetic information (Johnson 1999)

- Sociocultural: speech intelligibility is reduced when the voice is associated with an Asian-looking face (Rubin 1992)

(18)

Influencing parameters and instruments for vowel normalization

Auditory gestalts - “secondary cues”

Duration

Formant frequency movement trajectories:

- Lehiste & Metzger 1973:

- Fixed-duration vowels synthesized with steady-state formant frequencies: 51% correct

- Mixed lists of the original vowels from men, women and children: 79% correct

(19)

Contents

1. Speech and speaker normalization in vowel normalization: definition

2. Influencing parameters and instruments for vowel normalization

3. Theories

3.1 Vocal tract normalization (VTN)

3.2 Talker normalization (TN)

4. Studies: Johnson 1990 and 1999

5. Recapitulation

(20)

Theories - VTN

Vocal tract normalization theories “consider that listeners perceptually evaluate vowels on a talker specific coordinate system.” (Johnson 2004)

• Context vowels (reference)

• Visual information about the size of the

(21)

Theories - VTN

But: Talkers may differ from each other at the level of their articulatory habits of speech:

“Perception may not be able to depend on vocal tract normalization to “remove” talker differences by removing vocal tract differences.” (Johnson 2004)

→ Does speaker/speech variation depend on anatomical differences only?

(22)

Theories - VTN

Cross-linguistic gender differences

Bladon, Henton and Pickering (1984):

The differences between men and women vary from language to language.

→ Cultural factors are involved in defining and shaping male or female speech

(23)

Theories - VTN

Fig. 3 Spectral shift needed to normalize male and female spectra. From Bladon, Henton & Pickering (1984)

(24)

Theories - VTN

“This seems to suggest that talkers choose different styles of speaking as social, dialectal gender markers.

A speaker normalization that removes vocal tract differences will fail to account for the linguistic categorical similarity of vowels that are different due to different habits of

(25)

Theories - TN

Talker normalization is subject to expectations:

Magnuson & Nusbaum (1994) compared 1-voice with 2-voice instructions in a mixed-talker and blocked-talker experiment.

The advantage of the blocked-talker condition disappeared when subjects did not know about the different F0s of the two voices.

Talker normalization is an active process:

Kato & Kakehi (1988): Listener adaptation to talker voice:

Increase in recognition accuracy over the course of 5 stimuli presented in noise

(26)

Theories - TN

“In this approach, cognitive categories are represented as collections of the stored cognitive representations of experienced instances of the category,

rather than as normalized abstract representations from which category-internal structure has been removed”

(Johnson 2004)
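
The exemplar idea in the quote above can be sketched in Python as follows. This is only an illustration under stated assumptions: the stored formant values are invented, the exponentially decaying similarity function and its `sensitivity` parameter are assumptions, and the names `similarity` and `categorize` are hypothetical. It is not the model specified in Johnson (2004).

```python
import math

# Hypothetical stored exemplars: (F1 in Hz, F2 in Hz, category label).
# The values are invented for illustration only.
EXEMPLARS = [
    (440.0, 1020.0, "hood"),
    (470.0, 1160.0, "hood"),
    (450.0, 1100.0, "hood"),
    (640.0, 1190.0, "hud"),
    (760.0, 1400.0, "hud"),
    (700.0, 1300.0, "hud"),
]


def similarity(a, b, sensitivity=0.01):
    """Assumed similarity: decays exponentially with Euclidean distance in Hz."""
    distance = math.hypot(a[0] - b[0], a[1] - b[1])
    return math.exp(-sensitivity * distance)


def categorize(token, exemplars=EXEMPLARS):
    """Return the label whose stored exemplars are most activated by the token."""
    activation = {}
    for f1, f2, label in exemplars:
        activation[label] = activation.get(label, 0.0) + similarity(token, (f1, f2))
    return max(activation, key=activation.get)


print(categorize((450.0, 1080.0)))
print(categorize((720.0, 1320.0)))
```

No normalized prototype is computed here: the category decision comes directly from the stored instances, so talker-specific detail in the exemplars is retained rather than removed.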

(27)

Contents

1. Speech and speaker normalization in vowel normalization: definition

2. Influencing parameters and instruments for vowel normalization

3. Theories

4. Studies

4.1 Johnson 1990

4.2 Johnson 1999

5. Recapitulation

(28)

Studies

“The role of perceived speaker identity in F0 normalization of vowels” (Johnson 1990)

Presentation of vowels from a “hood”-“hud” continuum in two different intonational contexts which were judged to have been produced by different speakers, even

(29)

Studies

“The role of perceived speaker identity in F0 normalization of vowels” (Johnson 1990)

Shift in identification as a result of the intonational context, which was interpreted as evidence for the role of perceived speaker identity in vowel normalization
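
One common way to quantify such a shift in identification is to compare the 50% crossover points of the identification functions in the two contexts. The Python sketch below shows the arithmetic only. The response proportions are made up for illustration and are not data from Johnson (1990); the names `crossover`, `context_a` and `context_b` are hypothetical.

```python
def crossover(steps, proportions):
    """Continuum position where responses cross 50%, by linear interpolation."""
    for i in range(len(steps) - 1):
        p0, p1 = proportions[i], proportions[i + 1]
        if p0 < 0.5 <= p1:
            x0, x1 = steps[i], steps[i + 1]
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("responses never cross 50%")


# Hypothetical 7-step "hood"-"hud" continuum.
steps = [1, 2, 3, 4, 5, 6, 7]

# Made-up proportions of "hud" responses in two intonational contexts.
context_a = [0.02, 0.10, 0.30, 0.70, 0.90, 0.97, 1.00]
context_b = [0.01, 0.05, 0.12, 0.40, 0.80, 0.95, 0.99]

boundary_a = crossover(steps, context_a)
boundary_b = crossover(steps, context_b)

# Boundary shift between the two contexts, in continuum steps.
shift = boundary_b - boundary_a
print(boundary_a, boundary_b, shift)
```

Linear interpolation is the simplest choice here. Fitting a logistic function to the identification data is a frequently used alternative.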

(30)

Studies

“Auditory-visual integration of talker gender in vowel perception” (Johnson 1999)

Exp. 1 found that the gender of auditory-visually presented stimuli shifts the phoneme boundary of a vowel continuum

Exp. 2 found that visual phonetic information is integrated in the boundary shift

(31)

Contents

1. Speech and speaker normalization in vowel normalization: definition

2. Influencing parameters and instruments for vowel normalization

3. Theories

4. Studies: Johnson 1990 and 1999

5. Recapitulation

(32)

Recapitulation

- Strong internal and external influences on the perception (of vowels)

- Explanation must integrate repeated learning

- Information on speaker identity influences the perception (of vowels)

- But: Is the perception of speaker identity influenced by certain components of the

(33)

References

Bladon, R. A., Henton, C. G. & Pickering, J. B. (1984) Towards an auditory theory of speaker normalization. Language & Communication 4, 59-69.

Fujisaki, H. & Kawashima, T. (1968) The roles of pitch and higher formants in the perception of vowels. IEEE Transactions on Audio and Electroacoustics AU-16, 73-77.

Hillenbrand, J. M. & Nearey, T. M. (1999) Identification of synthesized /hVd/ utterances: Effects of formant contour. J. Acoust. Soc. Am. 105, 3509-3523.

Ladefoged, P. & Broadbent, D. E. (1957) Information conveyed by vowels. J. Acoust. Soc. Am. 29, 98-104.

Leather, J. (1983) Speaker normalization in the perception of lexical tone. Journal of Phonetics 11, 373-382.

Lehiste, I. & Metzger, D. (1973) Vowel and speaker identification in natural and synthetic speech. Language and Speech 16, 356-364.

Johnson, K., Strand, E. A. & D’Imperio, M. (1999) Auditory-visual integration of talker gender in vowel perception. Journal of Phonetics 27, 359-384.

Johnson, K. (2004) Speaker normalization in speech perception. Ohio State University.

Johnson, K. (1990) The role of perceived speaker identity in F0 normalization of vowels. J. Acoust. Soc. Am. 88, 642-654.

Kato, K. & Kakehi, K. (1988) Listener adaptability to individual speaker differences in monosyllabic speech perception. J. Acoust. Soc. of Japan 44, 180-186.

Magnuson, J. & Nusbaum, H. (1994) Are representations used for talker identification available for talker normalization? Proceedings of the International Conference on Spoken Language Processing.

Miller, R. L. (1953) Auditory tests with synthetic vowels. J. Acoust. Soc. Am. 25, 114-121.

Peterson, G. E. & Barney, H. L. (1952) Control methods used in the study of vowels. J. Acoust. Soc. Am. 24, 175-184.

Rubin, D. L. (1992) Non-language factors affecting undergraduates’ judgments of non-native English-speaking teaching assistants. Research in Higher Education 33, 4.

Strand, E. A. & Johnson, K. (1996) Gradient and visual speaker normalization in the perception of fricatives. In Natural language processing and speech technology: results of the 3rd KONVENS conference, Bielefeld, (D. Gibbon, Ed.),
