Speech and speaker normalization (in vowel normalization)
Venice International University
Phonetic and technological aspects of speaker characteristics
Prof. Dr. J. Harrington
Presented by Clara Tillmanns
Contents
1. Speech and speaker normalization in vowel normalization: definition
2. Influencing parameters and instruments for vowel normalization
3. Theories
Definition
Normalization.
We know there is extensive variation in speech.
How is it that listeners nevertheless agree in their perception of vowels?
Which information influences this decision?
Contents
1. Speech and speaker normalization: definition
2. Influencing parameters and instruments for vowel normalization
- Context
- Formant ratio
- F0
- Visual information
- Auditory gestalts
3. Theories
4. Studies: Johnson 1990 and 1999
Influencing parameters and
instruments for vowel normalization
Extrinsic Intrinsic
Context Formant ratio
F0
Visual information
Syllable external Syllable internal
Auditory gestalts
Influencing parameters and
instruments for vowel normalization
Extrinsic Intrinsic
Context Formant ratio
F0
Syllable external Syllable internal
Vocalic Prosodic
Influencing parameters and
instruments for vowel normalization
Context:
Perceived vowel quality is influenced
- by the formant frequencies of context vowels (Ladefoged & Broadbent 1957)
- by the F0 range of the carrier phrase (Johnson 1990)
Tones: The pitch range of a context utterance influences the perception of
Mandarin Chinese tones (Leather 1983)
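One way to picture the context effect is to judge a test vowel relative to the formants heard in the preceding context rather than in absolute Hz. The sketch below is only an illustration of that idea: the log-mean reference, the function name and all frequency values are assumptions made for this example, not the procedure of Ladefoged & Broadbent (1957).

```python
import math

def relative_to_context(token_hz, context_hz):
    """Log distance of a token's formant from the mean log formant of the context."""
    mean_log = sum(math.log(f) for f in context_hz) / len(context_hz)
    return math.log(token_hz) - mean_log

# Hypothetical F1 values in Hz, invented for illustration only.
test_f1 = 500.0
low_f1_context = [300.0, 400.0, 500.0]   # context phrase with generally low F1
high_f1_context = [500.0, 600.0, 700.0]  # context phrase with generally high F1

# The same physical token lands on opposite sides of the reference point.
after_low = relative_to_context(test_f1, low_f1_context)    # above the context mean
after_high = relative_to_context(test_f1, high_f1_context)  # below the context mean
```

The identical test token counts as a relatively high F1 after the low-F1 context and as a relatively low F1 after the high-F1 context, which is the kind of shift in perceived vowel quality that the context studies report.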
Influencing parameters and
instruments for vowel normalization
Extrinsic Intrinsic
Context Formant ratio
F0
Syllable external Syllable internal
Vocalic
Prosodic Gender
Relative patterns
Influencing parameters and
instruments for vowel normalization
Formant ratio
Vowels are relative patterns - no absolute frequencies
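A minimal sketch of the "relative pattern" idea, assuming for simplicity that two talkers differ only by a uniform scaling of all formants (real talker differences are not that uniform). The function name, the choice of log ratios and the formant values are illustrative assumptions.

```python
import math

def log_formant_ratios(f1, f2, f3):
    """Return (log(F2/F1), log(F3/F2)) for one vowel token, formants in Hz."""
    return (math.log(f2 / f1), math.log(f3 / f2))

# Hypothetical formant values in Hz for one vowel, invented for illustration.
talker_a = (300.0, 2300.0, 3000.0)
# A second hypothetical talker whose formants are all 15% higher.
scale = 1.15
talker_b = tuple(scale * f for f in talker_a)

ratios_a = log_formant_ratios(*talker_a)
ratios_b = log_formant_ratios(*talker_b)

# The absolute frequencies differ, but the ratio pattern is the same.
same_pattern = all(math.isclose(a, b) for a, b in zip(ratios_a, ratios_b))
```

Under this simplifying assumption the ratio pattern stays constant while every absolute frequency changes, which is why ratios can serve as a talker-independent description of the vowel.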
Influencing parameters and
instruments for vowel normalization
F0
Miller 1953:
Doubled F0 and found a vowel category shift for most American English vowels
Fujisaki & Kawashima 1968:
Found F1 boundary shifts of 100 to 200 Hz for F0 shifts of 200 Hz
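Reading the Fujisaki & Kawashima figures as F1 boundary shifts of 100 to 200 Hz for an F0 shift of 200 Hz, the size of the effect can be expressed as boundary shift per Hz of F0 change. This reading of the slide and the helper name are assumptions; the arithmetic is only meant to make the magnitude concrete.

```python
def boundary_shift_rate(f1_boundary_shift_hz, f0_shift_hz):
    """F1 boundary shift per 1 Hz of F0 shift (Hz per Hz)."""
    return f1_boundary_shift_hz / f0_shift_hz

F0_SHIFT_HZ = 200.0

# Lower and upper end of the reported range of boundary shifts.
smallest = boundary_shift_rate(100.0, F0_SHIFT_HZ)
largest = boundary_shift_rate(200.0, F0_SHIFT_HZ)
```

On this reading the F1 boundary moved by between half of the F0 change and the full F0 change.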
Influencing parameters and
instruments for vowel normalization
Visual information
- Gender: boundary shift much like the F0 shift (Strand & Johnson 1996)
- Age
- Vowel quality: boundary shift through differing visual phonetic information (Johnson 1999)
- Sociocultural: speech intelligibility is reduced when the voice is associated with an Asian-looking face (Rubin 1992)
Influencing parameters and
instruments for vowel normalization
Auditory gestalts - “secondary cues”
Duration
Formant frequency movement trajectories:
- Lehiste & Metzger 1973:
- Fixed-duration vowels synthesized with steady-state formant frequencies: 51% correct
- Mixed lists of the original vowels from men, women and children: 79% correct
Contents
1. Speech and speaker normalization in vowel normalization: definition
2. Influencing parameters and instruments for vowel normalization
3. Theories
3.1 Vocal tract normalization (VTN)
3.2 Talker normalization (TN)
4. Studies: Johnson 1990 and 1999
5. Recapitulation
Theories - VTN
“Vocal tract normalization theories consider that listeners perceptually evaluate vowels on a talker specific coordinate system.” (Johnson 2004)
• Context vowels (reference)
• Visual information about the size of the
Theories - VTN
But: Talkers may differ from each other at the level of their articulatory habits of speech:
“Perception may not be able to depend on vocal tract normalization to “remove” talker
differences by removing vocal tract differences.” (Johnson 2004)
Does speaker/speech variation depend on anatomical differences only?
Theories - VTN
Cross-linguistic gender differences
Bladon, Henton and Pickering (1984):
The differences between men and women vary from language to language.
Cultural factors are involved in defining and shaping male or female speech.
Theories - VTN
Fig. 3: Spectral shift needed to normalize male and female spectra.
From Bladon, Henton & Pickering (1984)
Theories - VTN
“This seems to suggest that talkers choose
different styles of speaking as social, dialectal gender markers.
A speaker normalization that removes vocal tract differences will fail to account for the linguistic categorical similarity of vowels that are different due to different habits of
Theories - TN
Talker normalization is subject to expectations:
Magnuson & Nusbaum (1994) compared 1-voice with 2-voice instructions in a mixed-talker and blocked-talker experiment.
The blocked-talker advantage disappeared when subjects did not know about the different F0s of the two voices.
Talker normalization is an active process:
Kato & Kakehi (1988), listener adaptation to the talker's voice:
Recognition accuracy increased over the course of 5 stimuli presented in noise
Theories - TN
“In this approach, cognitive categories are represented as collections of the stored cognitive representations of experienced instances of the category,
rather than as normalized abstract representations from which category-internal structure has been removed”
(Johnson 2004)
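The idea of a category as a collection of stored instances can be sketched as a small exemplar classifier: a new token is assigned to the category whose remembered instances it resembles most in total. Everything specific here is an assumption for illustration (the exponential similarity function, the `sensitivity` value, the two-formant representation and the stored values); it is not Johnson's implementation.

```python
import math

# Stored instances keep their label and their acoustic detail.
# Hypothetical (F1, F2) pairs in Hz, invented for illustration only.
EXEMPLARS = [
    ("hood", (450.0, 1050.0)),
    ("hood", (470.0, 1150.0)),
    ("hud", (620.0, 1200.0)),
    ("hud", (720.0, 1400.0)),
]

def similarity(a, b, sensitivity=0.01):
    """Similarity that decays exponentially with Euclidean distance in Hz."""
    return math.exp(-sensitivity * math.dist(a, b))

def categorize(token, exemplars=EXEMPLARS):
    """Return the label whose stored exemplars have the largest summed similarity."""
    support = {}
    for label, stored in exemplars:
        support[label] = support.get(label, 0.0) + similarity(token, stored)
    return max(support, key=support.get)

near_hood = categorize((460.0, 1100.0))
near_hud = categorize((700.0, 1350.0))
```

No talker normalization step appears anywhere in the sketch: talker variation is simply kept inside each category as part of the stored instances.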
Contents
1. Speech and speaker normalization in vowel normalization: definition
2. Influencing parameters and instruments for vowel normalization
3. Theories
4. Studies
4.1 Johnson 1990
4.2 Johnson 1999
5. Recapitulation
Studies
“The role of perceived speaker identity in F0 normalization of vowels” (Johnson 1990)
Presentation of vowels from a “hood”-“hud”
continuum in two different intonational contexts which were judged to have been produced by different speakers, even
Studies
“The role of perceived speaker identity in F0 normalization of vowels” (Johnson 1990)
Shift in identification as a result of the intonational context
which was interpreted as evidence for the role of perceived speaker identity in vowel normalization
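A shift in identification along a continuum is commonly quantified as the movement of the 50% crossover point of the identification function. The sketch below shows one simple way to compute it by linear interpolation. The response proportions are invented purely for illustration and are not Johnson's data; the function name and the seven-step continuum are likewise assumptions.

```python
def category_boundary(steps, prop_hud):
    """Interpolate the continuum step at which the proportion of "hud" crosses 0.5."""
    points = list(zip(steps, prop_hud))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("identification function never crosses 0.5")

# Hypothetical proportions of "hud" responses per continuum step (NOT real data).
steps = [1, 2, 3, 4, 5, 6, 7]
context_a = [0.0, 0.1, 0.2, 0.4, 0.8, 0.9, 1.0]
context_b = [0.0, 0.0, 0.1, 0.2, 0.4, 0.8, 1.0]

boundary_a = category_boundary(steps, context_a)
boundary_b = category_boundary(steps, context_b)
shift = boundary_b - boundary_a  # boundary movement in continuum steps
```

With the same stimuli in both contexts, a non-zero `shift` means that the context alone changed how the tokens were categorized.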
Studies
“Auditory-visual integration of talker gender in vowel perception” (Johnson, Strand & D’Imperio 1999)
Exp. 1 found that the gender of auditory-visually presented stimuli shifts the phoneme boundary of a vowel continuum
Exp. 2 found that visual phonetic information is integrated in the boundary shift
Contents
1. Speech and speaker normalization in vowel normalization: definition
2. Influencing parameters and instruments for vowel normalization
3. Theories
4. Studies: Johnson 1990 and 1999
5. Recapitulation
Recapitulation
- Strong internal and external influences on the perception (of vowels)
- Explanation must integrate repeated learning
- Information on speaker identity influences the perception (of vowels)
- But: Is the perception of speaker identity influenced by certain components of the
References
Bladon, R. A., Henton, C. G. & Pickering, J. B. (1984) Towards an auditory theory of speaker normalization. Language & Communication 4, 59-69.
Fujisaki, H. & Kawashima, T. (1968) The roles of pitch and higher formants in the perception of vowels. IEEE Transactions on Audio and Electroacoustics AU-16, 73-77.
Hillenbrand, J. M. & Nearey, T. M. (1999) Identification of synthesized /hVd/ utterances: Effects of formant contour. J. Acoust. Soc. Am. 105, 3509-3523.
Ladefoged, P. & Broadbent, D. E. (1957) Information conveyed by vowels. J. Acoust. Soc. Am. 29, 98-104.
Leather, J. (1983) Speaker normalization in the perception of lexical tone. Journal of Phonetics 11, 373-382.
Lehiste, I. & Metzger, D. (1973) Vowel and speaker identification in natural and synthetic speech. Language and Speech 16, 356-364.
Johnson, K., Strand, E. A. & D’Imperio, M. (1999) Auditory-visual integration of talker gender in vowel perception. Journal of Phonetics 27, 359-384.
Johnson, K. (2004) Speaker normalization in speech perception. Ohio State University.
Johnson, K. (1990) The role of perceived speaker identity in F0 normalization of vowels. J. Acoust. Soc. Am. 88, 642-654.
Kato, K. & Kakehi, K. (1988) Listener adaptability to individual speaker differences in monosyllabic speech perception. J. Acoust. Soc. of Japan 44, 180-186.
Magnuson, J. & Nusbaum, H. (1994) Are representations used for talker identification available for talker normalization? Proceedings of the International Conference on Spoken Language Processing.
Miller, R. L. (1953) Auditory tests with synthetic vowels. J. Acoust. Soc. Am. 25, 114-121.
Peterson, G. E. & Barney, H. L. (1952) Control methods used in the study of vowels. J. Acoust. Soc. Am. 24, 175-184.
Rubin, D. L. (1992) Non-language factors affecting undergraduates’ judgements of non-native English-speaking teaching assistants. Research in Higher Education 33, 4.
Strand, E. A. & Johnson, K. (1996) Gradient and visual speaker normalization in the perception of fricatives. In Natural language processing and speech technology: results of the 3rd KONVENS conference, Bielefeld, (D. Gibbon, Ed.),