
5.3 Acquisition of Quantitative Criteria

5.3.1 Parameters in the Voicing Domain

Voicing is a fundamental feature used in phonetic classification and refers to the vibration of the vocal folds during articulation. Sounds produced with vibrating vocal folds are called voiced, while sounds produced without such vibration are called voiceless or unvoiced. Here the voicing information is extracted from the speech signal by, roughly speaking, searching for the quasi-periodically repeating glottal pulses as opposed to the non-periodic signals in voiceless stops or fricatives (cf. Hess 1983). When the signal characteristics are good, this measurement usually poses no problems. However, poor signal quality, background noise or other influences (creaky voice, laryngealizations, etc.) may affect the voicing detection. For instance, part of the speech signal might not be marked as voiced although the vocal folds were actually vibrating during its production, or a stretch of speech is marked as voiced although the original signal was not produced with vibrating vocal folds.

For the approach presented here, voicing estimation is very important since it is bound to the estimation of F0; that is, per definitionem, F0 is only represented in voiced parts of speech signals. Since it is well known (cf. Laver 1994, p. 453) that F0 values are most often erroneous up to 5 cycles from the beginning or before the end of a voiced stretch, this provides important information about potentially faulty or microprosodically affected F0 values. However, this does not imply that there cannot be a pitch accent at the beginning or at the end of a voiced part. When other acoustic features that support the presence of a pitch accent are present, it is likely, for example, that an F0 value at the end of a voiced phase becomes an L+H* pitch accent candidate.

Moreover, it is important to know how long a voiced part of a speech signal is, or, put differently: it is important to look at the distance of a potential pitch accent candidate from the beginning and end of the voiced part it is located in.

Very short stretches of voicing (<5 cycles) are less reliable locations of pitch accents than longer stretches as a result of the above-mentioned errors at the beginnings and ends of voiced parts. However, these short stretches are still possible locations of pitch accents since many short vowels fall into this time domain. Additionally, it is helpful to know the duration of the unvoiced stretches before and after a voiced phase, since these are important features for the estimation of short-term interruptions of voicing or longer, possibly pause-like breaks. Therefore, the following 16 parameters in the voicing domain are introduced (see table 5.1 on page 121).

All parameters are computed for every frame, which is virtually represented by the term t0. The decision for a ±400 ms analysis window was based on the first feature analysis (cf. section 4.1) and proved to be a reasonable analysis frame in order to cover enough contextual material for the selection of acoustic features and the subsequent mapping of tones.

Number of continuously voiced/voiceless values before or after point t0:

Parameter                                       Range   Name

nr of continuously voiced values before t0      0-40    VoicB
nr of continuously voiceless values before t0   0-40    VoilB
nr of continuously voiced values after t0       0-40    VoicA
nr of continuously voiceless values after t0    0-40    VoilA

These parameters calculate the number of uninterrupted voiced or voiceless values in time frames before and after point t0. Since voiced values are represented with an F0 value larger than 0, all values larger than 0 before or after the value under inspection are counted, without a single interruption by an unvoiced value, which is represented with a 0. Conversely, the number of uninterrupted voiceless values before and after is calculated. Measurement values may range from 0-40 frames. The algorithm always counts values beyond the beginning and end of the file as “0” (= voiceless). Often speech files are cut off close to intonation phrase boundaries, which therefore increases the number of voiceless values towards the end of the file. However, this does not necessarily result in a boundary tone label at the end of each file, since other parameters have to be present as well.
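The following sketch illustrates one way these continuation-controlled counts could be computed. It assumes a 10 ms frame step (so that 40 frames cover 400 ms) and an F0 track in which voiceless frames carry the value 0; the function and variable names are illustrative and not taken from the thesis.

```python
MAX_FRAMES = 40  # +/-400 ms window at an assumed 10 ms frame step


def continuous_counts(f0, t0, max_frames=MAX_FRAMES):
    """Count uninterrupted voiced/voiceless frames directly before and after t0.

    Frames beyond the beginning or end of the file are treated as voiceless
    (F0 = 0), as described in the text above.
    """
    def is_voiced(i):
        # voiced frames are represented by an F0 value larger than 0
        return 0 <= i < len(f0) and f0[i] > 0

    def run_length(step, want_voiced):
        # walk away from t0 frame by frame until the voicing status changes
        count = 0
        i = t0 + step
        while count < max_frames and is_voiced(i) == want_voiced:
            count += 1
            i += step
        return count

    return {
        "VoicB": run_length(-1, True),   # continuously voiced values before t0
        "VoilB": run_length(-1, False),  # continuously voiceless values before t0
        "VoicA": run_length(+1, True),   # continuously voiced values after t0
        "VoilA": run_length(+1, False),  # continuously voiceless values after t0
    }


# usage example: frames beyond the file end keep counting as voiceless,
# so VoilA is capped at max_frames here
f0 = [0, 0, 110, 112, 115, 0, 0]
print(continuous_counts(f0, t0=4))
# -> {'VoicB': 2, 'VoilB': 0, 'VoicA': 0, 'VoilA': 40}
```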

Number of voiced values before or after point t0 without continuation control:

Parameter                               Range   Name

nr of voiced values 50 ms before t0     0-5     Voic5B
nr of voiced values 100 ms before t0    0-10    Voic10B
nr of voiced values 160 ms before t0    0-16    Voic16B
nr of voiced values 230 ms before t0    0-23    Voic23B
nr of voiced values 310 ms before t0    0-31    Voic31B
nr of voiced values 400 ms before t0    0-40    Voic40B
nr of voiced values 50 ms after t0      0-5     Voil5A
nr of voiced values 100 ms after t0     0-10    Voil10A
nr of voiced values 160 ms after t0     0-16    Voil16A
nr of voiced values 230 ms after t0     0-23    Voil23A
nr of voiced values 310 ms after t0     0-31    Voil31A
nr of voiced values 400 ms after t0     0-40    Voil40A

These parameters calculate the number of voiced items in the interval indicated by the number in the parameter name; that is, Voic5B calculates the number of F0 values that are larger than 0 in an interval ranging from item t0−1 up to item t0−5, i.e. the 5 values before the value under inspection. In this connection it is important that there is no continuation control, so it is possible that voiced values are separated by short or long unvoiced stretches; nevertheless, they are counted when they fall into the given interval. This is different from the measurement above, where only uninterrupted stretches of either voiced or voiceless parts were counted. The sizes of the intervals were chosen because they showed a good coverage of the 400 ms intervals before and after t0. These parameters give estimations of the change of voicing before and after the value under inspection.

The number of voiceless values within the intervals is derived by subtracting the number of voiced items from the length of the interval, for instance: number of voiceless values 50 ms before t0 = 5 − Voic5B.
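A companion sketch, under the same illustrative assumptions as above (10 ms frame step, F0 = 0 marking voiceless frames, hypothetical function names), shows how the interval counts without continuation control and the derived voiceless counts could be computed:

```python
INTERVALS = (5, 10, 16, 23, 31, 40)  # frame counts covering 50 ... 400 ms


def interval_counts(f0, t0, intervals=INTERVALS):
    """Count voiced frames in fixed intervals before/after t0, no continuation control."""
    counts = {}
    for n in intervals:
        # items t0-1 ... t0-n and t0+1 ... t0+n; indices outside the file
        # are treated as voiceless and therefore never add to the count
        before = [f0[i] for i in range(t0 - n, t0) if 0 <= i < len(f0)]
        after = [f0[i] for i in range(t0 + 1, t0 + n + 1) if 0 <= i < len(f0)]
        # parameter names follow the table above (Voic...B before, Voil...A after)
        counts[f"Voic{n}B"] = sum(1 for v in before if v > 0)
        counts[f"Voil{n}A"] = sum(1 for v in after if v > 0)
    return counts


def voiceless_count(interval_length, voiced_count):
    # e.g. number of voiceless values 50 ms before t0 = 5 - Voic5B
    return interval_length - voiced_count
```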
