Summary and general discussion

HRTF measurements

In the first section of this chapter, HRTF measurements from 11 subjects and one dummy head were presented. The HRTFs were sampled at a large number of source positions (5° resolution for the individual subjects and 1° for the dummy head) using the TASP system (see Chapter 2). The monaural and binaural localization cues of the HRTFs show the typical spatial dependencies reported in the literature (e.g., Mehrgardt and Mellert, 1977; Shaw, 1974; Wightman and Kistler, 1989a; Møller et al., 1995), and the mean HRTFs obtained here are in good agreement with the results of Møller et al. (1995) and Pösselt et al. (1986).

The standard deviations of the monaural and binaural cues across subjects are highest for low elevations and decrease as the source elevation is raised. The standard deviation of the monaural spectral cues can be separated into two frequency regions. In the low-frequency region, the standard deviation across listeners is small (typically below 2 dB). In the high-frequency region, the standard deviation increases to up to 10 dB. The crossover frequency between the two regions is approximately 4 kHz for low elevations and increases to approximately 8 kHz for high elevations. Thus, HRTFs from low elevations show more individual properties than HRTFs from high elevations. In the literature, standard deviations across subjects are shown only for selected source positions, mostly in the horizontal plane (Wightman and Kistler, 1989a), or for HRTFs collapsed over a wide range of spatial positions (Møller et al., 1995). Although the general behavior of the standard deviation reported in the literature is consistent with the results of this study, the reduction of the standard deviation for elevated source positions has not been shown explicitly so far.
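As an illustration of how such an inter-subject spread could be computed, the following Python sketch converts the magnitude spectra of several listeners (one ear, one source position) to dB and returns the standard deviation across listeners per frequency bin. The array layout, the function names, and the use of a 2 dB threshold to read off a crossover frequency are assumptions made for this example and are not taken from the study.

```python
import numpy as np


def intersubject_std_db(hrtf_mag, floor=1e-12):
    """Standard deviation across subjects of the HRTF level in dB.

    hrtf_mag: array of shape (n_subjects, n_freqs) holding linear
    magnitudes for one ear and one source position (assumed layout).
    """
    level_db = 20.0 * np.log10(np.maximum(hrtf_mag, floor))
    # ddof=1 gives the sample standard deviation across subjects.
    return level_db.std(axis=0, ddof=1)


def crossover_frequency(freqs, std_db, threshold_db=2.0):
    """Lowest frequency at which the spread exceeds threshold_db.

    Returns None if the threshold is never exceeded. Using a fixed
    threshold is an assumption of this sketch.
    """
    above = np.nonzero(np.asarray(std_db) > threshold_db)[0]
    return float(freqs[above[0]]) if above.size else None
```

Applied separately to each elevation, such a function would be expected to return a higher crossover frequency for elevated sources if the data behave as described above.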

Dummy heads are intended to represent the head of an average individual listener. The HRTFs of the dummy head employed here ('Oldenburg dummy head') show that the binaural cues lie largely within the range of individual cues for higher elevations. However, because the dummy head lacks a torso and shoulders, its binaural cues at low elevations deviate strongly from individual cues. The spread of the binaural cues across individuals clearly limits the use of dummy head HRTFs as a replacement for individual recordings. Since the deviations of the dummy head cues from the individual binaural cues vary considerably across source locations, the dummy head cues would be suitable only for some source positions. Furthermore, the deviations of the monaural cues provided by the dummy head from those originating from the subjects' own ears are even larger than in the binaural case. It is known from the literature that the spectral filtering of the pinna at high frequencies is responsible for resolving front-back confusions and for estimating the source elevation (e.g., Oldfield and Parker, 1984a; Oldfield and Parker, 1984b). However, the monaural dummy head cues deviate strongly from the individual cues, especially at high frequencies. These differences depend on frequency and source position, and it is difficult to extract a systematic pattern from them. Therefore, the results of this study suggest that the 'Oldenburg dummy head' cannot be used as an average head of a listener if spatially correct perceptual representations of virtual stimuli are required.

If it is not possible to measure the HRTFs of an individual listener, the HRTFs of a different listener should at least be employed, because in this case the low-frequency spectra are well matched. However, Wenzel et al. (1993) showed that localization performance is reduced when non-individualized HRTFs are used. One possibility to overcome this problem is to scale the non-individual HRTF spectra in frequency so that the center frequencies of their peaks and notches match those of the individual listener (Middlebrooks, 1999a; Middlebrooks, 1999b). The scale factor can be obtained from a psychoacoustic task that lasts about one hour (Middlebrooks et al., 2000).
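The idea of scaling a spectrum in frequency can be sketched in a few lines of Python. The function below resamples a level spectrum so that a peak or notch originally located at frequency f appears at scale * f. The function name, the use of linear interpolation, and the handling of the band edges are assumptions of this sketch; it is not the procedure of Middlebrooks (1999a, 1999b).

```python
import numpy as np


def scale_spectrum_in_frequency(freqs, level_db, scale):
    """Shift spectral features from f to scale * f.

    freqs:    increasing frequency axis in Hz
    level_db: spectrum sampled on freqs
    scale:    frequency scale factor (> 1 moves features upward)
    """
    freqs = np.asarray(freqs, dtype=float)
    # The output at f takes the value the input had at f / scale.
    source_freqs = freqs / scale
    # np.interp holds the edge values outside the measured range,
    # which is a simplification of this sketch.
    return np.interp(source_freqs, freqs, level_db)
```

With a scale factor of 1.1, for example, a notch at 8 kHz in the donor spectrum is moved to 8.8 kHz.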

Spectral smoothing

In the second part of this investigation, the effect of spectral smoothing on HRTFs was examined. Spectral smoothing reduces individual information at high frequencies. If fewer than 16 cepstral smoothing coefficients are used, individual information in the high-frequency region is reduced to level differences in relatively broad frequency bands. However, the small peaks and notches code individual spatial information and should therefore, if possible, be left unchanged by the smoothing process. This consideration is supported by an investigation of Kulkarni et al. and by the results of Section 4. Both studies show that differences between original and smoothed HRTFs with regard to localization can be detected if fewer than 16 cepstral smoothing coefficients are used.
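A minimal Python sketch of cepstral smoothing is given here for illustration. It computes the real cepstrum of the log-magnitude spectrum, keeps only the lowest coefficients, and transforms back. The function name, the FFT length, and the convention that the coefficient count includes the zeroth coefficient are assumptions of this example and may differ from the implementation used in the study.

```python
import numpy as np


def cepstral_smooth(hrir, n_coeffs, n_fft=512):
    """Return the cepstrally smoothed one-sided magnitude spectrum.

    hrir:     head-related impulse response (1-D array)
    n_coeffs: number of cepstral coefficients kept, counting c[0]
              (this counting convention is an assumption)
    """
    mag = np.abs(np.fft.rfft(hrir, n_fft))
    log_mag = np.log(np.maximum(mag, 1e-12))
    # Real cepstrum: inverse transform of the log magnitude.
    cep = np.fft.irfft(log_mag, n_fft)
    # Symmetric lifter keeping c[0..n_coeffs-1] and the mirrored
    # coefficients at negative quefrencies.
    lifter = np.zeros(n_fft)
    lifter[:n_coeffs] = 1.0
    if n_coeffs > 1:
        lifter[-(n_coeffs - 1):] = 1.0
    smoothed_log_mag = np.fft.rfft(cep * lifter).real
    return np.exp(smoothed_log_mag)
```

Keeping a single coefficient collapses the spectrum to a constant level, while keeping all coefficients reproduces the original magnitude. Intermediate values retain progressively finer spectral detail with uniform resolution on a linear frequency axis.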

This smoothing limit is also supported by the analysis of the ILD deviations as a function of smoothing. For 8 cepstral coefficients, the broadband ILD deviation exceeds 1 dB, which suggests that the deviation is detectable by subjects (Durlach and Colburn, 1979).
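To make the quantity concrete, the following Python sketch computes a broadband ILD as the energy ratio between the left-ear and right-ear magnitude spectra in dB, and the ILD deviation as the absolute difference between the values for smoothed and original spectra. This energy-based definition and the function names are assumptions of the example; the study may define the broadband ILD differently.

```python
import numpy as np


def broadband_ild_db(mag_left, mag_right):
    """Broadband ILD in dB from linear magnitude spectra of both ears."""
    energy_left = np.sum(np.asarray(mag_left, dtype=float) ** 2)
    energy_right = np.sum(np.asarray(mag_right, dtype=float) ** 2)
    return 10.0 * np.log10(energy_left / energy_right)


def ild_deviation_db(orig_left, orig_right, smooth_left, smooth_right):
    """Absolute change of the broadband ILD caused by smoothing."""
    ild_orig = broadband_ild_db(orig_left, orig_right)
    ild_smooth = broadband_ild_db(smooth_left, smooth_right)
    return abs(ild_smooth - ild_orig)
```

A deviation returned by such a function could then be compared with a detection threshold of about 1 dB, as in the argument above.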

Furthermore, the results of this investigation show that 1/N octave smoothing is not appropriate for smoothing HRTF spectra. The increasing amount of smoothing toward high frequencies results in ILD deviations that are above the detection threshold even for 1/3 octave smoothing.
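The frequency dependence of 1/N octave smoothing can be illustrated with a short Python sketch. Each output bin is the mean power within a band of width 1/N octave centered geometrically on that bin, so the number of averaged bins grows in proportion to frequency. The rectangular window, the averaging of power rather than dB values, and the function name are assumptions of this example.

```python
import numpy as np


def octave_smooth(freqs, power, n):
    """1/n octave smoothing of a power spectrum on a linear frequency grid."""
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    # Ratio between the band center and each band edge.
    half_band = 2.0 ** (1.0 / (2.0 * n))
    smoothed = np.empty_like(power)
    for i, fc in enumerate(freqs):
        in_band = (freqs >= fc / half_band) & (freqs <= fc * half_band)
        # The band always contains at least the center bin itself.
        smoothed[i] = power[in_band].mean()
    return smoothed
```

On a grid with 20 Hz spacing, a one-bin notch at 100 Hz survives 1/3 octave smoothing unchanged, whereas the same notch at 10 kHz is averaged with more than a hundred neighboring bins and almost disappears.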

Smoothing affects not only the ILD but also the ITD. However, the present study shows that the ITD deviation can be assumed to be perceptually irrelevant if the ITD that is incorporated into the minimum phase impulse responses is calculated from low-pass filtered versions of the empirical HRIRs. Furthermore, the correction term that eliminates the inherent ITDs of the minimum phase HRIRs also has to be calculated from low-pass filtered impulse responses. If, however, this correction term is computed from unfiltered impulse responses, it can be assumed that the low-frequency ITD of the minimum phase plus frequency independent group delay HRTFs deviates from the ITD of the empirical HRTFs by a perceptually relevant amount.
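One way to estimate an ITD from low-pass filtered impulse responses is sketched below in Python. Both HRIRs are filtered with the same windowed-sinc low-pass filter, and the ITD is taken as the lag of the maximum of their cross-correlation. The cross-correlation method, the 1.5 kHz cutoff, the filter length, and the sign convention are assumptions of this sketch; the text above does not specify these details.

```python
import numpy as np


def lowpass_fir(cutoff_hz, fs, n_taps=101):
    """Windowed-sinc low-pass filter with unity gain at DC."""
    t = np.arange(n_taps) - (n_taps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / fs * t) * np.hamming(n_taps)
    return h / h.sum()


def itd_from_lowpassed(hrir_left, hrir_right, fs, cutoff_hz=1500.0):
    """ITD in seconds; positive if the left-ear signal lags the right."""
    h = lowpass_fir(cutoff_hz, fs)
    left = np.convolve(hrir_left, h)
    right = np.convolve(hrir_right, h)
    xcorr = np.correlate(left, right, mode="full")
    # In "full" mode, zero lag sits at index len(right) - 1.
    lag = int(np.argmax(xcorr)) - (len(right) - 1)
    return lag / fs
```

Because both ears pass through the same linear-phase filter, the filter delay cancels in the cross-correlation and only the interaural delay remains. The resolution of this estimate is one sample period; finer resolution would require interpolation of the correlation peak.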

This result is consistent with the findings of Kulkarni et al. (1999). In their study, the group delay of the minimum phase HRTFs and the minimum phase correction term were computed from unfiltered HRTFs. The results of a discrimination experiment showed that subjects were able to distinguish minimum phase plus frequency independent group delay HRTFs from empirical HRTFs for sound incidence from the sides. On the basis of the considerations presented in the current study, it can be supposed that the subjects would not have been able to distinguish the minimum phase plus frequency independent group delay stimuli from the empirical ones if the incorporated ITD had been correctly matched in the low-frequency range.
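For readers unfamiliar with the minimum phase plus frequency independent group delay model, the following Python sketch shows one common way to build such an impulse response: the minimum phase response is derived from the magnitude spectrum by folding the real cepstrum, and the delay is added as leading zeros. The function names and the restriction to integer-sample delays are assumptions of this example; it does not reproduce the procedure of Kulkarni et al. (1999) or of the present study.

```python
import numpy as np


def minimum_phase_ir(mag_full):
    """Minimum phase impulse response from a two-sided magnitude spectrum.

    mag_full: magnitudes at all n FFT bins (n must be even).
    """
    n = len(mag_full)
    if n % 2:
        raise ValueError("this sketch expects an even FFT length")
    cep = np.fft.ifft(np.log(np.maximum(mag_full, 1e-12))).real
    # Fold the cepstrum: keep c[0] and c[n/2], double the positive
    # quefrencies, and zero the negative ones.
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    return np.fft.ifft(np.exp(np.fft.fft(cep * fold))).real


def add_pure_delay(ir, delay_samples):
    """Prepend zeros, i.e. add a frequency independent group delay."""
    return np.concatenate([np.zeros(delay_samples), ir])
```

In a binaural application, the delay of the lagging ear would be set from the estimated ITD, so the choice of how that ITD is computed directly determines the low-frequency interaural delay of the model.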

In the last investigation presented in this study the length of the HRIRs as a function of smoothing was calculated. For both cepstral and 1/N octave smoothing the length of the impulse responses is substantially reduced. The length of the original impulse responses averaged across azimuth positions ranged from 4.5 ms to 6 ms depending on elevation.

In a study by Kulkarni and Colburn (1998), it was shown that 16 cepstral coefficients were sufficient to provide all spatially relevant information of the HRTF spectra. The results of the present study show that for this amount of smoothing the length of the impulse responses is below 1.5 ms. From the investigation of the effect of smoothing on the ILD, it can be concluded that, for logarithmic smoothing, 1/6 octave averaging produces perceptually irrelevant deviations. The length of the corresponding impulse responses ranges from 2 to 3 ms. Hence, cepstral smoothing of the HRTF spectra yields shorter impulse responses than 1/N octave smoothing. Therefore, both the effect of smoothing on the ILD (see above) and the effect on the HRIR length lead to the same conclusion: linear (cepstral) smoothing is more appropriate for smoothing HRTFs than 1/N octave smoothing.