
to a theory of pitch perception

Andreas Thumfart, Universität Heidelberg

July 29, 1995

Abstract

A definition of discrete evolutionary spectra is given that complements the notion of evolutionary spectral density given by Dahlhaus in [1]. For processes that have a discrete evolutionary spectrum, the asymptotic behavior of linear functionals of the periodogram is investigated. The results are applied in a mathematical analysis of Licklider's theory of pitch perception. A pitch estimator based on this theory is investigated with respect to the shift of the pitch of the residue described by Schouten et al. in [8].

1 Introduction

In his paper [1] Dahlhaus introduces a new notion of a locally stationary process. His approach differs from the well known one given by Priestley ([6, 7]) in being inherently asymptotic. This enables him to prove strong asymptotic results. Further, the spectral representation of a locally stationary process that Dahlhaus postulates is unique, in contrast to the one in Priestley's theory.

Dahlhaus defines a process to be locally stationary if it has a certain spectral representation. Every such process has an evolutionary spectral density. Hence his theory does not cover the case of a discrete spectrum. In section 2.1 of this paper we give a definition of a locally stationary process with discrete evolutionary spectrum and prove a uniqueness result for the spectral representation. In section 2.2 we discuss the asymptotic behavior of linear functionals of the periodogram of a process with discrete evolutionary spectrum. Finally we apply these results in section 3 to Licklider's theory of pitch perception (see [4]). We give a fast algorithm for a simplified version of his model and study its asymptotic behavior. A pitch estimator based on it is investigated with respect to the observations reported by Schouten et al. in [8].


2 The mathematical theory of discrete evolutionary spectra

2.1 Definition and some elementary properties

We define a process with discrete evolutionary spectrum as a process that can almost be written as a sum of pure oscillations. The amplitude, initial phase and frequency of every summand may change in time. But as in Dahlhaus' theory, this change becomes slower and slower as the sample size increases. Here is the exact definition:

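Definition 1

A sequence of stochastic processes $X_{t,T}$ ($t = 1, \ldots, T$) is said to have a discrete evolutionary spectrum if

1. there exists a representation

\[ X_{t,T} = \sum_{n \in M} A^0_{n,t,T} \quad \text{a.s.} \]

for some $M \subseteq \mathbb{Z}$, and

2. for every $n \in M$ there exist a complex valued mean 0 stochastic process $A_n(u)$ on $[0,1]$ with a.s. differentiable paths and a sequence $\theta_{n,T}(t)$ ($t = 1, \ldots, T$) such that

\[ \exists K\ \forall t, T: \quad \sum_{n} \left| A^0_{n,t,T} - A_n\!\left(\tfrac{t}{T}\right) \exp(i\theta_{n,T}(t)) \right| \le K T^{-1} \quad \text{a.s.,} \]

and

3.

\[ \exists K: \quad \sum_{n \in M} \sup_u |A_n(u)| \le K \quad \text{a.s.,} \]

\[ \exists K: \quad \sum_{n \in M} \sup_u |A_n'(u)| \le K \quad \text{a.s.,} \]

4.

\[ \forall u \in [0,1],\ n \in M: \quad E A_n(u) = 0, \]

\[ \forall u_1, u_2 \in [0,1],\ n \ne m,\ n, m \in M: \quad \mathrm{Cov}(A_n(u_1), A_m(u_2)) = 0, \]

and

5. for every $n \in M$ there exists a function $\omega_n : [0,1] \to \mathbb{R}$ such that

\[ \exists K\ \forall t, T,\ n \in M: \quad \left| \theta_{n,T}(t) - \theta_{n,T}(t-1) - \omega_n\!\left(\tfrac{t}{T}\right) \right| \le K T^{-1}. \]

$\omega_n$ has a uniformly (in $n \in M$) bounded derivative $\frac{\partial}{\partial u}\omega_n(u)$.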

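The function $\omega_n(u)$ of condition 5 is called an instantaneous frequency of $X_{t,T}$ at time $u$. We say that $X_{t,T}$ has a spectral line of height $\mathrm{Var}(A_n(u))$ at $\omega_n(u)$. When we deal with real valued processes, we always assume that $\forall n \in M: -n \in M$, $0 \notin M$, and that $\forall u \in [0,1]: A_n(u) = \overline{A_{-n}(u)}$ and $\forall t, T: \theta_{n,T}(t) = -\theta_{-n,T}(t)$, where $\theta_{n,T}$ denotes the phase sequence of condition 2.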

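Example:

\[ X_{t,T} = \sum_{n \in M} A_n\!\left(\tfrac{t}{T}\right) \exp\!\left( i \mu_n\!\left(\tfrac{t}{T}\right) t \right), \]

i.e., the phase sequence is $\theta_{n,T}(t) = \mu_n(t/T)\,t$, where $\mu_n$ is twice continuously differentiable with bounded second derivative. Then the instantaneous frequency is $\omega_n(u) = \mu_n(u) + u\mu_n'(u)$, since $\theta_{n,T}(t) - \theta_{n,T}(t-1) = \mu_n(t/T) + \tfrac{t}{T}\mu_n'(t/T) + O(1/T)$. Note that in general we cannot choose $\omega_n(u)$ to be $\mu_n(u)$.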
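The relation between the phase sequence and the instantaneous frequency in this example can be checked numerically. The following is a minimal sketch, not part of the original argument: the profile `mu` is a hypothetical choice made only for illustration, and the code measures how far the phase increments deviate from $\mu(u) + u\mu'(u)$ at $u = t/T$.

```python
import math

def mu(u):
    # Hypothetical smooth frequency profile, chosen only for illustration.
    return 0.5 + 0.3 * u * u

def mu_prime(u):
    # Derivative of the hypothetical profile above.
    return 0.6 * u

def phase(t, T):
    # Phase sequence of the example: mu(t/T) * t.
    return mu(t / T) * t

def max_increment_error(T):
    # Largest deviation of phase(t) - phase(t-1) from mu(u) + u*mu'(u), u = t/T.
    worst = 0.0
    for t in range(1, T + 1):
        u = t / T
        increment = phase(t, T) - phase(t - 1, T)
        target = mu(u) + u * mu_prime(u)
        worst = max(worst, abs(increment - target))
    return worst

errors = {T: max_increment_error(T) for T in (100, 1000, 10000)}
```

For this profile the deviation shrinks roughly in proportion to $1/T$, which is the rate required by condition 5 of definition 1.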

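Proposition 1

If $X_{t,T}$ is a process with discrete evolutionary spectrum as above then

\[ \mathrm{Cov}(X_{t+\tau,T}, X_{t,T}) = \sum_{n} \mathrm{Var}\,A_n\!\left(\tfrac{t}{T}\right) \exp\!\left( i\omega_n\!\left(\tfrac{t}{T}\right)\tau \right) + R \]

where $|R| \le O\!\left(\frac{1 + \tau + \tau^2}{T}\right)$.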

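Proof

By 2 and 3 of definition 1 we may freely exchange expectations and the sum in expressions of the form $E\sum_{n \in M} A\ldots$. Therefore $X_{t,T}$ has a mean of order $O(1/T)$.

\[
\begin{aligned}
E\,\overline{X_{t,T}}\,X_{t+\tau,T}
&= E \sum_{n,m \in M} \overline{A^0_{n,t,T}}\,A^0_{m,t+\tau,T} \\
&= \sum_{n,m \in M} E\,\overline{A^0_{n,t,T}}\,A_m\!\left(\tfrac{t+\tau}{T}\right) \exp(i\theta_{m,T}(t+\tau)) + R_1 \\
&= \sum_{n,m \in M} E\,\overline{A_n\!\left(\tfrac{t}{T}\right)}\,A_m\!\left(\tfrac{t+\tau}{T}\right) \exp\bigl(i(\theta_{m,T}(t+\tau) - \theta_{n,T}(t))\bigr) + R_1 + R_2
\end{aligned}
\]

where $|R_1|$ and $|R_2|$ are of order $O(1/T)$ a.s. by 2 of definition 1. Since $A_n$ and $A_m$ are uncorrelated for $n \ne m$ we get:

\[
\begin{aligned}
&\sum_{n \in M} E\,\overline{A_n\!\left(\tfrac{t}{T}\right)}\,A_n\!\left(\tfrac{t+\tau}{T}\right) \exp\bigl(i(\theta_{n,T}(t+\tau) - \theta_{n,T}(t))\bigr) + R_1 + R_2 \\
&\quad = \sum_{n \in M} E\left|A_n\!\left(\tfrac{t}{T}\right)\right|^2 \exp\bigl(i(\theta_{n,T}(t+\tau) - \theta_{n,T}(t))\bigr) + R_1 + R_2 + R_3,
\end{aligned}
\]

\[ R_3 := \sum_{n \in M} E\left( \overline{A_n\!\left(\tfrac{t}{T}\right)}\,A_n\!\left(\tfrac{t+\tau}{T}\right) - \left|A_n\!\left(\tfrac{t}{T}\right)\right|^2 \right) \exp\bigl(i(\theta_{n,T}(t+\tau) - \theta_{n,T}(t))\bigr), \]

\[ |R_3| \le O\!\left(\frac{\tau}{T}\right) \sum_{n \in M} E \sup_u \bigl(|A_n'(u)|\,|A_n(u)|\bigr) \]

by the mean value theorem.

The next step illustrates a technique that is central to the theory of discrete evolutionary spectra. To obtain the result we replace $\theta_{n,T}(t+\tau) - \theta_{n,T}(t)$ by $\omega_n(t/T)\tau$.

The error we get is

\[ R_4 := \sum_{n \in M} E\left|A_n\!\left(\tfrac{t}{T}\right)\right|^2 \exp\!\left(i\omega_n\!\left(\tfrac{t}{T}\right)\tau\right) \left( \exp\!\left( i \sum_{k=0}^{\tau-1} \bigl(\theta_{n,T}(t+\tau-k) - \theta_{n,T}(t+\tau-k-1) - \omega_n(t/T)\bigr) \right) - 1 \right). \]

Since

\[ \sum_{k=0}^{\tau-1} \bigl(\theta_{n,T}(t+\tau-k) - \theta_{n,T}(t+\tau-k-1) - \omega_n(t/T)\bigr) \]

is $O(\tau/T) + O(\tau^2/T)$ by 5 of definition 1 and the mean value theorem we have

\[ |R_4| \le O\!\left(\frac{\tau + \tau^2}{T}\right) \sum_{n \in M} E\left|A_n\!\left(\tfrac{t}{T}\right)\right|^2. \]

Therefore we define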

Definition 2

\[ F(u, \lambda) := \sum_{n \in M} \mathrm{Var}(A_n(u))\, 1_{[\omega_n(u), \pi]}(\lambda) \]

is called spectral distribution function of $X_{t,T}$.

The spectral distribution function of a process with discrete evolutionary spectrum is uniquely determined by the covariance structure of the process:

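Proposition 2

Under the assumptions of proposition 1:

\[ \lim_{K \to \infty} \frac{1}{2K+1} \sum_{\tau=-K}^{K} \lim_{T \to \infty} \mathrm{Cov}\bigl(X_{[uT]+\tau,T}, X_{[uT],T}\bigr) \exp(-i\lambda\tau) = \sum_{n \in M} \mathrm{Var}(A_n(u))\, 1_{\{\omega_n(u)\}}(\lambda) \]

(The convergence is not uniform in $\lambda$.)

Here $[x]$ denotes the greatest integer $\le x$.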

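Proof

\[ \frac{1}{2K+1} \sum_{\tau=-K}^{K} \lim_{T \to \infty} \mathrm{Cov}\bigl(X_{[uT]+\tau,T}, X_{[uT],T}\bigr) \exp(-i\lambda\tau) = \sum_{n \in M} \mathrm{Var}(A_n(u))\, \frac{1}{2K+1} \sum_{\tau=-K}^{K} \exp\bigl(i(\omega_n(u) - \lambda)\tau\bigr) \]

by proposition 1. The last sum is the Dirichlet kernel, which is an approximate identity: divided by $2K+1$ it equals 1 for $\lambda = \omega_n(u)$ and tends to 0 as $K \to \infty$ for every other $\lambda \in [-\pi, \pi]$.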
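The behavior of the averaged kernel in this proof is easy to inspect numerically. The sketch below is only an illustration; the function name `averaged_kernel` and the chosen values of the frequency offset are not from the original text. It evaluates $(2K+1)^{-1}\sum_{\tau=-K}^{K}\exp(i\delta\tau)$ for an offset $\delta$ between a spectral line and the frequency of evaluation.

```python
import cmath

def averaged_kernel(delta, K):
    # (1/(2K+1)) * sum_{tau=-K}^{K} exp(i*delta*tau); the imaginary parts cancel.
    total = sum(cmath.exp(1j * delta * tau) for tau in range(-K, K + 1))
    return (total / (2 * K + 1)).real

on_line = averaged_kernel(0.0, 50)        # offset 0: exactly on a spectral line
off_line = averaged_kernel(0.3, 5000)     # fixed nonzero offset, large K
near_line = averaged_kernel(1e-4, 50)     # tiny offset, moderate K
```

On the line the average is 1 and for a fixed nonzero offset it decays with $K$, while for a tiny offset and moderate $K$ it is still close to 1. The last case reflects the remark that the convergence is not uniform in the frequency.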

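2.2 Linear functionals of the periodogram

Let $h : \mathbb{R} \to \mathbb{R}$ be a data taper and

\[ d_N(u, \lambda) := \sum_{s=0}^{N-1} h\!\left(\tfrac{s}{N}\right) X_{[uT]-N/2+s+1,T} \exp(-i\lambda s) \]

the tapered Fourier transform of a segment of length $N$ around $[uT]$ of the time series. $N$ is assumed to be even.

\[ H_{k,N}(\lambda) := \sum_{s=0}^{N-1} h\!\left(\tfrac{s}{N}\right)^k \exp(-i\lambda s), \qquad I_N(u, \lambda) := \frac{1}{2\pi H_{2,N}(0)}\, d_N(u, \lambda)\, d_N(u, -\lambda). \]

We investigate the asymptotic behavior of functionals of the form

\[ B_N(u) := \int_{-\pi}^{\pi} I_N(u, \lambda)\, \psi(\lambda)\, d\lambda \]

where $\psi$ is a continuous $2\pi$-periodic function. Let

\[ B(u) := \sum_{n \in M} |A_n(u)|^2\, \psi(\omega_n(u)). \]

Note that in general $B(u)$ is still random and $\ne \int_{-\pi}^{\pi} \psi(\lambda)\, dF(u, \lambda)$. If only the phase of $A_n(u)$ is random and the absolute value is deterministic, then $B(u) = \int_{-\pi}^{\pi} \psi(\lambda)\, dF(u, \lambda)$.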
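To make the objects of this subsection concrete, here is a minimal numerical sketch for a single stationary spectral line. Several choices are assumptions made for illustration only: the Hann taper, the segment length 64, the test function $\cos$, and the use of $|d_N|^2$ for the periodogram (which agrees with $d_N(u,\lambda)d_N(u,-\lambda)$ for real valued segments). The integral is approximated by an equally spaced sum over the frequency grid.

```python
import cmath
import math

def hann(x):
    # One possible data taper on [0, 1]; an illustrative choice.
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * x))

def functional_of_periodogram(segment, psi, grid_size):
    # Approximates the integral of the tapered periodogram times psi over [-pi, pi).
    N = len(segment)
    h = [hann(s / N) for s in range(N)]
    h2 = sum(v * v for v in h)  # plays the role of H_{2,N}(0)
    total = 0.0
    for k in range(grid_size):
        lam = -math.pi + 2.0 * math.pi * k / grid_size
        d = sum(h[s] * segment[s] * cmath.exp(-1j * lam * s) for s in range(N))
        periodogram = abs(d) ** 2 / (2.0 * math.pi * h2)
        total += periodogram * psi(lam)
    return total * 2.0 * math.pi / grid_size

A = 2.0 * cmath.exp(0.7j)   # amplitude with an arbitrary phase
omega = 1.0                 # frequency of the single line
segment = [A * cmath.exp(1j * omega * s) for s in range(64)]
b_n = functional_of_periodogram(segment, math.cos, 256)
b_limit = abs(A) ** 2 * math.cos(omega)
```

In this setting `b_n` is already close to `b_limit`, the value $|A|^2\psi(\omega)$ that $B(u)$ takes for one line; the small gap comes from the taper and shrinks as the segment length grows.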

Assumption A1:

1. $X_{t,T}$ is a process with discrete evolutionary spectrum and

\[ \forall u \in [0,1]\ \forall n \ne m \in M: \quad \omega_n(u) \ne \omega_m(u). \tag{1} \]

2. $\psi$ is a bounded, complex valued, continuous, $2\pi$-periodic function.

3. The data taper $h : \mathbb{R} \to \mathbb{R}$ is of bounded variation.

4. For the segment length $N$ and the sample size $T$, $(N^2 \log N)/T \to 0$ and $T/N^4 \to 0$ hold as $T \to \infty$.

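Theorem 1

Under assumption A1 the following holds: If a.s. there exists a $K < \infty$ such that for all $u \in [0,1]$

\[ \sum_{n \ne m \in M} \frac{|A_n(u)|\,|A_m(u)|}{|\omega_n(u) - \omega_m(u)|} \le K \tag{2} \]

then $B_N(u) \to B(u)$ a.s.

If there exists a $K < \infty$ such that for all $u \in [0,1]$

\[ \sum_{n \ne m,\ l \ne k \in M} \frac{\bigl|E\bigl(A_n(u)\overline{A_m(u)}A_l(u)\overline{A_k(u)}\bigr)\bigr|}{|\omega_m(u) - \omega_n(u)|} \le K \tag{3} \]

then $B_N(u) \to B(u)$ in quadratic mean and in probability. The convergence is uniform in $u$ in both cases.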

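The rest of this section contains the proof of theorem 1 and some technical tools that are needed for it. Let

\[ H_N(f(\cdot), \lambda) := \sum_{s=0}^{N-1} f(s) \exp(-i\lambda s), \qquad L_N(\alpha) := \begin{cases} N, & |\alpha| \le 1/N, \\ 1/|\alpha|, & 1/N \le |\alpha| \le \pi, \end{cases} \]

and let $L_N : \mathbb{R} \to \mathbb{R}$ be the $2\pi$-periodic extension of $L_N$. The following facts about $L_N$ are known from [1]:

Lemma 1

1.

\[ \exists K\ \forall N: \quad \int_{-\pi}^{\pi} L_N(\beta - \alpha)\, L_N(\alpha - \gamma)\, d\alpha \le K\, L_N(\beta - \gamma) \log N. \]

2. If $h$ is of bounded variation, then $\exists K$ such that $\forall N$, $s \le N$ and $\forall \lambda$ we have $|H_s(\lambda)|, |H_N(\lambda)| \le K L_N(\lambda)$.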
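As a small sanity check of the second statement of lemma 1, the following sketch evaluates the ratio $|H_N(\lambda)| / L_N(\lambda)$ on a frequency grid. It assumes the rectangular taper $h \equiv 1$ on $[0,1]$, for which elementary estimates give the constant $K = \pi$; the grid size and $N = 64$ are arbitrary illustrative choices.

```python
import cmath
import math

def L(N, alpha):
    # 2*pi-periodic extension of L_N: N for |alpha| <= 1/N, 1/|alpha| up to pi.
    a = abs(math.remainder(alpha, 2.0 * math.pi))
    return float(N) if a <= 1.0 / N else 1.0 / a

def H(N, lam):
    # H_N(lambda) for the rectangular taper h = 1 (an illustrative choice).
    return sum(cmath.exp(-1j * lam * s) for s in range(N))

N = 64
grid = [-math.pi + 2.0 * math.pi * k / 1000 for k in range(1001)]
worst_ratio = max(abs(H(N, lam)) / L(N, lam) for lam in grid)
```

For this taper `worst_ratio` stays below $\pi$ on the whole grid. Other tapers of bounded variation need their own constant.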

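The next lemma is easily proved by induction on $N$:

Lemma 2

\[ H_N\!\left( h\!\left(\tfrac{\cdot}{N}\right) g(\cdot), \lambda \right) = g(N-1)\, H_N(\lambda) - \sum_{s=0}^{N-1} \bigl(g(s) - g(s-1)\bigr)\, H_s(\lambda). \]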
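Lemma 2 is a summation by parts identity and can be verified numerically. The sketch below rests on an assumed reading of the notation: $H_s(\lambda)$ is taken to be the tapered partial sum over the first $s$ terms, so that $H_0 = 0$ and the value $g(-1)$ does not matter. The taper and the sequence `g` are hypothetical examples.

```python
import cmath
import math

def taper(x):
    # Illustrative data taper (Hann window on [0, 1]).
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * x))

def g(s):
    # Hypothetical slowly varying sequence, similar in shape to the g of the proof.
    return cmath.exp(1j * 0.01 * s * s) - 1.0

def partial_H(s, N, lam):
    # Assumed meaning of H_s(lambda): tapered sum over the indices 0, ..., s-1.
    return sum(taper(r / N) * cmath.exp(-1j * lam * r) for r in range(s))

def left_side(N, lam):
    return sum(taper(s / N) * g(s) * cmath.exp(-1j * lam * s) for s in range(N))

def right_side(N, lam):
    total = g(N - 1) * partial_H(N, N, lam)
    for s in range(N):
        total -= (g(s) - g(s - 1)) * partial_H(s, N, lam)
    return total

gap = abs(left_side(32, 0.4) - right_side(32, 0.4))
```

Under this reading both sides agree up to rounding error, so `gap` is of the order of machine precision.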

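Proof of theorem 1.

We first write $d_N(u, \lambda)$ in a useful form that makes it easy to prove the theorem. Using the representation 1 postulated in definition 1 we get

\[ d_N(u, \lambda) = \sum_{s=0}^{N-1} h\!\left(\tfrac{s}{N}\right) \sum_{n \in M} A^0_{n,[uT]-N/2+s+1,T} \exp(-i\lambda s) \quad \text{a.s.} \]

First we replace $A^0_{n,[uT]-N/2+s+1,T}$ by

\[ A_n\!\left(\tfrac{[uT]-N/2+s+1}{T}\right) \exp\bigl(i\theta_{n,T}([uT]-N/2+s+1)\bigr) \]

and then $A_n\!\left(\tfrac{[uT]-N/2+s+1}{T}\right)$ by $A_n(u)$ to get

\[ d_N(u, \lambda) = \sum_{s=0}^{N-1} h\!\left(\tfrac{s}{N}\right) \sum_{n \in M} A_n(u) \exp\bigl(i\theta_{n,T}([uT]-N/2+s+1)\bigr) \exp(-i\lambda s) + R_1 + R_2 \quad \text{a.s.} \tag{4} \]

For the error terms we have

\[ R_1 := \sum_{s=0}^{N-1} h\!\left(\tfrac{s}{N}\right) \sum_{n \in M} \left( A^0_{n,[uT]-N/2+s+1,T} - A_n\!\left(\tfrac{[uT]-N/2+s+1}{T}\right) \exp\bigl(i\theta_{n,T}([uT]-N/2+s+1)\bigr) \right) \exp(-i\lambda s), \]

$|R_1| \le O(N/T)$ a.s. by 2 of definition 1, and

\[ R_2 := \sum_{s=0}^{N-1} h\!\left(\tfrac{s}{N}\right) \sum_{n \in M} \left( A_n\!\left(\tfrac{[uT]-N/2+s+1}{T}\right) - A_n(u) \right) \exp\bigl(i\theta_{n,T}([uT]-N/2+s+1)\bigr) \exp(-i\lambda s), \]

\[ |R_2| \le O\!\left(\frac{N^2}{T}\right) \sum_{n \in M} \sup_u |A_n'(u)| \le O\!\left(\frac{N^2}{T}\right) \quad \text{a.s.} \]

by the mean value theorem.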

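The following considerations are the only place in the proof where techniques are used that are not already known from the case of processes with evolutionary spectral density. In equation 4, we replace $\exp(i\theta_{n,T}([uT]-N/2+s+1))$ by $\exp(i\theta_{n,T}([uT]-N/2+1)) \exp(i\omega_n(u)s)$. Now we can (a.s.) write $d_N(u, \lambda)$ as

\[ \sum_{n \in M} A_n(u) \exp\bigl(i\theta_{n,T}([uT]-N/2+1)\bigr) H_N(\lambda - \omega_n(u)) + R_1 + R_2 + R_3. \tag{5} \]

Here

\[ R_3 := \sum_{n \in M} A_n(u) \exp\bigl(i\theta_{n,T}([uT]-N/2+1)\bigr) R_4(n) \]

and

\[
\begin{aligned}
R_4(n) &:= \sum_{s=0}^{N-1} h\!\left(\tfrac{s}{N}\right) \exp\bigl(i(\omega_n(u) - \lambda)s\bigr) \Bigl\{ \exp\bigl(i(\theta_{n,T}([uT]-N/2+s+1) - \theta_{n,T}([uT]-N/2+1) - \omega_n(u)s)\bigr) - 1 \Bigr\} \\
&= H_N\!\left( h\!\left(\tfrac{\cdot}{N}\right) g(\cdot), \lambda - \omega_n(u) \right)
\end{aligned}
\]

where

\[ g(s) := \exp\bigl\{ i(\theta_{n,T}([uT]-N/2+s+1) - \theta_{n,T}([uT]-N/2+1) - \omega_n(u)s) \bigr\} - 1. \]

We want to use lemma 2 to find an upper bound for $|R_4(n)|$. Therefore we have to investigate $g$. $g(0) = 0$ and for $s > 0$ we have

\[ g(s) = \exp\!\left( i \sum_{k=0}^{s-1} \bigl\{ \theta_{n,T}([uT]-N/2+s+1-k) - \theta_{n,T}([uT]-N/2+s-k) - \omega_n(u) \bigr\} \right) - 1. \]

By the mean value theorem there exists a finite $K$ such that

\[ |g(s)| \le K \sum_{k=0}^{s-1} \left\{ \left| \theta_{n,T}([uT]-N/2+s+1-k) - \theta_{n,T}([uT]-N/2+s-k) - \omega_n\!\left(\tfrac{[uT]-N/2+s+1-k}{T}\right) \right| + \left| \omega_n\!\left(\tfrac{[uT]-N/2+s+1-k}{T}\right) - \omega_n(u) \right| \right\}. \]

By 5 of definition 1 and the mean value theorem this is $O(N/T) + O(N^2/T)$.

Further, there exists a $K$ such that

\[ |g(s) - g(s-1)| \le K \left| \theta_{n,T}([uT]-N/2+s+1) - \theta_{n,T}([uT]-N/2+s) - \omega_n(u) \right| \le O\!\left(\frac{N}{T}\right). \]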

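Hence $|R_4(n)| \le L_N(\lambda - \omega_n(u))\, O(N^2/T)$ by lemma 2 and

\[ |R_3| \le O\!\left(\frac{N^2}{T}\right) \sum_{n \in M} |A_n(u)|\, L_N(\lambda - \omega_n(u)). \]

Further, for the main term of $d_N(u, \lambda)$ we have

\[ \left| \sum_{n \in M} A_n(u) \exp\bigl(i\theta_{n,T}([uT]-N/2+1)\bigr) H_N(\lambda - \omega_n(u)) \right| \le O(1) \sum_{n \in M} |A_n(u)|\, L_N(\lambda - \omega_n(u)). \]

Using the representation (5) of $d_N(u, \lambda)$, we now turn to the proof of the theorem.

\[ B_N(u) = \sum_{n \in M} |A_n(u)|^2 \int_{-\pi}^{\pi} \frac{|H_N(\lambda - \omega_n(u))|^2}{2\pi H_{2,N}(0)}\, \psi(\lambda)\, d\lambda + R_5 + R_6 + R_7 \quad \text{a.s.} \tag{6} \]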

The leading error terms are

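\[ R_5 := \frac{1}{2\pi H_{2,N}(0)} \int_{-\pi}^{\pi} R_3 \sum_{n \in M} \overline{A_n(u) \exp\bigl(i\theta_{n,T}([uT]-N/2+1)\bigr) H_N(\lambda - \omega_n(u))}\; \psi(\lambda)\, d\lambda \tag{7} \]

and

\[ R_6 := \frac{1}{2\pi H_{2,N}(0)} \int_{-\pi}^{\pi} \sum_{n \ne m \in M} A_n(u) \overline{A_m(u)} \exp\bigl(i(\theta_{n,T}([uT]-N/2+1) - \theta_{m,T}([uT]-N/2+1))\bigr) H_N(\lambda - \omega_n(u))\, H_N(\omega_m(u) - \lambda)\, \psi(\lambda)\, d\lambda. \]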

The other error terms have been put into R7. They are of lower order or can be treated in the same way as R5 and R6.

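\[
\begin{aligned}
|R_5| &\le O\!\left(\frac{N^2}{T}\right) O\!\left(\frac{1}{N}\right) \sum_{n,m \in M} |A_n(u)|\,|A_m(u)| \int_{-\pi}^{\pi} L_N(\lambda - \omega_n(u))\, L_N(\omega_m(u) - \lambda)\, d\lambda \\
&\le O\!\left(\frac{N \log N}{T}\right) \sum_{n,m \in M} |A_n(u)|\,|A_m(u)|\, L_N(\omega_m(u) - \omega_n(u)) \\
&\le O\!\left(\frac{N^2 \log N}{T}\right) \quad \text{a.s.}
\end{aligned}
\]

\[
\begin{aligned}
|R_6| &\le O\!\left(\frac{1}{N}\right) \sum_{n \ne m \in M} |A_n(u)|\,|A_m(u)| \int_{-\pi}^{\pi} L_N(\lambda - \omega_n(u))\, L_N(\omega_m(u) - \lambda)\, d\lambda \\
&\le O\!\left(\frac{\log N}{N}\right) \sum_{n \ne m \in M} \frac{|A_n(u)|\,|A_m(u)|}{|\omega_m(u) - \omega_n(u)|}.
\end{aligned}
\]

Since $\frac{|H_N(\lambda - \omega_n(u))|^2}{2\pi H_{2,N}(0)}$ is an approximate identity, this proves the first part of theorem 1. For the second part we use similar arguments to see that

\[
\begin{aligned}
\mathrm{Var}(R_6) &\le O\!\left(\frac{(\log N)^2}{N^2}\right) \sum_{n \ne m,\ l \ne k \in M} \bigl|E\bigl(A_n(u)\overline{A_m(u)}A_l(u)\overline{A_k(u)}\bigr)\bigr|\, L_N(\omega_m(u) - \omega_n(u))\, L_N(\omega_l(u) - \omega_k(u)) \\
&\le O\!\left(\frac{(\log N)^2}{N}\right) \sum_{n \ne m,\ l \ne k \in M} \frac{\bigl|E\bigl(A_n(u)\overline{A_m(u)}A_l(u)\overline{A_k(u)}\bigr)\bigr|}{|\omega_m(u) - \omega_n(u)|}.
\end{aligned}
\]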

Remarks:

1. Equation 1 of assumption A1 is restrictive and essential. It excludes e.g. that $\omega_n(u)$ converges to $\omega_m(u)$ ($n \ne m$) as, say, $u \to 1/2$ and $\omega_n(u) = \omega_m(u)$ for $u \ge 1/2$. This example is also excluded by equations 2 and 3 in theorem 1. If we want to allow for such examples we have to reformulate those equations.

Equation 2 could be changed to

\[ \forall u \in [0,1]\ \exists K < \infty: \quad \sum_{(n,m):\ \omega_n(u) \ne \omega_m(u)} \frac{|A_n(u)|\,|A_m(u)|}{|\omega_n(u) - \omega_m(u)|} \le K \quad \text{a.s.} \]

and equation 3 similarly. Then $B_N(u)$ converges (a.s. or in quadratic mean respectively) to

\[ B(u) + \sum_{n \ne m:\ \omega_n(u) = \omega_m(u)} A_n(u)\overline{A_m(u)}\, \psi(\omega_n(u)) \lim_{T \to \infty} \exp\bigl(i(\theta_{n,T}([uT]-N/2+1) - \theta_{m,T}([uT]-N/2+1))\bigr) \]

provided this limit exists. Even if it exists it is not real in general. The convergence is no longer uniform in $u$. This shows that the interaction of very closely adjacent spectral lines can cause a lot of trouble.

2. Theorem 1 can be extended to the case of mixed evolutionary spectra. Assume that

\[ X_{t,T} = X^d_{t,T} + X^c_{t,T} \]

where $X^d_{t,T}$ has a discrete evolutionary spectrum $F^d(u, \lambda)$ and $X^c_{t,T}$ has evolutionary spectral density $f^c(u, \lambda)$. Then under A1 and the assumptions of theorem 1 on $X^d_{t,T}$ and assumption A.1 of [1] on $X^c_{t,T}$ we have

\[ B_N(u) \to \int_{-\pi}^{\pi} f^c(u, \lambda)\, \psi(\lambda)\, d\lambda + B(u) \]

in probability as $T \to \infty$. The convergence is uniform in $u$.

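3. The theory of discrete evolutionary spectra can be extended to allow for finitely many discontinuities in $A_n(u)$ and $\omega_n(u)$. Assume for simplicity that $A_n(u)$ and $\omega_n(u)$ have a single jump of finite height at $u = u_0$ for some $n$, where $u_0$ is independent of $n$. Then $B_N(u)$ still converges to $B(u)$ for $u \ne u_0$. $B_N(u_0)$ converges to

\[ \frac{\int_0^{1/2} h^2(v)\, dv}{\int_0^1 h^2(v)\, dv} \sum_{n \in M} |A_n(u_0-)|^2\, \psi(\omega_n(u_0-)) + \frac{\int_{1/2}^1 h^2(v)\, dv}{\int_0^1 h^2(v)\, dv} \sum_{n \in M} |A_n(u_0+)|^2\, \psi(\omega_n(u_0+)) \]

a.s. or in quadratic mean, if a.s. there exists a $K$ such that for every $u$

\[ \sum_{n \ne m} \frac{|A_n(u-) A_m(u-)|}{|\omega_m(u-) - \omega_n(u-)|} \le K, \qquad \sum_{n \ne m} \frac{|A_n(u-) A_m(u+)|}{|\omega_m(u+) - \omega_n(u-)|} \le K \]

and

\[ \sum_{n \ne m} \frac{|A_n(u+) A_m(u+)|}{|\omega_m(u+) - \omega_n(u+)|} \le K, \]

or if there exists a $K$ such that for every $u$

\[ \sum_{n \ne m,\ l \ne k} \frac{\bigl|E\bigl(A_n(u-)\overline{A_m(u-)}A_l(u-)\overline{A_k(u-)}\bigr)\bigr|}{|\omega_m(u-) - \omega_n(u-)|} \le K, \qquad \sum_{n \ne m,\ l \ne k} \frac{\bigl|E\bigl(A_n(u-)\overline{A_m(u+)}A_l(u-)\overline{A_k(u+)}\bigr)\bigr|}{|\omega_m(u+) - \omega_n(u-)|} \le K \]

and

\[ \sum_{n \ne m,\ l \ne k} \frac{\bigl|E\bigl(A_n(u+)\overline{A_m(u+)}A_l(u+)\overline{A_k(u+)}\bigr)\bigr|}{|\omega_m(u+) - \omega_n(u+)|} \le K \]

respectively.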

Remark 1 is immediate from the proof of theorem 1. The proof of remark 2 is more technical than that of theorem 1. In addition to the methods presented here, it uses techniques from the theory of evolutionary spectral densities. We omit it here.

The main idea in the proof of remark 3 is to (a.s.) write $d_N(u, \lambda)$ as

\[
\begin{aligned}
&\sum_{n \in M} A_n(u-) \exp\bigl(i\theta_{n,T}([uT]-N/2+1)\bigr) H_{N/2}(\lambda - \omega_n(u-)) \\
&\quad + \sum_{n \in M} A_n(u+) \exp\bigl(i\theta_{n,T}([uT]-N/2+1)\bigr) \bigl\{ H_N(\lambda - \omega_n(u+)) - H_{N/2}(\lambda - \omega_n(u+)) \bigr\} + R
\end{aligned}
\]

where $R$ is of reduced order. The details are technical and we omit them here.

3 Application to Licklider's theory of pitch perception

In 1951 Licklider proposed a theory of pitch perception ([4]), which will be called the correlogram in the sequel. Because of its high computational cost, not many sounds could be analyzed with this model at that time. In recent years interest in the correlogram has grown again (see e.g. [5, 10]) because computational capabilities have increased drastically. Slaney and Lyon were the first to compute it in real time ([10]).

Here we investigate a somewhat simplified version of this model on the basis of the theory of discrete evolutionary spectra. First Licklider's theory is described.

Then we discuss the asymptotics of the correlogram and present an algorithm that computes this simplified version of the model much faster than the algorithm used by Slaney and Lyon ([11, 10]). Finally a simple pitch estimator based on the correlogram is investigated. We analyze its asymptotic behavior and how it works on processes with discrete evolutionary spectra that are very similar to amplitude modulated sounds. We are especially interested in the effect of the shift of the pitch of the residue described by Schouten et al. in [8].

3.1 The correlogram

3.1.1 Informal description

When we hear a sound, the sound wave has traveled through our outer ear to hit the eardrum. From there, the vibrations are transferred to the cochlea (or inner ear) by three small ossicles in the middle ear.

The inner ear is a bony snail-like structure. If we uncoil it, it becomes a long straight tube that is partitioned by the basilar membrane, which extends almost the entire length of the cochlea. When the sound enters the inner ear, it causes a traveling wave on the basilar membrane. The place where this wave has its maximum amplitude depends on the frequency of the sound. The movement of the basilar membrane causes the hair cells to release a chemical transmitter that generates nerve impulses in the auditory nerve. Because the movement of the basilar membrane differs from place to place depending on the frequency of the sound, different groups of hair cells are activated by different frequencies. The distribution of the energy of the sound among different frequencies is thus mapped to the distribution of hair cell activity at different places in the cochlea.

These facts about hearing seem to be uncontroversial and more information may be found in textbooks such as [3]. Licklider's assertion is that in the brain, for every place in the cochlea or every group of hair cells, an autocorrelation of the neural activity caused by that group is computed. This will become clearer as soon as we describe the correlogram mathematically.

3.1.2 The mathematical model

A model of the outer, middle and inner ear has been proposed in [9]. We use the linear part of it to define a simplified correlogram. For details see [9].

The effect of the outer and middle ear on the sound wave is described by a linear filter. So the incoming sound is filtered first.

Next, the mapping of the energy distribution among frequencies to the distribution of basilar membrane movement at different places of the cochlea is modelled by a filterbank. The cochlea is partitioned into 86 sections. For each section there is a linear bandpass filter in the filterbank. The frequency responses of the individual filters are rather broad and have one peak. They differ in the position of the peak and in their bandwidth: the higher the frequency of the peak, the broader the bandwidth. The frequency responses overlap strongly. The output of the outer-middle-ear filter is filtered by every filter of the filterbank separately. So we get a vector of 86 time series.

While Slaney and Lyon model the strongly nonlinear effects of the hair cells, we leave this step out.

Now, for every such time series, the (empirical) autocovariance function is computed.

Let $(c_{p,j})_j$ ($p = 1, \ldots, 86$) be the impulse response of the filter (of the filterbank) corresponding to section $p$ of the cochlea convolved with the impulse response of the outer-middle-ear filter. Further let $(X_t)_{t=1,\ldots,T}$ be the digitized input sound. Then the $p$-th component of the output vector of the filterbank is

\[ Y_{p,t} := \sum_{j=0}^{\infty} c_{p,j} X_{t-j} \]

and the correlogram can be written as

\[ \mathrm{KOR}_T(\tau, p, u) := \int_{-\pi}^{\pi} I_N^{Y_p}(u, \lambda) \exp(i\lambda\tau)\, d\lambda \]

where $I_N^{Y_p}(u, \lambda)$ is the tapered periodogram of a segment of $Y_{p,t}$. In fact, this is exactly what the algorithm given by Slaney and Lyon does, if we use it to compute our simplified correlogram. The incoming sound is filtered in the time domain (using the difference equations that describe the filters), then the periodogram is computed and the result is subjected to an inverse Fourier transform. This algorithm takes a lot of computing time, since for every section of the cochlea a periodogram and an inverse Fourier transform have to be computed.

If we could do the filtering in the frequency domain, we would be much faster, since we would have to compute the periodogram only once for every $u$ and could apply all filters of the filterbank to this single periodogram. But since we use a segmentwise periodogram and we cannot expect $Y_{p,t}$ to be stationary, it is not clear that this will lead to the same result as the procedure given above. In the next subsection we will show that in fact we can do the filtering in the frequency domain if $X_t = X_{t,T}$ has a discrete evolutionary spectrum.

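3.2 Linear filters and discrete evolutionary spectra

Let $X_{t,T}$ be a process with discrete evolutionary spectrum $F_X(u, \lambda)$ and $(c_j)_{j \in \mathbb{N}}$ be the impulse response of a linear filter. Assume that

\[ \sum_{j=0}^{\infty} c_j z^j = k(z) = \frac{a(z)}{b(z)} \]

where $a$ and $b$ are polynomials with real coefficients and $b(z) \ne 0$ for every complex number $z$ such that $|z| \le 1$.

Theorem 2

Then

\[ Y_{t,T} := \sum_{j=0}^{\infty} c_j X_{t-j,T} \]

can be written as

\[ Y_{t,T} = \sum_{n \in M} A_n\!\left(\tfrac{t}{T}\right) \exp(i\theta_{n,T}(t))\, k\bigl(\exp(-i\omega_n(t/T))\bigr) + R \quad \text{a.s.} \]

where $|R| \le O(1/T)$ a.s. Hence $Y_{t,T}$ has spectral distribution function

\[ F_Y(u, \lambda) = \sum_{n \in M} \bigl|k\bigl(\exp(-i\omega_n(u))\bigr)\bigr|^2\, \mathrm{Var}(A_n(u))\, 1_{[\omega_n(u), \pi]}(\lambda), \]

that is, the height of every spectral line of $F_X(u, \lambda)$ at $\omega_n(u)$ is multiplied by $|k(\exp(-i\omega_n(u)))|^2$.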

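Proof

\[ Y_{t,T} = \sum_{j=0}^{\infty} c_j \sum_{n \in M} A^0_{n,t-j,T} \quad \text{a.s.} \]

Again we replace $A^0_{n,t-j,T}$ by $A_n\!\left(\tfrac{t-j}{T}\right) \exp(i\theta_{n,T}(t-j))$ and $A_n\!\left(\tfrac{t-j}{T}\right)$ by $A_n\!\left(\tfrac{t}{T}\right)$, making errors $R_1$ and $R_2$ with

\[ |R_1| \le O\!\left(\frac{1}{T}\right) \sum_{j=0}^{\infty} |c_j| \quad \text{a.s.,} \qquad |R_2| \le O\!\left(\frac{1}{T}\right) \sum_{j=0}^{\infty} j\,|c_j| \quad \text{a.s.} \]

Now $\sum_{j=0}^{\infty} j c_j = \frac{\partial k}{\partial z}(z)$ at $z = 1$ and hence converges absolutely. We have

\[ Y_{t,T} = \sum_{n \in M} A_n\!\left(\tfrac{t}{T}\right) \exp(i\theta_{n,T}(t)) \sum_{j=0}^{\infty} c_j \exp\bigl(i(\theta_{n,T}(t-j) - \theta_{n,T}(t))\bigr) + R_1 + R_2 \quad \text{a.s.} \]

Replacing $\theta_{n,T}(t-j) - \theta_{n,T}(t)$ by $-j\omega_n(t/T)$ we get the result, making an error $R_3$ such that

\[ |R_3| \le O\!\left(\frac{1}{T}\right) \sum_{j=0}^{\infty} j^2 |c_j| \quad \text{a.s.} \]

But $\sum_{j=0}^{\infty} j^2 c_j = \frac{\partial k(z)}{\partial z} + \frac{\partial^2 k(z)}{(\partial z)^2}$ at $z = 1$ and therefore converges absolutely.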
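For a single stationary spectral line the statement of theorem 2 holds without a remainder term, which makes it easy to check numerically. The sketch below uses a hypothetical filter with transfer function $k(z) = 1/(1 - 0.5z)$, i.e. $c_j = 0.5^j$, run as a difference equation with a warm-up period; the filter, the frequency and the warm-up length are illustrative assumptions.

```python
import cmath

def k(z):
    # Hypothetical transfer function a(z)/b(z) = 1/(1 - 0.5 z); b has its root at z = 2.
    return 1.0 / (1.0 - 0.5 * z)

def filtered_line(omega, t_end, warmup=200):
    # Runs Y_t = 0.5 * Y_{t-1} + X_t for X_t = exp(i*omega*t), i.e. c_j = 0.5**j.
    y = 0.0
    for t in range(-warmup, t_end + 1):
        y = 0.5 * y + cmath.exp(1j * omega * t)
    return y

omega, t = 0.9, 50
output = filtered_line(omega, t)
predicted = cmath.exp(1j * omega * t) * k(cmath.exp(-1j * omega))
```

After the warm-up the filtered value agrees with the input line multiplied by $k(\exp(-i\omega))$, so the height of the line is scaled by $|k(\exp(-i\omega))|^2$.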

Remark:

An analogous result holds for processes with mixed evolutionary spectra. The results cease to hold if the spectrum has discontinuities in $u$.

3.3 The asymptotic behavior of the correlogram and a fast algorithm

From theorem 1 we see that if the input sound has an evolutionary spectrum,

\[ \widetilde{\mathrm{KOR}}_T(\tau, p, u) := \int_{-\pi}^{\pi} I_N^X(u, \lambda)\, \bigl|k_p(\exp(-i\lambda))\bigr|^2 \exp(i\lambda\tau)\, d\lambda, \]

where $|k_p(\exp(-i\lambda))|^2$ is the frequency response of the filter $(c_{p,j})_j$, converges to the same quantity as $\mathrm{KOR}_T(\tau, p, u)$ does.

Definition 3

This quantity

\[ \mathrm{KOR}(\tau, p, u) := \sum_{n \in M} |A_n(u)|^2\, \bigl|k_p\bigl(\exp(-i\omega_n(u))\bigr)\bigr|^2 \exp(i\omega_n(u)\tau) \]

is called theoretical linear correlogram.

We have:

Theorem 3

Under assumption A1 and the assumptions of theorem 1, $\mathrm{KOR}_T(\tau, p, u)$ and $\widetilde{\mathrm{KOR}}_T(\tau, p, u)$ both converge to $\mathrm{KOR}(\tau, p, u)$ a.s. or in quadratic mean respectively.

The definition of $\widetilde{\mathrm{KOR}}_T(\tau, p, u)$ gives us an algorithm that is much faster than the one proposed by Slaney and Lyon. But note that they aim at computing a nonlinear correlogram that cannot be computed with the algorithm presented here.
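The two routes to the linear correlogram can be compared on a toy signal. The following is a rough sketch under several assumptions that are not from the original text: a Hann taper, a bank of three short hypothetical FIR filters instead of the 86 cochlear filters, a real two-component test signal, and a plain grid sum in place of a fast Fourier transform. The slow route filters in the time domain and transforms every filtered series; the fast route computes the periodogram of the unfiltered segment once and weights it by each filter's squared gain.

```python
import cmath
import math

def hann(x):
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * x))

def tapered_autocov(seg, tau):
    # Closed form of the integral of the tapered periodogram times exp(i*lambda*tau)
    # for a real segment.
    N = len(seg)
    h = [hann(s / N) for s in range(N)]
    h2 = sum(v * v for v in h)
    return sum(h[s] * h[s + tau] * seg[s] * seg[s + tau] for s in range(N - tau)) / h2

def slow_correlogram(x, start, N, taps, tau):
    # Route 1: filter in the time domain, then transform each filtered series.
    y = [sum(c * x[start + s - j] for j, c in enumerate(taps)) for s in range(N)]
    return tapered_autocov(y, tau)

def periodogram_on_grid(seg, grid_size):
    # Tapered periodogram of the unfiltered segment, computed once.
    N = len(seg)
    h = [hann(s / N) for s in range(N)]
    h2 = sum(v * v for v in h)
    lams = [2.0 * math.pi * k / grid_size for k in range(grid_size)]
    values = [abs(sum(h[s] * seg[s] * cmath.exp(-1j * lam * s) for s in range(N))) ** 2
              / (2.0 * math.pi * h2) for lam in lams]
    return lams, values

def fast_correlogram(lams, values, taps, tau):
    # Route 2: weight the shared periodogram by |k_p(exp(-i*lambda))|^2.
    total = 0.0
    for lam, val in zip(lams, values):
        gain = abs(sum(c * cmath.exp(-1j * lam * j) for j, c in enumerate(taps))) ** 2
        total += val * gain * math.cos(lam * tau)
    return total * 2.0 * math.pi / len(lams)

N, start, tau = 128, 4, 3
x = [math.cos(0.6 * t + 0.3) + 0.5 * math.cos(1.7 * t + 1.1) for t in range(N + start)]
bank = [[1.0, 0.8], [1.0, -0.8], [0.5, 0.0, 0.5]]  # hypothetical FIR filters
lams, values = periodogram_on_grid(x[start:start + N], 2 * N)
differences = [abs(fast_correlogram(lams, values, taps, tau)
                   - slow_correlogram(x, start, N, taps, tau)) for taps in bank]
```

For this stationary test signal the entries of `differences` are small compared with the correlogram values themselves, in line with theorem 3. The saving comes from computing the periodogram once instead of once per filter.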

3.4 Visualizing sounds with correlograms

The correlogram and hence also its input sound may be visualized as a movie. The time $u$ is represented by itself: $\mathrm{KOR}_T(\tau, p, u)$ is shown at time $[uT]$. For a fixed $u$, $\mathrm{KOR}_T(\tau, p, u)$ is presented as a two-dimensional picture. For every $\tau$ and $p$ we have one pixel. $\tau$ is plotted on the horizontal and $p$ on the vertical axis. If $\mathrm{KOR}_T(\tau, p, u) < 0$ the pixel $(\tau, p)$ is red, else it is grey.¹ The bigger $|\mathrm{KOR}_T(\tau, p, u)|$ is, the darker is the pixel.

It turned out that for most sounds a few cochlea sections are so predominant that the biggest part of the picture is white. Therefore the information contained in the correlogram is conveyed much better if it is rescaled. We do the rescaling in exactly the same way as Slaney and Lyon:

\[ \mathrm{KORR}_T(\tau, p, u) := \frac{\mathrm{KOR}_T(\tau, p, u)}{\mathrm{KOR}_T(0, p, u)^{0.75}} \]

The scaling with $\mathrm{KOR}_T(0, p, u)^{0.75}$ seems to be ad hoc. 0.75 is the exponent that made the correlogram look best. Note that we could not have used $\mathrm{KOR}_T(0, p, u)$ because then the differences between the cochlea sections would have been lost.
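The rescaling step is a one-line operation per cochlea section. The sketch below assumes that the correlogram values of one section at one time are stored in a list indexed by the lag, with the lag 0 value first; the function name `rescale` and the sample numbers are illustrative.

```python
def rescale(kor_by_lag, exponent=0.75):
    # kor_by_lag[tau] holds the correlogram value at lag tau for one section and time;
    # entry 0 is the lag 0 value, assumed positive.
    scale = kor_by_lag[0] ** exponent
    return [value / scale for value in kor_by_lag]

strong_section = rescale([16.0, 8.0, -4.0])
weak_section = rescale([1.0, 0.5, -0.25])
```

With the exponent 0.75 the strong section still has larger values than the weak one, but the gap is compressed from a factor of 16 to a factor of 2. With the exponent 1 both sections would look identical.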

In [10] many correlograms of interesting sounds are shown. Figure 1 presents a (rescaled) correlogram of the phoneme /A/² computed with the algorithm given in [11]. Since this is a stationary sound, all pictures look the same.

Figure 2 presents one computed with our algorithm.

¹ If you have not printed this paper on a color printer, you see the absolute value of the correlogram in figure 2. This paper is available as a postscript file with color via anonymous ftp from statlab.uni-heidelberg.de.

² The transcription is according to the ARPAbet. See [2].

Figure 1: A correlogram of the phoneme /A/, computed with the algorithm given by Slaney and Lyon.

Figure 2: A correlogram of the phoneme /A/, computed with our algorithm. Note that if you don't have color, you only see a representation of the absolute value of the correlogram.

Differences along the vertical axis show us differences in the activity of different cochlea sections and hence in the magnitude of energy in different frequency bands.

Since the filters of the cochlea filterbank are tuned broadly, this gives us information about a strongly smoothed version (or the envelope) of the spectrum of the input sound. Dark horizontal bands in the correlogram therefore indicate frequency bands with strong energy. In the context of speech analysis they are often called formants. In contrast, the correlation plotted on the horizontal axis reacts to the fine structure of the spectrum of the input sound. Assume e.g. that the sound is a real valued locally stationary process that has a discrete spectrum with lines at, say, $\omega_1$, $-\omega_1$ and integer multiples. Then the correlation will be big at the lag corresponding to $\omega_1$, indicating the fundamental frequency of the sound. Therefore, dark vertical bands in the correlogram show pitch information.

3.5 Pitch estimation

The last remark indicates that we can try to estimate the pitch of a sound by summing up the correlogram along the vertical axis (i.e., along the cochlea sections) and looking for the maximum.

Definition 4

\[ \mathrm{SUMKOR}_T(\tau, u) := \sum_{p=1}^{86} \mathrm{KOR}_T(\tau, p, u) \]

is called empirical summary correlogram.

\[ \mathrm{PITCHPERIOD}_T(u) := \operatorname{argmax}_{\tau_1 \le \tau \le \tau_2} \mathrm{SUMKOR}_T(\tau, u) \]

where $\tau_1$ and $\tau_2$ constitute some reasonable bound for the pitch period.

\[ \mathrm{SUMKOR}(\tau, u) := \sum_{p=1}^{86} \mathrm{KOR}(\tau, p, u) \]

is called theoretical summary correlogram.

\[ \mathrm{PITCHPERIOD}(u) := \operatorname{argmax}_{\tau_1 \le \tau \le \tau_2} \mathrm{SUMKOR}(\tau, u). \]

Proposition 3

If the assumptions of theorem 1 and A1 hold, $\mathrm{SUMKOR}_T(\tau, u)$ converges to $\mathrm{SUMKOR}(\tau, u)$ and $\mathrm{PITCHPERIOD}_T(u)$ to $\mathrm{PITCHPERIOD}(u)$ a.s. or in probability respectively.

The proof of the first part of the proposition is trivial. The second part may be proved by arguments that are well known from consistency proofs for minimum distance estimators. An example of such a proof may be found in [1]. We therefore omit it.

Much more interesting than these theoretical results is the question of how well the pitch estimator describes real pitch perception by humans. A lot of psychophysical data about pitch perception is known. We want to test our pitch estimator against the observations about the pitch of the residue described by Schouten et al. in [8].

Schouten et al. presented amplitude modulated signals of the form

\[ s(x) = 0.5\,m \sin(2\pi(f - g)x) + \sin(2\pi f x) + 0.5\,m \sin(2\pi(f + g)x) \]

to their listeners, who judged the pitch of the signal by adjusting a matching signal.

This matching signal was of the form

\[ 0.5\,m \sin(2\pi(n-1)g_0 x) + \sin(2\pi n g_0 x) + 0.5\,m \sin(2\pi(n+1)g_0 x) \tag{8} \]

for some integer $n$, where $g_0$ was the parameter that could be adjusted by the listeners.

The sound of interest was said to have pitch $g_0$ for a subject if the subject judged this sound to have the same pitch as the matching sound with parameter $g_0$. Thus the pitch was given as a frequency in Hz. See [8] for more details. Schouten et al. used the values $m = 0.9$ and $g = g_0 = 200$ Hz and started with a value of $f = f_0 = n g_0$ where $n$ is a natural number, typically $n = 10$. Then they shifted $f$ up and down in steps of 50 Hz. The result was that for $f = f_0$ the pitch was $g_0$. As $f$ was shifted, the pitch changed linearly as long as $f$ was close enough to $f_0$, i.e., $|f - f_0| < g_0$. A first approximation is

\[ P = g_0 + \Delta f\, \frac{g_0}{f_0} \qquad (\text{for } |\Delta f| < g_0) \]

where $P$ is the pitch (in Hz) and $\Delta f = f - f_0$. This is called the first effect of pitch shift. If one looks closer, one sees that the slope of the pitch as a function of $f$ is actually steeper. It can better be described as

\[ P = g_0 + \Delta f\, \frac{g_0 (1 + b)}{f_0} \qquad (\text{for } |\Delta f| < g_0) \]

where $b$ depends on the individual subject that listens.³ A typical value is $b = 0.35$.

This result is called the second effect of pitch shift.
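As a worked instance of the two approximations, the following sketch evaluates them for one shift step. The function name `residue_pitch` and its default arguments are illustrative; the values $f_0 = 2000$ Hz, $g_0 = 200$ Hz and $b = 0.35$ are the ones quoted in the text.

```python
def residue_pitch(f, f0=2000.0, g0=200.0, b=0.0):
    # P = g0 + (f - f0) * g0 * (1 + b) / f0, intended for |f - f0| < g0.
    # b = 0 gives the first effect of pitch shift, b > 0 the second.
    return g0 + (f - f0) * g0 * (1.0 + b) / f0

first_effect = residue_pitch(2050.0)            # shift of +50 Hz, b = 0
second_effect = residue_pitch(2050.0, b=0.35)   # same shift, typical b
```

For a shift of 50 Hz the first approximation gives 205 Hz and the second, with $b = 0.35$, gives 206.75 Hz.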

Now $s(x)$ is a deterministic signal and not a locally stationary process. Therefore we use a somewhat different but similar signal. Let

\[ s_1(x) = 0.5\,m \cos(2\pi(f - g)x + \varphi'_l) + \cos(2\pi f x + \varphi'_c) + 0.5\,m \cos(2\pi(f + g)x + \varphi'_r) \]

where $\varphi'_l, \varphi'_c, \varphi'_r$ are independent identically distributed random phases. If $s_1$ is digitized at a sampling rate $\nu$ we may view it as a process with discrete evolutionary spectrum.

\[
\begin{aligned}
\tilde{s}_1(t, T) ={}& \overline{A_l\!\left(\tfrac{t}{T}\right)} \exp(-i\theta_l(t)) + A_l\!\left(\tfrac{t}{T}\right) \exp(i\theta_l(t)) \\
&+ \overline{A_c\!\left(\tfrac{t}{T}\right)} \exp(-i\theta_c(t)) + A_c\!\left(\tfrac{t}{T}\right) \exp(i\theta_c(t)) \\
&+ \overline{A_r\!\left(\tfrac{t}{T}\right)} \exp(-i\theta_r(t)) + A_r\!\left(\tfrac{t}{T}\right) \exp(i\theta_r(t))
\end{aligned}
\]

where

\[ \omega_l(u) = 2\pi(f - g)/\nu, \qquad \omega_c(u) = 2\pi f/\nu, \qquad \omega_r(u) = 2\pi(f + g)/\nu, \]

\[ \theta_l(t) = t\,\omega_l, \qquad \theta_c(t) = t\,\omega_c, \qquad \theta_r(t) = t\,\omega_r, \]

\[ A_l(u) = 0.225 \exp(i\varphi_l), \qquad A_c(u) = 0.5 \exp(i\varphi_c), \qquad A_r(u) = 0.225 \exp(i\varphi_r). \]

$\varphi_l, \varphi_c, \varphi_r$ are independent identically distributed according to the uniform distribution on $[-\pi, \pi]$. The amplitudes correspond to $m = 0.9$: each cosine of amplitude $a$ splits into two complex exponentials of modulus $a/2$. We use these processes with the values 1950, 2000, 2050, 2100, 2150, 2200, 2250 and 2300 Hz for $f$ and 200 Hz for $g$.

³ Here we do not consider a change of $g$ as Schouten et al. did.

In fact, the difference between these signals and those used by Schouten et al. is not significant. It is theoretically insignificant, because we can develop an asymptotic theory for almost periodic deterministic signals that is completely analogous to the theory of locally stationary processes with discrete spectra. Just let $A$ and $A^0$ be deterministic, leave out 4 and replace almost sure convergence by normal convergence in definition 1. Then we can prove analogous results, and the theoretical summary correlogram for $\tilde{s}_1(t, T)$ is the same as for the analogous deterministic signal with $\varphi_l = \varphi_c = \varphi_r = 0$. The difference also seems to be practically insignificant, since for both signals the pitch estimates are exactly the same.

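In addition we present a signal $\tilde{s}_2(t, T)$, where the center frequency is changed continuously from 1950 to 2200 Hz, i.e. $f = 1950$ Hz in the formulas below. Here

\[ \omega_l(u) = 2\pi(f - g + u \cdot 250\,\text{Hz})/\nu, \qquad \omega_c(u) = 2\pi(f + u \cdot 250\,\text{Hz})/\nu, \qquad \omega_r(u) = 2\pi(f + g + u \cdot 250\,\text{Hz})/\nu, \]

\[ \theta_l(t) = 2\pi t\left(f - g + \tfrac{t}{2T} \cdot 250\,\text{Hz}\right)/\nu, \qquad \theta_c(t) = 2\pi t\left(f + \tfrac{t}{2T} \cdot 250\,\text{Hz}\right)/\nu, \qquad \theta_r(t) = 2\pi t\left(f + g + \tfrac{t}{2T} \cdot 250\,\text{Hz}\right)/\nu, \]

\[ A_l(u) = 0.225 \exp(i\varphi_l), \qquad A_c(u) = 0.5 \exp(i\varphi_c), \qquad A_r(u) = 0.225 \exp(i\varphi_r), \]

where $\nu$ denotes the sampling rate.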

We analyze the sounds with the pitch estimator described above and with an even simpler one, which is just the argmax of the estimated covariance function of the signal:

\[ \mathrm{PITCHPERIOD2}_T(u) := \operatorname{argmax}_{\tau_1 \le \tau \le \tau_2} \int_{-\pi}^{\pi} I_N(u, \lambda) \exp(i\lambda\tau)\, d\lambda. \]

It turns out that both estimators $\mathrm{PITCHPERIOD}_T$ and $\mathrm{PITCHPERIOD2}_T$ produce exactly the same result. Apparently the influence of the cochlear filters on pitch estimation is small in our examples.

The results for $\tilde{s}_1$ are shown in table 1 and figure 3. In figure 3 the estimated pitches are shown as dots. The upper line is the line described by the first effect of pitch shift for $f_0 = 2000$ Hz, the lower line for $f_0 = 2200$ Hz. We have translated the pitch periods (lag $\tau$) into frequencies in Hz ($P$) by the formula $P = \nu/\tau$, where $\nu$ is the sampling rate of the signal. Hence $P$ is the frequency of the pure oscillation (discretized with sampling rate $\nu$) with period $\tau$. A theoretical justification for this formula will be given below (p. 23). Further, for the relevant values of $P$, the pitch

f in Hz    pitch in Hz
1950       195.122
2000       200.000
2050       205.128
2100       210.526
2150       195.122
2200       200.000
2250       205.128
2300       207.792

Table 1: Estimated pitches of $\tilde{s}_1$.

Figure 3: The estimated pitch of $\tilde{s}_1$ vs. the center frequency plotted as dots. The lines show the first effect of pitch shift for $f_0 = 2000$ Hz (upper line) and $f_0 = 2200$ Hz (lower line).
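The simpler estimator can be imitated on the theoretical level by maximizing the covariance function of the three-line spectrum directly. The sketch below is only an illustration under assumptions that are not stated in this part of the text: a sampling rate of 16000 Hz, a lag window of 40 to 120 samples, line amplitudes 0.225, 0.5 and 0.225 (corresponding to $m = 0.9$), and no cochlear filters.

```python
import math

def theoretical_autocov(tau, f, g, rate):
    # Sum over the three spectral line pairs of 2 * |A_n|^2 * cos(omega_n * tau),
    # with |A_l| = |A_r| = 0.225 and |A_c| = 0.5 (assumed values for m = 0.9).
    amplitudes = (0.225, 0.5, 0.225)
    frequencies = (f - g, f, f + g)
    return sum(2.0 * a * a * math.cos(2.0 * math.pi * fr * tau / rate)
               for a, fr in zip(amplitudes, frequencies))

def pitch_estimate(f, g=200.0, rate=16000.0, tau_min=40, tau_max=120):
    # rate, tau_min and tau_max are assumptions made for this sketch.
    best_lag = max(range(tau_min, tau_max + 1),
                   key=lambda tau: theoretical_autocov(tau, f, g, rate))
    return rate / best_lag   # P = sampling rate / pitch period

pitches = {f: pitch_estimate(f) for f in (2000.0, 2050.0, 2150.0)}
```

Under these assumptions the three center frequencies give 200.000, 205.128 and 195.122 Hz, the same values as the corresponding rows of table 1. Center frequencies such as 2100 Hz lead to two lags with equal covariance, so there the result depends on how ties are broken.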
