
A likelihood approximation for locally stationary processes

by Rainer Dahlhaus

Universität Heidelberg

Abstract

A new approximation to the Gaussian likelihood of a multivariate locally stationary process is introduced. It is based on an approximation of the inverse of the covariance matrix of such processes. The new quasi-likelihood is a generalisation of the classical Whittle likelihood for stationary processes. For parametric models asymptotic normality and efficiency of the resulting estimator are proved. Since the likelihood has a special local structure it can be used for nonparametric inference as well. This is briefly sketched for different estimates.

1 Introduction

Suppose we observe data $X_1,\ldots,X_T$ from some nonstationary process and we want to fit a parametric model to the data. An example is an autoregressive process with time varying coefficients where we model the coefficient functions by polynomials in time. If the process is Gaussian we can write down the exact likelihood function which, in the case of mean zero, takes the form

\[
L^{(e)}_T(\theta) := -\frac{1}{T}\,(\text{Gaussian log-likelihood}) = \frac{1}{2}\log 2\pi + \frac{1}{2T}\log\det\Sigma_\theta + \frac{1}{2T}\,X'\Sigma_\theta^{-1}X \qquad (1.1)
\]
with $X = (X_1,\ldots,X_T)'$ (the assumption of a zero mean is given up later on).

However, for most of the time varying models the calculations needed for the minimisation of this function are too time consuming. Suppose for example we want to fit a time varying AR-model to the data where the coefficient functions are polynomials in

AMS 1991 subject classifications. Primary 62M10; secondary 62F10.

Key words and phrases. Locally stationary processes, Whittle likelihood, local likelihood, preperiodogram, generalized Toeplitz matrices.


time and where the AR-model order and the polynomial orders have to be determined by a model selection criterion. Such a model can be written in state space form with time varying system matrices (cf. Dahlhaus, 1996b) and - in principle - the minimisation could be done as in the stationary case by using the prediction error decomposition, the Kalman filter and a numerical optimisation routine (cf. Harvey, 1989, Section 3.4). However, we usually have a high dimensional parameter space, time dependent system matrices and a large number of models at hand, which make the calculations practically impossible.

To overcome these problems we suggest in this paper an approximation to the above likelihood which is a generalisation of Whittle's approximation in the stationary case (cf. Whittle, 1953, 1954). In the stationary case $\Sigma_\theta$ is the Toeplitz matrix of the spectral density. Whittle suggested to approximate $\Sigma_\theta^{-1}$ by the Toeplitz matrix of the inverse of the spectral density, leading with the Szegő formula (cf. Grenander and Szegő, 1958, Section 5.2) to the Whittle likelihood

\[
L^{(W)}_T(\theta) := \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log 4\pi^2 f_\theta(\lambda) + \frac{I_T(\lambda)}{f_\theta(\lambda)}\Big\}\,d\lambda
\]
where
\[
I_T(\lambda) = \frac{1}{2\pi T}\Big|\sum_{t=1}^{T}X_t\exp(-i\lambda t)\Big|^2
\]
is the periodogram.
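As a concrete illustration (our own sketch, not part of the paper), the periodogram and the Whittle likelihood can be written in a few lines of numpy; the function names, the Riemann-sum discretisation of the $\lambda$-integral and the grid size are illustrative choices.

```python
import numpy as np

def periodogram(X, lams):
    """I_T(lam) = (1/(2*pi*T)) |sum_{t=1}^T X_t exp(-i*lam*t)|^2."""
    T = len(X)
    t = np.arange(1, T + 1)
    dft = np.exp(-1j * np.outer(lams, t)) @ X   # sum_t X_t e^{-i lam t}
    return np.abs(dft) ** 2 / (2 * np.pi * T)

def whittle_likelihood(X, f_theta, n_grid=512):
    """(1/(4*pi)) * integral of {log(4*pi^2 f(lam)) + I_T(lam)/f(lam)} over [-pi, pi),
    approximated by a Riemann sum on an equispaced lambda grid (exact for
    trigonometric polynomials of degree < n_grid)."""
    lams = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid
    I = periodogram(X, lams)
    f = f_theta(lams)
    integrand = np.log(4 * np.pi ** 2 * f) + I / f
    return integrand.mean() / 2.0   # (1/(4*pi)) * (2*pi/n_grid) * sum = mean / 2
```

For iid Gaussian data with variance $\sigma^2$ (so $f(\lambda)=\sigma^2/2\pi$), this reduces exactly to the Gaussian likelihood per observation, $\tfrac12\log 2\pi\sigma^2 + \tfrac{1}{2\sigma^2 T}\sum_t X_t^2$, which gives a cheap sanity check.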

In this paper we derive a similar approximation for processes that only show locally some kind of stationary behaviour. More precisely we consider locally stationary processes as defined in Dahlhaus (1996a,b, 1997), i.e. processes with a time varying spectral representation as in (2.1) (the exact definition is given in Section 3). For an introduction to univariate locally stationary processes we refer to Dahlhaus (1996c).

In Section 2 we motivate the approximation and discuss its benefits in a simplified setting (univariate processes, mean zero). In Section 3 we introduce multivariate locally stationary processes and the generalisation of the Whittle likelihood for such processes.

We then investigate the properties of the resulting parameter estimate.


Technically the approximation is based on a special generalisation of Toeplitz matrices (see (2.2)). The behaviour of norms and matrix products of such matrices is investigated in the appendix.

2 A motivation for the likelihood approximation

In this section we use a simplified setting to introduce and motivate the likelihood approximation and to discuss its applications. Furthermore, we compare it to the Whittle approximation in the stationary case.

Suppose the observed process has a time varying spectral representation of the form
\[
X_{t,T} = \int_{-\pi}^{\pi}\exp(i\lambda t)\,A\Big(\frac{t}{T},\lambda\Big)\,d\xi(\lambda) \qquad (t = 1,\ldots,T) \qquad (2.1)
\]
where $\xi(\lambda)$ is a stochastic process on $[-\pi,\pi]$ with mean zero and orthonormal increments.

As e.g. in nonparametric regression the time parameter $u = t/T$ in $A$ is rescaled for a meaningful asymptotic theory (this is a special case of a locally stationary process as defined in Definition 3.1 below). We obtain for the variance-covariance matrix $\Sigma$
\[
\Sigma_{r,s} = \operatorname{cov}(X_{r,T},X_{s,T}) = \int_{-\pi}^{\pi}\exp\{i\lambda(r-s)\}\,A\Big(\frac{r}{T},\lambda\Big)\overline{A\Big(\frac{s}{T},\lambda\Big)}\,d\lambda.
\]

In the stationary case, where $A(r/T,\lambda) = A(\lambda)$ does not depend on time, this is equal to
\[
\int_{-\pi}^{\pi}\exp\{i\lambda(r-s)\}\,f(\lambda)\,d\lambda
\]
where $f(\lambda) = |A(\lambda)|^2$ is the spectral density of the process. In the derivation of the Whittle approximation $\Sigma^{-1}$ is approximated by the Toeplitz matrix
\[
\Big[\frac{1}{4\pi^2}\int_{-\pi}^{\pi}\exp\{i\lambda(r-s)\}\,f(\lambda)^{-1}\,d\lambda\Big]_{r,s=1,\ldots,T}.
\]

In the nonstationary case we have, for $r,s$ close to each other and for a function $A$ which is smooth in time,
\[
\Sigma_{r,s} \approx \int_{-\pi}^{\pi}\exp(i\lambda(r-s))\,f\Big(\frac{r+s}{2T},\lambda\Big)\,d\lambda
\]


where $f(u,\lambda) := |A(u,\lambda)|^2$ is the time-varying spectral density of the process.

This suggests to use now
\[
\Big\{\frac{1}{4\pi^2}\int_{-\pi}^{\pi}\exp\{i\lambda(r-s)\}\,f\Big(\frac{r+s}{2T},\lambda\Big)^{-1}d\lambda\Big\}_{r,s=1,\ldots,T}
\]
as an approximation of $\Sigma^{-1}$ in the nonstationary case. Since it leads to a slightly nicer criterion we use instead $U_T\big(\frac{1}{4\pi^2}f^{-1}\big)$ where
\[
U_T(\phi) = \Big[\int_{-\pi}^{\pi}\exp\{i\lambda(r-s)\}\,\phi\Big(\frac{1}{T}\Big\lceil\frac{r+s}{2}\Big\rceil,\lambda\Big)\,d\lambda\Big]_{r,s=1,\ldots,T} \qquad (2.2)
\]
and $\lceil x\rceil$ denotes the smallest integer larger or equal to $x$. Note that $U_T\big(\frac{1}{4\pi^2}f^{-1}\big)$ is the classical Toeplitz/Whittle approximation if $f$ is constant over time (stationary case).
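A minimal sketch of the generalized Toeplitz matrix $U_T(\phi)$ from (2.2), assuming a numpy setting; the Riemann-sum integration and the helper names are our own illustrative choices. For a $\phi$ that is flat in $\lambda$, $U_T(\phi)$ is diagonal, which gives a cheap sanity check; if $\phi$ does not depend on $u$, the construction reduces to an ordinary Toeplitz matrix.

```python
import numpy as np

def U_T(phi, T, n_grid=None):
    """Generalized Toeplitz matrix (2.2):
    U_T(phi)[r, s] = integral over [-pi, pi) of exp(i*lam*(r-s)) * phi(ceil((r+s)/2)/T, lam),
    r, s = 1..T, computed by a Riemann sum on an equispaced lambda grid
    (exact for integrands that are trigonometric polynomials of degree < n_grid)."""
    if n_grid is None:
        n_grid = 4 * T
    lams = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid
    r = np.arange(1, T + 1)
    U = np.empty((T, T), dtype=complex)
    for i in range(T):
        for j in range(T):
            u = np.ceil((r[i] + r[j]) / 2) / T   # rescaled midpoint time
            U[i, j] = 2 * np.pi * np.mean(np.exp(1j * lams * (r[i] - r[j])) * phi(u, lams))
    return U
```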

Using this approximation, i.e.
\[
\Sigma_\theta^{-1} \approx U_T\Big(\frac{1}{4\pi^2}f_\theta^{-1}\Big),
\]
and a generalization of Szegő's formula to the nonstationary case (see Proposition 3.4 below), namely
\[
\frac{1}{T}\log\det\Sigma_\theta \approx \frac{1}{2\pi}\int_0^1\int_{-\pi}^{\pi}\log\big[2\pi f_\theta(u,\lambda)\big]\,d\lambda\,du,
\]
we obtain the following likelihood function as an approximation of the exact Gaussian likelihood $L^{(e)}_T(\theta)$:
\[
L^{(\ell)}_T(\theta) = \frac{1}{4\pi}\,\frac{1}{T}\sum_{t=1}^{T}\int_{-\pi}^{\pi}\log\Big[4\pi^2 f_\theta\Big(\frac{t}{T},\lambda\Big)\Big]\,d\lambda + \frac{1}{8\pi^2 T}\,X'U_T(f_\theta^{-1})\,X.
\]

The substitution $\lceil (r+s)/2\rceil = t$, $r - s = k$ yields
\[
X'U_T(f_\theta^{-1})X = \sum_{r,s=1}^{T}X_{r,T}X_{s,T}\int_{-\pi}^{\pi}\exp\{i\lambda(r-s)\}\,f_\theta\Big(\frac{1}{T}\Big\lceil\frac{r+s}{2}\Big\rceil,\lambda\Big)^{-1}d\lambda
\]
\[
= \sum_{t=1}^{T}\int_{-\pi}^{\pi}f_\theta\Big(\frac{t}{T},\lambda\Big)^{-1}\sum_{k}X_{\lfloor t+k/2\rfloor,T}\,X_{\lfloor t-k/2\rfloor,T}\exp(i\lambda k)\,d\lambda
\]
where the second sum is over all $k$ such that $1\le\lfloor t+k/2\rfloor,\lfloor t-k/2\rfloor\le T$ and $\lfloor x\rfloor$ is the largest integer smaller or equal to $x$. Thus,

\[
L^{(\ell)}_T(\theta) = \frac{1}{4\pi}\,\frac{1}{T}\sum_{t=1}^{T}\int_{-\pi}^{\pi}\Big\{\log\Big[4\pi^2 f_\theta\Big(\frac{t}{T},\lambda\Big)\Big] + \frac{\tilde I_T(\frac{t}{T},\lambda)}{f_\theta(\frac{t}{T},\lambda)}\Big\}\,d\lambda \qquad (2.3)
\]
where
\[
\tilde I_T(u,\lambda) := \frac{1}{2\pi}\sum_{k\,:\,1\le\lfloor uT+k/2\rfloor,\lfloor uT-k/2\rfloor\le T}X_{\lfloor uT+k/2\rfloor,T}\,X_{\lfloor uT-k/2\rfloor,T}\exp(i\lambda k).
\]
$\tilde I_T(\frac{t}{T},\lambda)$ may be regarded as a local version of the periodogram at time $t$. It was introduced by Neumann and von Sachs (1997) as a starting point for a wavelet estimate of the time-varying spectral density. We will call $\tilde I_T(\frac{t}{T},\lambda)$ the preperiodogram at time $t$.

There exist several nice relations between the preperiodogram and the ordinary periodogram, and between the above likelihood and the Whittle likelihood: We have
\[
I_T(\lambda) = \frac{1}{2\pi T}\Big|\sum_{r=1}^{T}X_{r,T}\exp(-i\lambda r)\Big|^2
= \frac{1}{2\pi}\sum_{k=-(T-1)}^{T-1}\frac{1}{T}\Big(\sum_{t=1}^{T-|k|}X_{t,T}X_{t+|k|,T}\Big)\exp(i\lambda k) \qquad (2.4)
\]
\[
= \frac{1}{T}\sum_{t=1}^{T}\frac{1}{2\pi}\sum_{k\,:\,1\le\lfloor t+k/2\rfloor,\lfloor t-k/2\rfloor\le T}X_{\lfloor t+k/2\rfloor,T}\,X_{\lfloor t-k/2\rfloor,T}\exp(i\lambda k)
= \frac{1}{T}\sum_{t=1}^{T}\tilde I_T\Big(\frac{t}{T},\lambda\Big),
\]
i.e. the periodogram is the average of the preperiodogram over time. (2.4) means that the periodogram $I_T(\lambda)$ is the Fourier transform of the covariance estimator of lag $k$ over the whole segment, while the preperiodogram $\tilde I_T(\frac{t}{T},\lambda)$ just uses the pair $X_{\lfloor t+k/2\rfloor,T}X_{\lfloor t-k/2\rfloor,T}$ as a kind of "local estimator" of the covariance of lag $k$ at time $t$ (note that $\lfloor t+k/2\rfloor - \lfloor t-k/2\rfloor = k$). For this reason Neumann and von Sachs also called $\tilde I_T(\frac{t}{T},\lambda)$ the localized periodogram.
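The averaging identity (2.4) is easy to verify numerically. The following direct, $O(T^2)$ sketch (our own illustration, using the floor-bracket conventions above) implements both quantities and can be checked against each other:

```python
import numpy as np

def preperiodogram(X, t, lams):
    """Preperiodogram at time t (1-indexed):
    (1/(2*pi)) * sum_k X_{[t+k/2]} X_{[t-k/2]} exp(i*lam*k), [x] = floor(x),
    the sum running over all k with 1 <= [t+k/2], [t-k/2] <= T."""
    T = len(X)
    val = np.zeros_like(lams, dtype=complex)
    for k in range(-2 * T, 2 * T + 1):
        i1 = int(np.floor(t + k / 2))
        i2 = int(np.floor(t - k / 2))
        if 1 <= i1 <= T and 1 <= i2 <= T:
            val += X[i1 - 1] * X[i2 - 1] * np.exp(1j * lams * k)
    # the k and -k terms are conjugate, so the sum is real
    return val.real / (2 * np.pi)

def periodogram(X, lams):
    T = len(X)
    t = np.arange(1, T + 1)
    dft = np.exp(-1j * np.outer(lams, t)) @ X
    return np.abs(dft) ** 2 / (2 * np.pi * T)
```

Averaging `preperiodogram` over $t = 1,\ldots,T$ reproduces `periodogram` exactly (up to floating point), since for each lag $k$ the pairs $(\lfloor t+k/2\rfloor,\lfloor t-k/2\rfloor)$ run exactly once through all index pairs with difference $k$.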

A classical kernel estimator of the spectral density of a stationary process at some frequency $\lambda_0$ therefore can be regarded as an average of the preperiodogram over all time points and over the frequencies in the neighbourhood of $\lambda_0$. It is therefore plausible that averaging the preperiodogram around some frequency $\lambda_0$ and around some time point $t_0$ gives an estimate of the time-varying spectrum $f(\frac{t_0}{T},\lambda_0)$.

For a locally stationary process the preperiodogram is asymptotically unbiased. However, its variance explodes as $T$ tends to infinity. Therefore, smoothing over time and frequency is essential to make a consistent estimate out of it. This smoothing is implicitly contained in the likelihood $L^{(\ell)}_T(\theta)$. Instead of using $\tilde I_T(\frac{t}{T},\lambda)$ in (2.3) one could think of using the classical periodogram over some small segment of data around $t$. Such a likelihood was studied in Dahlhaus (1997). The preperiodogram has advantages over such an estimate, since a classical periodogram always contains some implicit smoothing over time (even if it is calculated over a small segment), which in the case of time varying spectra means that some information is getting lost. For this reason the preperiodogram is a valuable raw estimate, e.g. in (2.3) or for wavelet smoothing as in Neumann and von Sachs (1997). Another advantage in the context of Whittle estimation is that no segment length (e.g. for a periodogram) has to be selected.

The above likelihood $L^{(\ell)}_T(\theta)$ coincides with the Whittle likelihood in the stationary case: If a stationary model is fitted, then $f_\theta(u,\lambda) = f_\theta(\lambda)$ is constant over time and the likelihood becomes
\[
L^{(\ell)}_T(\theta) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log 4\pi^2 f_\theta(\lambda) + \frac{\frac{1}{T}\sum_{t=1}^{T}\tilde I_T(\frac{t}{T},\lambda)}{f_\theta(\lambda)}\Big\}\,d\lambda = L^{(W)}_T(\theta).
\]

For that reason the results on the asymptotic behaviour of the minimizer of $L^{(\ell)}_T(\theta)$ contain most of the results on the classical Whittle estimate as a special case (apart from our restriction to Gaussian processes). Among the large number of papers we mention the results of Dzhaparidze (1971) and Hannan (1973) for univariate time series, Dunsmuir (1979) for multivariate time series and Hosoya and Taniguchi (1982) for misspecified multivariate time series, which follow as a special case from Theorem 3.8 below. A general overview over Whittle estimates for stationary models may be found in the monograph of Dzhaparidze (1986). We also mention the results of Klüppelberg and Mikosch (1996) on Whittle estimates for linear processes where the innovations have heavy-tailed distributions, of Fox and Taqqu (1986) on Whittle estimates for long range dependent processes and of Robinson (1995) on semiparametric Whittle estimates for long range dependent processes. These results however are not a special case of Theorem 3.8.

There is another important aspect of the above likelihood approximation: The likelihood is of the form
\[
L^{(\ell)}_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}\ell_T\Big(\theta,\frac{t}{T}\Big)
\quad\text{with}\quad
\ell_T\Big(\theta,\frac{t}{T}\Big) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log\Big[4\pi^2 f_\theta\Big(\frac{t}{T},\lambda\Big)\Big] + \frac{\tilde I_T(\frac{t}{T},\lambda)}{f_\theta(\frac{t}{T},\lambda)}\Big\}\,d\lambda,
\]
i.e. $L^{(\ell)}_T(\theta)$ has a similar form as the negative log-likelihood function of iid observations, where $\ell_T(\theta,\frac{t}{T})$ is the negative log-likelihood at time point $t$. In the present dependent situation $\ell_T(\theta,\frac{t}{T})$ may still be regarded as the negative log-likelihood at time point $t$, which now in addition contains the full information on the dependence (correlation) structure of $X_{t,T}$ with all the other variables.

To illustrate this we give two examples:

1. Suppose we have the situation of nonparametric regression with heteroscedastic errors, i.e. our model is
\[
X_{t,T} = m\Big(\frac{t}{T}\Big) + \sigma\Big(\frac{t}{T}\Big)\varepsilon_t, \qquad \varepsilon_t \text{ iid } N(0,1),
\]
with $m_\theta(u) = m(u)$, $\sigma_\theta(u) = \sigma(u)$. This process is locally stationary in the sense of Definition 3.1 below. Since the mean is different from zero, the preperiodogram in $\ell_T(\theta,\frac{t}{T})$ contains an extra term (see (3.7) below). It is easy to show that in this case
\[
\ell_T\Big(\theta,\frac{t}{T}\Big) = \frac{1}{2}\log\Big[2\pi\sigma^2\Big(\frac{t}{T}\Big)\Big] + \frac{1}{2\sigma^2(\frac{t}{T})}\Big(X_{t,T} - m\Big(\frac{t}{T}\Big)\Big)^2,
\]
which is exactly the Gaussian log-likelihood.


2. Suppose
\[
X_{t,T} = a\Big(\frac{t}{T}\Big)X_{t-1,T} + \sigma\Big(\frac{t}{T}\Big)\varepsilon_t, \qquad \varepsilon_t \text{ iid } N(0,1),
\]
with $a_\theta(u) = a(u)$, $\sigma_\theta(u) = \sigma(u)$. Then $X_{t,T}$ is locally stationary with time varying spectrum
\[
f(u,\lambda) = \frac{\sigma^2(u)}{2\pi}\,\big|1 - a(u)e^{i\lambda}\big|^{-2},
\]
leading to
\[
\ell_T\Big(\theta,\frac{t}{T}\Big) = \frac{1}{2}\log\Big[2\pi\sigma^2\Big(\frac{t}{T}\Big)\Big] + \frac{1}{2\sigma^2(\frac{t}{T})}\Big(X_{t,T} - a\Big(\frac{t}{T}\Big)X_{t-1,T}\Big)^2 + r_t
\]
with
\[
r_t = a\Big(\frac{t}{T}\Big)^2\big(X_{t,T}^2 - X_{t-1,T}^2\big),
\]
where $\sum_{t=1}^{T}r_t = O_p(1)$.
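A realisation of this time varying AR(1) model and its time varying spectrum can be generated as follows (our own sketch; as a simplifying assumption the recursion is simply started at zero rather than from the stationary distribution of the local model):

```python
import numpy as np

def simulate_tvar1(a, sigma, T, seed=0):
    """Simulate X_{t,T} = a(t/T) X_{t-1,T} + sigma(t/T) eps_t, eps_t iid N(0,1)."""
    rng = np.random.default_rng(seed)
    X = np.zeros(T)
    prev = 0.0
    for t in range(1, T + 1):
        prev = a(t / T) * prev + sigma(t / T) * rng.normal()
        X[t - 1] = prev
    return X

def tv_spectrum_ar1(a, sigma, u, lam):
    """f(u, lam) = sigma^2(u)/(2*pi) * |1 - a(u) exp(i lam)|^{-2}."""
    return sigma(u) ** 2 / (2 * np.pi) / np.abs(1 - a(u) * np.exp(1j * lam)) ** 2
```

As a sanity check, at a fixed $u$ the spectrum integrates to the local AR(1) variance $\sigma^2(u)/(1 - a(u)^2)$.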

The fact that $\ell_T(\theta,\frac{t}{T})$ can be seen as the local likelihood of the process at time $t$ opens the door for various nonparametric estimation methods. In this situation the model is parametrized by one or several curves in time (e.g. as in Example 3.2).

Recall that several nonparametric estimation techniques can be written as the solution of a least squares problem, for example for the simple nonparametric regression problem
\[
X_{t,T} = m\Big(\frac{t}{T}\Big) + \varepsilon_t
\]
a) a kernel estimate can be written as
\[
\hat m(u) = \operatorname*{argmin}_m\,\frac{1}{b_T T}\sum_{t}K\Big(\frac{u - t/T}{b_T}\Big)\{X_{t,T} - m\}^2
\]
where $K$ is the kernel and $b_T$ is some bandwidth;


b) a local polynomial fit can be written as
\[
\hat c(u) = \operatorname*{argmin}_c\,\frac{1}{b_T T}\sum_{t}K\Big(\frac{u - t/T}{b_T}\Big)\Big\{X_{t,T} - \sum_{j=0}^{d}c_j\Big(\frac{t}{T}-u\Big)^j\Big\}^2
\]
where $c = (c_0,\ldots,c_d)'$ are the coefficients of the fitted polynomial at time $u$;

c) an orthogonal series estimator (e.g. wavelets) can be written as
\[
\tilde\theta = \operatorname*{argmin}_\theta\,\frac{1}{T}\sum_{t}\Big\{X_{t,T} - \sum_{j=1}^{J}\theta_j\psi_j\Big(\frac{t}{T}\Big)\Big\}^2
\]
together with some shrinkage to obtain the final estimator $\hat\theta$. Here the $\psi_j(\cdot)$ $(j = 1,\ldots,J)$ denote some orthonormal functions. $J$ usually increases with $T$.

Note that the $\{\ldots\}$-brackets always contain the negative log-likelihood of the parameters up to some constants.
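The least squares estimators a) and b) above can be sketched directly in numpy (our own illustration; the Gaussian kernel and the weighted-least-squares route for the local polynomial fit are our own choices):

```python
import numpy as np

def kernel_estimate(X, u, b, K=lambda v: np.exp(-v ** 2 / 2)):
    """m_hat(u) = argmin_m (1/(b*T)) sum_t K((u - t/T)/b) (X_t - m)^2,
    i.e. the kernel-weighted mean over the fixed design t/T."""
    T = len(X)
    w = K((u - np.arange(1, T + 1) / T) / b)
    return np.sum(w * X) / np.sum(w)

def local_polynomial_fit(X, u, b, d=1, K=lambda v: np.exp(-v ** 2 / 2)):
    """c_hat(u) = argmin_c sum_t K((u - t/T)/b) (X_t - sum_j c_j (t/T - u)^j)^2,
    solved as a weighted least squares problem."""
    T = len(X)
    tt = np.arange(1, T + 1) / T
    w = K((u - tt) / b)
    D = np.vander(tt - u, d + 1, increasing=True)   # columns (t/T - u)^j
    W = np.sqrt(w)
    c, *_ = np.linalg.lstsq(D * W[:, None], X * W, rcond=None)
    return c
```

For noiseless data the estimators are exact: a constant function is reproduced by the kernel estimate, and a linear trend is recovered by the local linear fit ($c_0 = m(u)$, $c_1 = m'(u)$).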

Suppose now we have a locally stationary model which is parametrized by one or several curves $\theta(\cdot)$ in time. By using the local likelihood we may define, completely analogously to the above,

a) a kernel estimate by
\[
\hat\theta(u) = \operatorname*{argmin}_\theta\,\frac{1}{b_T T}\sum_{t=1}^{T}K\Big(\frac{u - t/T}{b_T}\Big)\,\ell_T\Big(\theta,\frac{t}{T}\Big),
\]
b) a local polynomial fit by
\[
\hat c(u) = \operatorname*{argmin}_c\,\frac{1}{b_T T}\sum_{t=1}^{T}K\Big(\frac{u - t/T}{b_T}\Big)\,\ell_T\Big(\sum_{j=0}^{d}c_j\Big(\frac{t}{T}-u\Big)^j,\frac{t}{T}\Big),
\]
c) an orthogonal series estimator (e.g. wavelets) by
\[
\tilde\theta = \operatorname*{argmin}_\theta\,\frac{1}{T}\sum_{t=1}^{T}\ell_T\Big(\sum_{j=1}^{J}\theta_j\psi_j\Big(\frac{t}{T}\Big),\frac{t}{T}\Big)
\]
together with some shrinkage of $\tilde\theta$.
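As a minimal illustration of the kernel estimate a) (our own sketch, not from the paper): for the model $X_{t,T} = \sigma(t/T)\varepsilon_t$ with a locally constant variance parameter $\theta$, the computation of Example 1 (with $m \equiv 0$) gives $\ell_T(\theta, t/T) = \tfrac12\log 2\pi\theta + X_{t,T}^2/(2\theta)$, and the kernel-weighted minimizer has a closed form:

```python
import numpy as np

def local_variance_estimate(X, u, b,
                            K=lambda v: (np.abs(v) <= 1) * 0.75 * (1 - v ** 2)):
    """theta_hat(u) = argmin_theta (1/(b*T)) sum_t K((u - t/T)/b) ell_T(theta, t/T)
    for X_{t,T} = sigma(t/T) eps_t, where ell_T(theta, t/T)
    = (1/2) log(2*pi*theta) + X_t^2/(2*theta).
    Setting the theta-derivative of the weighted sum to zero gives
    theta_hat(u) = sum_t w_t X_t^2 / sum_t w_t (an Epanechnikov kernel is used)."""
    T = len(X)
    w = K((u - np.arange(1, T + 1) / T) / b)
    return np.sum(w * X ** 2) / np.sum(w)
```

The closed form follows because the weighted criterion is, as a function of $\theta$, of the form $a\log\theta + b/\theta$ with a unique minimum; the sketch therefore also serves as a check that the returned value actually minimizes the kernel-weighted local likelihood.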


In case of several parameter curves (a vector of curves) $\theta$, the $c_j$ and the $\theta_j$ are also vectors. In case of a multivariate process or a process with mean different from zero the definition (3.6) of $\ell_T(\theta,\frac{t}{T})$ has to be used.

It is obvious that the properties of these estimators have to be investigated in detail.

However, this is quite complicated and would exceed the scope of this paper. We only want to demonstrate that the likelihood representation may have important applications in nonparametric estimation as well.

In the next section we prove that $L^{(\ell)}_T(\theta)$ indeed is a good approximation of the exact Gaussian likelihood $L^{(e)}_T(\theta)$. Furthermore, we consider parametric models and prove that the resulting parameter estimates are consistent, asymptotically normal and efficient.

We do this for a larger class of processes than discussed in this section. In particular, we study multivariate locally stationary processes and allow the mean to be a function different from zero, which introduces extra terms into the above expressions.

3 Asymptotic properties of parameter estimates

We start with the definition of a multivariate locally stationary process.

(3.1) Definition

A sequence of multivariate stochastic processes $X_{t,T} = (X_{t,T}^{(1)},\ldots,X_{t,T}^{(d)})'$ $(t = 1,\ldots,T)$ is called locally stationary with transfer function matrix $A^{\circ}$ and mean function vector $\mu$ if there exists a representation
\[
X_{t,T} = \mu\Big(\frac{t}{T}\Big) + \int_{-\pi}^{\pi}\exp(i\lambda t)\,A^{\circ}_{t,T}(\lambda)\,d\xi(\lambda) \qquad (3.1)
\]
where

(i) $\xi(\lambda)$ is a stochastic vector process on $[-\pi,\pi]$ with $\overline{\xi_a(\lambda)} = \xi_a(-\lambda)$ and
\[
\operatorname{cum}\{d\xi_{a_1}(\lambda_1),\ldots,d\xi_{a_k}(\lambda_k)\} = \eta\Big(\sum_{j=1}^{k}\lambda_j\Big)\,h_{a_1,\ldots,a_k}(\lambda_1,\ldots,\lambda_{k-1})\,d\lambda_1\cdots d\lambda_k,
\]
where $\operatorname{cum}\{\ldots\}$ denotes the cumulant of $k$-th order, $h_a = 0$, $h_{a,b}(\lambda) = \delta_{a,b}$, $|h_{a_1,\ldots,a_k}(\lambda_1,\ldots,\lambda_{k-1})| \le \mathrm{const}_k$ for all $a_1,\ldots,a_k \in \{1,\ldots,d\}$, and $\eta(\lambda) = \sum_{j=-\infty}^{\infty}\delta(\lambda + 2\pi j)$ is the period $2\pi$ extension of the Dirac delta function.

(ii) There exists a constant $K$ and a $2\pi$-periodic matrix valued function $A : [0,1]\times\mathbb{R} \to \mathbb{C}^{d\times d}$ with $\overline{A(u,\lambda)} = A(u,-\lambda)$ and
\[
\sup_{t,\lambda}\Big|A^{\circ}_{t,T}(\lambda)_{a,b} - A\Big(\frac{t}{T},\lambda\Big)_{a,b}\Big| \le K\,T^{-1} \qquad (3.2)
\]
for all $a,b = 1,\ldots,d$ and $T \in \mathbb{N}$. $A(u,\lambda)$ and $\mu(u)$ are assumed to be continuous in $u$.

$f(u,\lambda) := A(u,\lambda)\overline{A(u,\lambda)}'$ is the time varying spectral density matrix of the process.

Processes with an evolutionary spectral representation were introduced and investigated by Priestley (1965, 1981). The above definition is the multivariate generalization of the definition of univariate local stationarity as given in Dahlhaus (1997). This approach to local stationarity may be regarded as a setting which allows for a meaningful asymptotic theory for processes with an evolutionary spectral representation. The classical asymptotics for stationary sequences is contained as a special case (if $\mu$ and $A$ do not depend on $t$). A detailed discussion of this definition and a comparison to Priestley's approach can be found in Dahlhaus (1996c). Another definition of local stationarity has recently been given by Mallat, Papanicolaou and Zhang (1998). We remark that the methods presented in this paper do not depend on the special definition of local stationarity. In some sense the above definition is only a framework for investigating the asymptotic properties of the estimates.

Examples of locally stationary processes in the univariate case can be found in Dahlhaus (1996a). For the multivariate case we give the following examples.

(3.2) Examples

(i) Suppose $Y_t$ is a multivariate stationary process, $\mu(\cdot)$ is a vector function and $\Sigma(\cdot)$ is a matrix function. Then
\[
X_{t,T} = \mu\Big(\frac{t}{T}\Big) + \Sigma\Big(\frac{t}{T}\Big)Y_t
\]
is locally stationary. If $Y_t$ is an iid sequence we have the situation of multivariate nonparametric regression.

(ii) Suppose $X_{t,T}$ is a time varying multivariate ARMA model, that is $X_{t,T}$ is defined by the difference equations
\[
\sum_{j=0}^{p}\Phi_j\Big(\frac{t}{T}\Big)\Big[X_{t-j,T} - \mu\Big(\frac{t-j}{T}\Big)\Big] = \sum_{j=0}^{q}\Psi_j\Big(\frac{t}{T}\Big)\,\Sigma\Big(\frac{t-j}{T}\Big)\varepsilon_{t-j}
\]
where the $\varepsilon_t$ are iid with mean zero and variance-covariance matrix $I_d$, and $\Phi_0(u) \equiv \Psi_0(u) \equiv I_d$. Under regularity conditions on the coefficient functions $\Phi_j(u)$ and $\Psi_j(u)$ it can be shown, similarly to the univariate case (Dahlhaus, 1996a, Theorem 2.3), that these difference equations define a locally stationary process of the form (3.1). The time varying spectral density of the process is
\[
f(u,\lambda) = \frac{1}{2\pi}\,\Phi(u,\lambda)^{-1}\Psi(u,\lambda)\,\Sigma(u)\,\Psi(u,-\lambda)'\,\big[\Phi(u,-\lambda)'\big]^{-1}
\]
where $\Phi(u,\lambda) = \sum_{j=0}^{p}\Phi_j(u)e^{i\lambda j}$ and $\Psi(u,\lambda) = \sum_{j=0}^{q}\Psi_j(u)e^{i\lambda j}$. We omit details of the derivation. However, we remark that in this case the functions $A^{\circ}_{t,T}(\lambda)$ and $A(t/T,\lambda)$ do not coincide. They only fulfill (3.2).

In the following we look at parametric locally stationary models. An example is the case where the curves in the above examples are parametrized in time, e.g. by polynomials (for an example see Dahlhaus, 1997, Section 6).

Let $X = (X_{1,T}',\ldots,X_{T,T}')'$, $\mu = (\mu(\frac{1}{T})',\ldots,\mu(\frac{T}{T})')'$, and let the $dT\times dT$ matrices $\Sigma_T(A,B)$ and $U_T(\phi)$ be defined by
\[
\Sigma_T(A,B)_{r,s} = \int_{-\pi}^{\pi}\exp(i\lambda(r-s))\,A^{\circ}_{r,T}(\lambda)\,B^{\circ}_{s,T}(-\lambda)'\,d\lambda \qquad (3.3)
\]
and
\[
U_T(\phi)_{r,s} = \int_{-\pi}^{\pi}\exp(i\lambda(r-s))\,\phi\Big(\frac{1}{T}\Big\lceil\frac{r+s}{2}\Big\rceil,\lambda\Big)\,d\lambda \qquad (3.4)
\]
$(r,s = 1,\ldots,T)$, where $A^{\circ}_{r,T}(\lambda)$, $B^{\circ}_{r,T}(\lambda)$ and $\phi(u,\lambda)$ are $d\times d$ matrices. Then the exact Gaussian likelihood is
\[
L^{(e)}_T(\theta) := \frac{d}{2}\log 2\pi + \frac{1}{2T}\log\det\Sigma_\theta + \frac{1}{2T}(X-\mu_\theta)'\Sigma_\theta^{-1}(X-\mu_\theta) \qquad (3.5)
\]
where $\Sigma_\theta = \Sigma_T(A_\theta,A_\theta)$ and $E(X-\mu)(X-\mu)' = \Sigma_T(A^{\circ},A^{\circ})$ with $A^{\circ}$ from Definition 3.1.

We now proceed as in the univariate case (Section 2) to find a local likelihood approximation. We use a generalisation of the multivariate Szegő identity (see Proposition 3.4 below) and $U_T(\{4\pi^2 f_\theta\}^{-1})$ as an approximation of $\Sigma_\theta^{-1}$ to obtain
\[
L_T(\theta) := L^{(\ell)}_T(\theta) := \frac{1}{4\pi}\,\frac{1}{T}\sum_{t=1}^{T}\int_{-\pi}^{\pi}\log\Big[(2\pi)^{2d}\det f_\theta\Big(\frac{t}{T},\lambda\Big)\Big]\,d\lambda + \frac{1}{8\pi^2 T}(X-\mu_\theta)'U_T(f_\theta^{-1})(X-\mu_\theta)
\]
\[
= \frac{1}{T}\sum_{t=1}^{T}\frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log\Big[(2\pi)^{2d}\det f_\theta\Big(\frac{t}{T},\lambda\Big)\Big] + \operatorname{tr}\Big[f_\theta\Big(\frac{t}{T},\lambda\Big)^{-1}\tilde I_T\Big(\frac{t}{T},\lambda\Big)\Big]\Big\}\,d\lambda
=: \frac{1}{T}\sum_{t=1}^{T}\ell_T\Big(\theta,\frac{t}{T}\Big) \qquad (3.6)
\]
where
\[
\tilde I_T(u,\lambda)_{a,b} := \frac{1}{2\pi}\sum_{k\,:\,1\le\lfloor uT+k/2\rfloor,\lfloor uT-k/2\rfloor\le T}\Big(X^{(a)}_{\lfloor uT+k/2\rfloor,T} - \mu^{(a)}\Big(\frac{\lfloor uT+k/2\rfloor}{T}\Big)\Big)\Big(X^{(b)}_{\lfloor uT-k/2\rfloor,T} - \mu^{(b)}\Big(\frac{\lfloor uT-k/2\rfloor}{T}\Big)\Big)\exp(-i\lambda k) \qquad (3.7)
\]
is the multivariate version of the preperiodogram.

In the univariate case and for $\mu = 0$ this is the likelihood we have already discussed in Section 2. We call $\ell_T(\theta,\frac{t}{T})$ the local likelihood at time $t$. If the mean is not zero and one is not interested in modelling the mean, one may use $\tilde I_T^{\hat\mu}(u,\lambda)$ instead of $\tilde I_T(u,\lambda)$, where $\hat\mu$ is the arithmetic mean or some kernel estimate (if the mean is not believed to be constant over time).

Before investigating the asymptotic properties of the minimizer of $L_T(\theta)$ we prove some results on the likelihood approximation itself. First we state two results which show that $U_T(\{4\pi^2 f_\theta\}^{-1})$ and $L_T(\theta)$ are approximations of $\Sigma_\theta^{-1}$ and $L^{(e)}_T(\theta)$ respectively. We also show that

\[
L(\theta) := \frac{1}{4\pi}\int_0^1\int_{-\pi}^{\pi}\Big\{\log\big[(2\pi)^{2d}\det f_\theta(u,\lambda)\big] + \operatorname{tr}\big[f_\theta(u,\lambda)^{-1}f(u,\lambda)\big]\Big\}\,d\lambda\,du
+ \frac{1}{4\pi}\int_0^1(\mu(u)-\mu_\theta(u))'f_\theta^{-1}(u,0)(\mu(u)-\mu_\theta(u))\,du \qquad (3.8)
\]
is the limit of $L_T(\theta)$ and $L^{(e)}_T(\theta)$.

The technical parts of the following proofs consist of the derivation of properties of products of the matrices $\Sigma_T(A,B)$, $\Sigma_T(A,A)^{-1}$ and $U_T(\phi)$. These properties are derived in the appendix. In particular Lemmas A.1, A.5 and A.8 are of relevance for the following proofs.

For convenience we refer in the following proposition to Assumption A.3 in the appendix concerning the smoothness of the transfer function and the mean. These conditions are fulfilled under Assumption 3.6 below. By $\|A\|$ and $|A|$ we denote the spectral norm and the Euclidean norm of a matrix $A$ (cp. (A.1) and (A.2)). $\|v\|_2$ is the Euclidean norm of a vector.

(3.3) Proposition

Suppose the matrices $A_\theta$ and $\phi$ fulfill the smoothness conditions of Assumption A.3 (i)-(iii) (appendix), with existing and bounded derivatives $\frac{\partial^2}{\partial u^2}\frac{\partial}{\partial\lambda}A_\theta(u,\lambda)_{a,b}$ and eigenvalues of $\phi(u,\lambda)$ which are bounded from below uniformly in $u$ and $\lambda$. Then we have
\[
\frac{1}{T}\,\big|\Sigma_T(A_\theta,A_\theta)^{-1} - U_T\big(\{4\pi^2 A_\theta\overline{A_\theta}'\}^{-1}\big)\big|^2 = O(T^{-1}\ln^{3}T) \qquad (3.9)
\]
and
\[
\frac{1}{T}\,\big|U_T(\phi)^{-1} - U_T\big(\{4\pi^2\phi\}^{-1}\big)\big|^2 = O(T^{-1}\ln^{23}T).
\]

Proof. Let $\Sigma_T = \Sigma_T(A_\theta,A_\theta)$ and $U_T = U_T(\{4\pi^2 A_\theta\overline{A_\theta}'\}^{-1})$. We obtain with Lemma A.1 (b,c) and Lemma A.5
\[
\frac{1}{T}\,\big|\Sigma_T^{-1} - U_T\big|^2 \le \frac{1}{T}\,\big|I - \Sigma_T^{1/2}U_T\Sigma_T^{1/2}\big|^2\,\big\|\Sigma_T^{-1}\big\|^2 \le K\Big(d - \frac{2}{T}\operatorname{tr}\{U_T\Sigma_T\} + \frac{1}{T}\operatorname{tr}\{U_T\Sigma_T U_T\Sigma_T\}\Big).
\]
Lemma A.7 (i) now implies the result. The second result is obtained in the same way with Lemma A.8.

We now state the generalisation of the Szegő identity (cf. Grenander and Szegő, 1958, Section 5.2) to multivariate locally stationary processes.

(3.4) Proposition

Suppose $A_\theta$ fulfills Assumption A.3 (i), (ii), with bounded derivatives $\frac{\partial^2}{\partial u^2}\frac{\partial}{\partial\lambda}A_\theta(u,\lambda)_{a,b}$. Then we have with $f_\theta(u,\lambda) = A_\theta(u,\lambda)A_\theta(u,-\lambda)'$
\[
\frac{1}{T}\log\det\Sigma_T(A_\theta,A_\theta) = \frac{1}{2\pi}\int_0^1\int_{-\pi}^{\pi}\log\big[(2\pi)^d\det f_\theta(u,\lambda)\big]\,d\lambda\,du + O(T^{-1}\ln^{11}T).
\]
If $A_\theta$ depends on a parameter $\theta$ and fulfills the smoothness conditions of Assumption 3.6 (iii), (iv), then the $O(T^{-1}\ln^{11}T)$ term is uniform in $\theta$.

Proof. The proof can be found in A.9 of the appendix.

From now on we set $\nabla_i = \frac{\partial}{\partial\theta_i}$ and $\nabla^2_{ij} = \frac{\partial^2}{\partial\theta_i\partial\theta_j}$.

(3.5) Theorem

Suppose $X_{t,T}$ is a locally stationary Gaussian process with transfer function matrix $A^{\circ}$ and mean function vector $\mu$, and we fit a locally stationary model with transfer function matrix $A^{\circ}_\theta$ and mean function vector $\mu_\theta$. Suppose further that all eigenvalues of $f_\theta(u,\lambda) = A_\theta(u,\lambda)\overline{A_\theta(u,\lambda)}'$ are bounded from below uniformly in $u$, $\lambda$ and $\theta$, and the components of $A_\theta$, $A$, $\mu_\theta$, $\mu$ are differentiable with uniformly bounded derivatives $\frac{\partial^2}{\partial u^2}\frac{\partial}{\partial\lambda}A(u,\lambda)_{a,b}$, $\frac{\partial^2}{\partial u^2}\frac{\partial}{\partial\lambda}A_\theta(u,\lambda)_{a,b}$, $\frac{\partial}{\partial u}\mu(u)_a$, $\frac{\partial}{\partial u}\mu_\theta(u)_a$ respectively. Then we have

(i)
\[
L_T(\theta) - L^{(e)}_T(\theta) = O_P(T^{-1}\ln^{11}T).
\]

(ii) If in addition the first derivatives $\nabla_j A_\theta(u,\lambda)_{a,b}$ and $\nabla_j\mu_\theta(u)_a$ fulfill the above smoothness properties, we also have
\[
\nabla_j L_T(\theta) - \nabla_j L^{(e)}_T(\theta) = O_P(T^{-1}\ln^{23}T).
\]


(iii) Furthermore,
\[
L(\theta) = \lim_{T\to\infty}E\,L^{(e)}_T(\theta) = \lim_{T\to\infty}E\,L_T(\theta)
\quad\text{and}\quad
L^{(e)}_T(\theta)\xrightarrow{P}L(\theta), \qquad L_T(\theta)\xrightarrow{P}L(\theta).
\]

A similar result also holds for the higher order derivatives of the likelihoods. We conjecture that also a uniform result (in $\theta$) holds and that the log-terms and the Gaussian assumption can be dropped. However, a uniform result requires much more effort. In order not to blow up the paper we omit these generalisations.

Proof. (i) We obtain with Proposition 3.4 and $B_T := \Sigma_T(A_\theta,A_\theta)^{-1} - U_T(\{4\pi^2 A_\theta\overline{A_\theta}'\}^{-1})$
\[
L_T(\theta) - L^{(e)}_T(\theta) = \frac{1}{2T}(X-\mu_\theta)'B_T(X-\mu_\theta) + O(T^{-1}\ln^{11}T).
\]
Since
\[
\frac{1}{T}(X-\mu_\theta)'B_T(X-\mu_\theta) = \frac{1}{T}(X-\mu)'B_T(X-\mu) + \frac{2}{T}(X-\mu)'B_T(\mu-\mu_\theta) + \frac{1}{T}(\mu-\mu_\theta)'B_T(\mu-\mu_\theta) \qquad (3.10)
\]
we obtain with Lemma A.8 and $\Sigma = \Sigma_T(A^{\circ},A^{\circ})$
\[
E\{L_T(\theta) - L^{(e)}_T(\theta)\} = \frac{1}{2T}\operatorname{tr}\{B_T\Sigma\} + \frac{1}{2T}(\mu-\mu_\theta)'B_T(\mu-\mu_\theta) + O(T^{-1}\ln^{11}T) = O(T^{-1}\ln^{11}T)
\]
and
\[
\operatorname{var}\{L_T(\theta) - L^{(e)}_T(\theta)\} = \frac{1}{2T^2}\operatorname{tr}\{B_T\Sigma B_T\Sigma\} + \frac{1}{T^2}(\mu-\mu_\theta)'B_T\Sigma B_T(\mu-\mu_\theta) = O(T^{-2}\ln^{23}T)
\]


which implies the result.

(ii) We obtain with Lemma A.8, $B_T$ as above and
\[
C_T := -\Sigma_T(A_\theta,A_\theta)^{-1}\big\{\Sigma_T(\nabla_j A_\theta,A_\theta) + \Sigma_T(A_\theta,\nabla_j A_\theta)\big\}\Sigma_T(A_\theta,A_\theta)^{-1} - U_T\big(\nabla_j\{4\pi^2 A_\theta\overline{A_\theta}'\}^{-1}\big)
\]
\[
\nabla_j L_T(\theta) - \nabla_j L^{(e)}_T(\theta) = \frac{1}{2T}(X-\mu_\theta)'C_T(X-\mu_\theta) - \frac{1}{T}(\nabla_j\mu_\theta)'B_T(X-\mu_\theta) + O(T^{-1}\ln^{11}T).
\]
Analogously to above we obtain with Lemma A.8
\[
E\big(\nabla_j L_T(\theta) - \nabla_j L^{(e)}_T(\theta)\big) = O(T^{-1}\ln^{23}T)
\quad\text{and}\quad
\operatorname{var}\big(\nabla_j L_T(\theta) - \nabla_j L^{(e)}_T(\theta)\big) = O(T^{-2}\ln^{47}T)
\]
which gives the result.

(iii) follows similarly to (i) (use e.g. $B_T = \Sigma_T(A_\theta,A_\theta)^{-1}$ in the above derivation).

Theorem 3.5 (iii) basically gives the asymptotic Kullback-Leibler information divergence of two multivariate locally stationary processes: If $X_{t,T}$ ($\tilde X_{t,T}$) are multivariate locally stationary with spectral densities $f = A\overline{A}'$ ($\tilde f = \tilde A\overline{\tilde A}'$), mean functions $\mu$ ($\tilde\mu$) and Gaussian densities $g$ ($\tilde g$), then we obtain for the information divergence
\[
D(f,\mu;\tilde f,\tilde\mu) = \lim_{T\to\infty}\frac{1}{T}\,E_g\log\frac{g}{\tilde g}
= \frac{1}{4\pi}\int_0^1\int_{-\pi}^{\pi}\Big\{\log\det\big[\tilde f(u,\lambda)f(u,\lambda)^{-1}\big] + \operatorname{tr}\big[\tilde f(u,\lambda)^{-1}f(u,\lambda) - I\big]\Big\}\,d\lambda\,du
\]
\[
+ \frac{1}{4\pi}\int_0^1(\tilde\mu(u)-\mu(u))'\tilde f(u,0)^{-1}(\tilde\mu(u)-\mu(u))\,du.
\]
This is the time average of the Kullback-Leibler divergence in the stationary case (cf. Parzen, 1983, for the univariate stationary case with mean zero).

We now study the behaviour of
\[
\hat\theta_T := \operatorname*{argmin}_{\theta\in\Theta}L_T(\theta).
\]
Furthermore, let
\[
\theta_0 := \operatorname*{argmin}_{\theta\in\Theta}L(\theta).
\]

The results are proved under the following assumptions.

(3.6) Assumption

(i) We observe a realisation $X_{1,T},\ldots,X_{T,T}$ of a $d$-dimensional locally stationary Gaussian process with true mean function vector $\mu$ and transfer function matrix $A^{\circ}$, and fit a class of locally stationary Gaussian processes with mean function vector $\mu_\theta$ and transfer function matrix $A^{\circ}_\theta$, $\theta\in\Theta\subset\mathbb{R}^p$, $\Theta$ compact.

(ii) $\theta_0 = \operatorname{argmin}L(\theta)$ exists uniquely and lies in the interior of $\Theta$.

(iii) The components of $A_\theta(u,\lambda)$ are differentiable in $u$, $\lambda$ and $\theta$ with uniformly continuous derivatives $\nabla^2_{ij}\frac{\partial^2}{\partial u^2}\frac{\partial}{\partial\lambda}A_\theta(u,\lambda)_{a,b}$.

(iv) All eigenvalues of $f_\theta(u,\lambda) = A_\theta(u,\lambda)A_\theta(u,-\lambda)'$ are bounded from below by some constant $C > 0$ uniformly in $\theta$, $u$ and $\lambda$.

(v) The components of $A(u,\lambda)$ are differentiable in $u$ and $\lambda$ with uniformly bounded derivatives $\frac{\partial}{\partial u}\frac{\partial}{\partial\lambda}A(u,\lambda)_{a,b}$.

(vi) The components of $\mu(u)$, $\mu_\theta(u)$, $\nabla_i\mu_\theta(u)$ and $\nabla^2_{ij}\mu_\theta(u)$ are differentiable in $u$ with uniformly bounded derivatives.

In the case where the model is correctly specified, i.e. $A(u,\lambda) = A_{\theta'}(u,\lambda)$ and $\mu(u) = \mu_{\theta'}(u)$ with some $\theta'\in\Theta$, one can show that $\theta_0 = \theta'$.

(3.7) Theorem

Suppose that Assumption 3.6 holds. Then
\[
\hat\theta_T\xrightarrow{P}\theta_0.
\]

Proof. The basic idea is taken from Walker (1964), Section 2. In Theorem 3.5 (iii) we have proved that
\[
L_T(\theta)\xrightarrow{P}L(\theta).
\]
Since $\theta_0$ is assumed to be unique it follows that for all $\theta_1\neq\theta_0$ there exists a constant $c(\theta_1) > 0$ with
\[
\lim_{T\to\infty}P\big(L_T(\theta_1) - L_T(\theta_0) < c(\theta_1)\big) = 0.
\]

Furthermore, we have with a mean value $\bar\theta$
\[
L_T(\theta_2) - L_T(\theta_1) = (\theta_2 - \theta_1)'\nabla L_T(\bar\theta)
\]
where (cp. (3.10))
\[
\nabla_i L_T(\theta) = -\frac{1}{4\pi}\,\frac{1}{T}\sum_{t=1}^{T}\int_{-\pi}^{\pi}\operatorname{tr}\Big[f_\theta\Big(\frac{t}{T},\lambda\Big)\nabla_i f_\theta\Big(\frac{t}{T},\lambda\Big)^{-1}\Big]\,d\lambda
+ \frac{1}{8\pi^2 T}(X-\mu_\theta)'U_T(\nabla_i f_\theta^{-1})(X-\mu_\theta) \qquad (3.11)
\]
\[
- \frac{1}{4\pi^2 T}(\nabla_i\mu_\theta)'U_T(f_\theta^{-1})(X-\mu_\theta)
\]
\[
= \frac{1}{8\pi^2 T}(X-\mu)'U_T(\nabla_i f_\theta^{-1})(X-\mu) + \frac{1}{4\pi^2 T}\,\nabla_i\big\{(\mu-\mu_\theta)'U_T(f_\theta^{-1})\big\}(X-\mu) + \text{const.} \qquad (3.12)
\]
with a constant independent of $X$ (but dependent on $\theta$ and $T$). With the Cauchy-Schwarz inequality and Lemma A.1(h) we get
\[
\frac{1}{T}(\nabla_i\mu_\theta)'U_T(f_\theta^{-1})(X-\mu_\theta)
\le \frac{1}{T}\Big\{(\nabla_i\mu_\theta)'U_T(f_\theta^{-1})(\nabla_i\mu_\theta)\cdot(X-\mu_\theta)'U_T(f_\theta^{-1})(X-\mu_\theta)\Big\}^{1/2}
\]
\[
\le \Big(\frac{1}{T}\|\nabla_i\mu_\theta\|_2^2\Big)^{1/2}\Big(\frac{2}{T}\|X\|_2^2 + \frac{2}{T}\|\mu_\theta\|_2^2\Big)^{1/2}\,\big\|U_T(f_\theta^{-1})\big\|
\]
which by Assumption 3.6 and Lemma A.5 is uniformly bounded by
\[
K + K\,\frac{1}{T}\|X\|_2^2.
\]

Similarly, we can estimate the other terms in (3.11), leading to
\[
\sup_{\theta_2\in U_\delta(\theta_1)}\big|L_T(\theta_2) - L_T(\theta_1)\big| \le K\,\delta\,\Big(1 + \frac{1}{T}X'X\Big)
\]
with some constant $K$. Since $E\frac{1}{T}X'X = \frac{1}{T}\operatorname{tr}\{\Sigma\} + \frac{1}{T}\|\mu\|_2^2$ is bounded and
\[
\operatorname{var}\frac{1}{T}X'X = \frac{2}{T^2}\operatorname{tr}\{\Sigma^2\} + \frac{4}{T^2}\mu'\Sigma\mu \le \frac{K}{T}
\]
(Lemma A.1 and A.5), $T^{-1}X'X$ is bounded in probability. Thus there exists for all $\theta_1\neq\theta_0$ a $c(\theta_1) > 0$ and a $\delta = \delta(\theta_1)$ with
\[
\lim_{T\to\infty}P\Big(\inf_{\theta_2\in U_\delta(\theta_1)}L_T(\theta_2) - L_T(\theta_0) \ge c(\theta_1)/2\Big)
\ge 1 - \lim_{T\to\infty}P\big(L_T(\theta_1) - L_T(\theta_0) < c(\theta_1)\big) - \lim_{T\to\infty}P\Big(\sup_{\theta_2\in U_\delta(\theta_1)}\big|L_T(\theta_2) - L_T(\theta_1)\big| \ge c(\theta_1)/2\Big) = 1.
\]
A compactness argument as in Walker (1964) implies the result.

(3.8) Theorem

Suppose that Assumption 3.6 holds. Then we have
\[
\sqrt{T}\,(\hat\theta_T - \theta_0)\xrightarrow{D}N(0,\Gamma^{-1}V\Gamma^{-1})
\]
with
\[
\Gamma_{ij} = \frac{1}{4\pi}\int_0^1\int_{-\pi}^{\pi}\operatorname{tr}\big[(f - f_{\theta_0})\nabla^2_{ij}f_{\theta_0}^{-1}\big]\,d\lambda\,du
- \frac{1}{4\pi}\int_0^1\int_{-\pi}^{\pi}\operatorname{tr}\big[(\nabla_i f_{\theta_0})(\nabla_j f_{\theta_0}^{-1})\big]\,d\lambda\,du
\]
\[
+ \frac{1}{4\pi}\int_0^1\nabla^2_{ij}\big\{(\mu(u)-\mu_{\theta_0}(u))'f_{\theta_0}^{-1}(u,0)(\mu(u)-\mu_{\theta_0}(u))\big\}\,du
\]
and
\[
V_{ij} = \frac{1}{4\pi}\int_0^1\int_{-\pi}^{\pi}\operatorname{tr}\big[f\,(\nabla_i f_{\theta_0}^{-1})\,f\,(\nabla_j f_{\theta_0}^{-1})\big]\,d\lambda\,du
\]
\[
+ \frac{1}{2\pi}\int_0^1\nabla_i\big\{(\mu(u)-\mu_{\theta_0}(u))'f_{\theta_0}^{-1}(u,0)\big\}\,f(u,0)\,\nabla_j\big\{f_{\theta_0}^{-1}(u,0)(\mu(u)-\mu_{\theta_0}(u))\big\}\,du.
\]

Proof. We obtain with the mean value theorem
\[
\nabla_i L_T(\hat\theta_T) - \nabla_i L_T(\theta_0) = \big\{\nabla^2 L_T(\theta_T^{(i)})\,(\hat\theta_T - \theta_0)\big\}_i
\]
with $|\theta_T^{(i)} - \theta_0| \le |\hat\theta_T - \theta_0|$ $(i = 1,\ldots,p)$. If $\hat\theta_T$ lies in the interior of $\Theta$ we have $\nabla L_T(\hat\theta_T) = 0$. If $\hat\theta_T$ lies on the boundary of $\Theta$, then the assumption that $\theta_0$ is in the interior implies $|\hat\theta_T - \theta_0| \ge \delta$ for some $\delta > 0$, i.e. we obtain $P(\sqrt{T}\,|\nabla L_T(\hat\theta_T)| \ge \varepsilon) \le P(|\hat\theta_T - \theta_0| \ge \delta) \to 0$ for all $\varepsilon > 0$. Thus, the result follows if we prove

(i) $\nabla^2 L_T(\theta_T^{(i)}) - \nabla^2 L_T(\theta_0) \xrightarrow{P} 0$, (ii) $\nabla^2 L_T(\theta_0) \xrightarrow{P} \Gamma$,

(iii) $\sqrt{T}\,\nabla L_T(\theta_0) \xrightarrow{D} N(0,V)$.

We now obtain from (3.11)
\[
\nabla^2_{ij}L_T(\theta) = -\frac{1}{4\pi}\,\frac{1}{T}\sum_{t=1}^{T}\int_{-\pi}^{\pi}\operatorname{tr}\Big[f_\theta\Big(\frac{t}{T},\lambda\Big)\nabla^2_{ij}f_\theta\Big(\frac{t}{T},\lambda\Big)^{-1}\Big]\,d\lambda \qquad (3.13)
\]
\[
- \frac{1}{4\pi}\,\frac{1}{T}\sum_{t=1}^{T}\int_{-\pi}^{\pi}\operatorname{tr}\Big[\nabla_i f_\theta\Big(\frac{t}{T},\lambda\Big)\nabla_j f_\theta\Big(\frac{t}{T},\lambda\Big)^{-1}\Big]\,d\lambda
+ \frac{1}{8\pi^2 T}(X-\mu_\theta)'U_T(\nabla^2_{ij}f_\theta^{-1})(X-\mu_\theta)
\]
\[
+ \frac{1}{4\pi^2 T}(\nabla_i\mu_\theta)'U_T(f_\theta^{-1})(\nabla_j\mu_\theta)
- \frac{1}{4\pi^2 T}(\nabla_i\mu_\theta)'U_T(\nabla_j f_\theta^{-1})(X-\mu_\theta)
\]
\[
- \frac{1}{4\pi^2 T}(\nabla_j\mu_\theta)'U_T(\nabla_i f_\theta^{-1})(X-\mu_\theta)
- \frac{1}{4\pi^2 T}(\nabla^2_{ij}\mu_\theta)'U_T(f_\theta^{-1})(X-\mu_\theta).
\]
To prove (i) we have to consider the above terms separately. The assertion is obvious for the first and second term. Let $\theta_T = \theta_T^{(i)}$. The remaining terms of (3.13) can all be written as sums of expressions of the form
\[
\frac{1}{T}X'UX, \qquad \frac{1}{T}\mu_1'UX \qquad\text{or}\qquad \frac{1}{T}\mu_1'U\mu_2 \qquad (3.14)
\]
with $U$ being equal to $U_T(f_\theta^{-1})$, $U_T(\nabla_i f_\theta^{-1})$ or $U_T(\nabla^2_{ij}f_\theta^{-1})$. Lemma A.5(iii) implies $\|U_{\theta_T} - U_{\theta_0}\| \to 0$ in probability. Furthermore, $\frac{1}{T}\|\mu_{\theta_T} - \mu_{\theta_0}\|_2^2 \to 0$ in probability. This implies for example with the Cauchy-Schwarz inequality

