Remarks on Low-Dimensional Projections of High-Dimensional Distributions
Lutz Dümbgen and Perla Zerial, December 6, 1996
Abstract. Let $P = P^{(q)}$ be a probability distribution on $q$-dimensional space. Necessary and sufficient conditions are derived under which a random $d$-dimensional projection of $P$ converges weakly to a fixed distribution $Q$ on $\mathbb{R}^d$ as $q$ tends to infinity, while $d$ is an arbitrary fixed number. This complements a well-known result of Diaconis and Freedman (1984). Further we investigate $d$-dimensional projections of $\hat P$, where $\hat P$ is the empirical distribution of a random sample from $P$ of size $n$. We prove a conditional Central Limit Theorem for random projections of $n^{1/2}(\hat P - P)$ given the data $\hat P$, as $q$ and $n$ tend to infinity.

Correspondence to: Lutz Dümbgen, Institut für Angewandte Mathematik, Universität Heidelberg, Im Neuenheimer Feld 294, D-69120 Heidelberg, Germany
lutz@statlab.uni-heidelberg.de
Research supported in part by European Union Human Capital and Mobility Program ERB CHRX-CT 940693.
1 Introduction

A standard method of exploring high-dimensional datasets is to examine various low-dimensional projections thereof. In fact, many statistical procedures are based explicitly or implicitly on a "projection pursuit", cf. Huber (1985). Diaconis and Freedman (1984) showed that under weak regularity conditions on a distribution $P = P^{(q)}$ on $\mathbb{R}^q$, "most" $d$-dimensional orthonormal projections of $P$ are similar (in the weak topology) to a mixture of centered, spherically symmetric Gaussian distributions on $\mathbb{R}^d$ if $q$ tends to infinity while $d$ is fixed. A graphical demonstration of this disconcerting phenomenon is given by Buja et al. (1996). It should be pointed out that it is not a simple consequence of Poincaré's (1912) Lemma, although the latter is at the heart of the proof.

The present paper provides further insight into this phenomenon. We extend Diaconis and Freedman's (1984) results in two directions. Section 2 gives necessary and sufficient conditions on the sequence $(P^{(q)})_{q \ge d}$ such that "most" $d$-dimensional projections of $P$ are similar to some distribution $Q$ on $\mathbb{R}^d$. It turns out that these conditions are essentially the conditions of Diaconis and Freedman (1984). The novelty here is necessity. The limit distribution $Q$ is automatically a mixture of centered, spherically symmetric Gaussian distributions. The family of such measures arises in Eaton (1981) in another, related context.

More precisely, let $\Gamma = \Gamma^{(q)}$ be uniformly distributed on the set of column-wise orthonormal matrices in $\mathbb{R}^{q \times d}$ (cf. Section 4.2). Defining
$$\gamma^\top P := \mathcal{L}_{X \sim P}(\gamma^\top X)$$
for $\gamma \in \mathbb{R}^{q \times d}$, we investigate under what conditions the random distribution $\Gamma^\top P$ converges weakly in probability to an arbitrary fixed distribution $Q$ as $q \to \infty$, while $d$ is fixed.

Section 3 studies the difference between $P$ and the empirical distribution $\hat P = \hat P^{(q,n)}$ of $n$ independent random vectors with distribution $P$. Suppose that $(P^{(q)})_{q \ge d}$ satisfies the conditions of Section 2 and $\Gamma$ is independent from $\hat P$. Then, as $n$ and $q$ tend to infinity, the standardized empirical measure $n^{1/2}(\Gamma^\top \hat P - \Gamma^\top P)$ satisfies a conditional Central Limit Theorem given the data $\hat P$.

Proofs are deferred to Section 4. The main ingredients are Poincaré's (1912) Lemma and a modification of a method invented by Hoeffding (1952) in order to prove weak convergence of conditional distributions, which is of independent interest. Further we utilize some results from the theory of empirical processes.
2 The Diaconis-Freedman Effect

Let us first settle on some terminology. A random distribution $\hat Q$ on a separable metric space $(M, \rho)$ is a mapping from some probability space into the set of Borel probability measures on $M$ such that $\int f \, d\hat Q$ is measurable for any function $f \in C_b(M)$, the space of bounded, continuous functions on $M$. We say that a sequence $(\hat Q_k)_k$ of random distributions on $M$ converges weakly in probability to some fixed distribution $Q$ if for each $f \in C_b(M)$,
$$\int f \, d\hat Q_k \to_p \int f \, dQ \quad \text{as } k \to \infty.$$
In symbols, $\hat Q_k \to_{wp} Q$ as $k \to \infty$. We say that the sequence $(\hat Q_k)_k$ converges weakly in distribution to a random distribution $\hat Q$ on $M$ if for each $f \in C_b(M)$,
$$\int f \, d\hat Q_k \to_{\mathcal{L}} \int f \, d\hat Q \quad \text{as } k \to \infty.$$
In symbols, $\hat Q_k \to_{w\mathcal{L}} \hat Q$ as $k \to \infty$. Standard arguments show that $(\hat Q_k)_k$ converges weakly in probability to $Q$ if, and only if,
$$\sup_{f \in \mathcal{F}_{BL}} \Big| \int f \, d\hat Q_k - \int f \, dQ \Big| \to_p 0 \quad (k \to \infty),$$
where $\mathcal{F}_{BL}$ stands for the class of functions $f : M \to [-1, 1]$ such that $|f(x) - f(y)| \le \rho(x, y)$ for $x, y \in M$.

Now we can state the first result.
Theorem 2.1. The following two assertions on the sequence $(P^{(q)})_{q \ge d}$ are equivalent:

(A1) There exists a probability measure $Q$ on $\mathbb{R}^d$ such that $\Gamma^\top P \to_{wp} Q$ as $q \to \infty$.

(A2) If $X = X^{(q)}$, $\tilde X = \tilde X^{(q)}$ are independent random vectors with distribution $P$, then $\mathcal{L}(q^{-1}\|X\|^2) \to_w R$ and $q^{-1} X^\top \tilde X \to_p 0$ as $q \to \infty$, for some probability measure $R$ on $[0, \infty[$.

(Throughout, $\|x\|$ denotes the Euclidean norm $(x^\top x)^{1/2}$.) The limit distribution $Q$ is equal to the normal mixture
$$\int N_d(0, \sigma^2 I)\, R(d\sigma^2).$$
Corollary 2.2. The random probability measure $\Gamma^\top P$ converges weakly to the standard Gaussian distribution $N_d(0, I)$ in probability if, and only if, the following condition is satisfied:

(B) For independent random vectors $X = X^{(q)}$, $\tilde X = \tilde X^{(q)}$ with distribution $P$,
$$q^{-1}\|X\|^2 \to_p 1 \quad\text{and}\quad q^{-1} X^\top \tilde X \to_p 0 \quad\text{as } q \to \infty. \qquad \Box$$

The implication "(A2) $\Rightarrow$ (A1)" in Theorem 2.1 as well as the sufficiency of condition (B) in Corollary 2.2 are due to Diaconis and Freedman (1984, Theorem 1.1 and Proposition 4.2).

Example 2.3. Conditions (A1-2) are not very restrictive requirements. For instance, suppose that $P = \mathcal{L}\big( (\mu_k + \sigma_k Z_k)_{1 \le k \le q} \big)$, where $(Z_k)_{k \ge 1}$ is a sequence of independent, identically distributed random variables with mean zero and variance one, and $\mu = \mu^{(q)} \in \mathbb{R}^q$, $\sigma = \sigma^{(q)} \in [0, \infty[^q$. Then conditions (A1-2) are satisfied if, and only if,

(A3) $q^{-1}\|\mu\|^2 \to 0$, $q^{-1}\|\sigma\|^2 \to r \ge 0$ and $q^{-1} \max_{1 \le k \le q} \sigma_k^2 \to 0$ as $q \to \infty$,

where $R = \delta_r$.
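As a quick numerical illustration (a sketch, not part of the paper's formal development): take the $Z_k$ in Example 2.3 to be Rademacher signs with $\mu = 0$ and $\sigma_k = 1$, so that (A3) holds with $r = 1$. A random one-dimensional projection of a sample from $P$ should then look standard Gaussian, even though $P$ lives on the discrete cube $\{-1, +1\}^q$. All sample sizes below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
q, n = 1000, 2000

# Coordinates are i.i.d. Rademacher signs: the special case of Example 2.3
# with mu = 0, sigma_k = 1.  Condition (A3) holds with r = 1, so random
# projections should look like N(0, 1).
data = rng.choice([-1.0, 1.0], size=(n, q))

# A "uniformly" distributed one-dimensional projector (d = 1):
# Gamma = Z / ||Z|| with Z standard Gaussian in R^q.
z = rng.standard_normal(q)
gamma = z / np.linalg.norm(z)

proj = data @ gamma  # projected sample, approximately N(0, 1)
print(round(float(proj.mean()), 3), round(float(proj.var()), 3))
```

The printed sample mean and variance should be close to $0$ and $1$; a normal quantile plot of `proj` would be nearly straight despite the extreme non-Gaussianity of $P$ itself.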
3 Empirical Distributions

In some sense Theorem 2.1 is a negative, though mathematically elegant result. It warns us against hasty conclusions about high-dimensional data sets after examining a couple of low-dimensional projections. In particular, one should not believe in multivariate normality only because several projections of the data "look normal". On the other hand, even small differences between different low-dimensional projections of $\hat P$ may be intriguing. Therefore in the present section we study the relationship between projections of the empirical distribution $\hat P$ and corresponding projections of $P$.

In particular, we are interested in the halfspace norm
$$\|\Gamma^\top \hat P - \Gamma^\top P\|_{KS} := \sup_{\text{closed halfspaces } H \subset \mathbb{R}^d} \big| \Gamma^\top \hat P(H) - \Gamma^\top P(H) \big|$$
of $\Gamma^\top \hat P - \Gamma^\top P$. In case of $d = 1$ this is the usual Kolmogorov-Smirnov norm of $\Gamma^\top \hat P - \Gamma^\top P$. In what follows we use several well-known results from empirical process theory. Instead of citing original papers in various places we simply refer to the excellent treatises of Pollard (1984) and van der Vaart and Wellner (1996). It is known that
$$\mathrm{IE} \sup_{\gamma \in \mathbb{R}^{q \times d}} \|\gamma^\top \hat P - \gamma^\top P\|_{KS} \le C (q/n)^{1/2} \tag{3.1}$$
for some universal constant $C$. For the latter supremum is just the halfspace norm of $\hat P - P$, and generally the set of closed halfspaces in $\mathbb{R}^k$ is a Vapnik-Cervonenkis class with Vapnik-Cervonenkis index $k + 1$. Inequality (3.1) does not capture the typical deviation between $d$-dimensional projections of $\hat P$ and $P$. In fact,
$$\sup_{\gamma \in \mathbb{R}^{q \times d}} \mathrm{IE}\, \|\gamma^\top \hat P - \gamma^\top P\|_{KS} \le C (d/n)^{1/2}.$$
This implies that
$$\mathrm{IE}\, \|\Gamma^\top \hat P - \Gamma^\top P\|_{KS} \le C (d/n)^{1/2}, \tag{3.2}$$
where the random projector $\Gamma$ and $\hat P$ are always assumed to be stochastically independent. The subsequent results imply precise information about the conditional distribution of $n^{1/2}\|\Gamma^\top \hat P - \Gamma^\top P\|_{KS}$ given the data $\hat P$. This point of view is natural in connection with exploratory projection pursuit. It turns out that under condition (B) of Corollary 2.2, this conditional distribution converges weakly in probability to a fixed distribution. Under the weaker conditions (A1-2) of Theorem 2.1 it converges weakly in distribution to a specific random distribution on the real line.
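The halfspace norm is easy to make concrete for $d = 1$. The following sketch assumes, for convenience only, that $P = N(0, I_q)$, so that $\gamma^\top P = N(0,1)$ exactly for every unit vector $\gamma$ and the population CDF is available in closed form; the computed statistic is then the classical Kolmogorov-Smirnov distance, whose typical size is of the order $(d/n)^{1/2} = n^{-1/2}$ predicted by (3.2).

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
q, n = 500, 2000

# Convenient assumption: P = N(0, I_q), so gamma'P = N(0, 1) exactly.
sample = rng.standard_normal((n, q))
z = rng.standard_normal(q)
gamma = z / np.linalg.norm(z)
proj = np.sort(sample @ gamma)

# Halfspace (= Kolmogorov-Smirnov) norm for d = 1: supremum over the
# halfspaces (-inf, t] and [t, inf) of |gamma'P_hat(H) - gamma'P(H)|.
cdf = np.array([0.5 * (1.0 + erf(t / sqrt(2.0))) for t in proj])
i = np.arange(1, n + 1)
ks = max(float(np.max(i / n - cdf)), float(np.max(cdf - (i - 1) / n)))
print(ks)
```

With $n = 2000$ the printed value is a small multiple of $n^{-1/2} \approx 0.022$, in line with (3.2), while the crude bound (3.1) would only give the order $(q/n)^{1/2} = 0.5$.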
More generally, let $\mathcal{H}$ be a class of measurable functions from $\mathbb{R}^d$ into $[-1, 1]$. Any finite signed measure $\mu$ on $\mathbb{R}^d$ defines an element $h \mapsto \mu(h) := \int h \, d\mu$ of the space $\ell^\infty(\mathcal{H})$ of all bounded functions on $\mathcal{H}$, equipped with the supremum norm $\|z\|_{\mathcal{H}} := \sup_{h \in \mathcal{H}} |z(h)|$. We shall impose the following conditions on the class $\mathcal{H}$ and some distribution $Q$ on $\mathbb{R}^d$.

(C1) There exists a countable subset $\mathcal{H}_o$ of $\mathcal{H}$ such that each $h \in \mathcal{H}$ can be represented as pointwise limit of some sequence in $\mathcal{H}_o$.

(C2) The set $\mathcal{H}$ satisfies the uniform entropy condition
$$\int_0^1 \sqrt{\log N(u, \mathcal{H})}\, du < \infty.$$
Here $N(u, \mathcal{H})$ is the supremum of $N(u, \mathcal{H}, \tilde Q)$ over all probability measures $\tilde Q$ on $\mathbb{R}^d$, and $N(u, \mathcal{H}, \tilde Q)$ is the smallest number $m$ such that $\mathcal{H}$ can be covered with $m$ balls having radius $u$ with respect to the pseudodistance
$$\rho_{\tilde Q}(g, h) := \sqrt{\tilde Q((g - h)^2)}.$$

(C3) For any sequence $(Q_k)_k$ of probability measures converging weakly to $Q$,
$$\|Q_k - Q\|_{\mathcal{H}} \to 0 \quad \text{as } k \to \infty.$$

An example for conditions (C1-3) is the set $\mathcal{H}$ of (indicators of) closed halfspaces in $\mathbb{R}^d$ and any distribution $Q$ on $\mathbb{R}^d$ such that $Q(E) = 0$ for any hyperplane $E$ in $\mathbb{R}^d$. Here condition (C3) is a consequence of Billingsley and Topsøe's (1967) results. Condition (C1) ensures that random elements such as $\|\Gamma^\top \hat P - \Gamma^\top P\|_{\mathcal{H}}$ are measurable. A particular consequence of (C2) is the existence of a centered Gaussian process $B_Q$ having uniformly continuous sample paths with respect to $\rho_Q$ and covariances
$$\mathrm{IE}\, B_Q(g) B_Q(h) = Q(gh) - Q(g) Q(h).$$
This is proved via a chaining argument. In the subsequent theorem we consider a decomposition of $B_Q$ as a sum $B_{Q,1} + B_{Q,2}$ of two independent centered Gaussian processes on $\mathcal{H}$. With the help of Anderson's (1955) Lemma or further application of chaining one can show that $B_{Q,1}$ and $B_{Q,2}$ admit versions with uniformly continuous sample paths.
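For intuition (a sketch outside the paper's development): when $\mathcal{H}$ consists of the indicators of halfspaces $(-\infty, t]$ in $d = 1$ and $Q = N(0,1)$, the covariance $Q(gh) - Q(g)Q(h)$ reduces to the Brownian-bridge kernel evaluated at the standard normal CDF, so the restriction of $B_Q$ to a finite grid is an ordinary Gaussian vector. Grid and seed below are arbitrary.

```python
import numpy as np
from math import erf, sqrt

# Halfspace indicators g = 1{(-inf, s]}, h = 1{(-inf, t]} and Q = N(0, 1):
# Q(gh) - Q(g)Q(h) = min(F(s), F(t)) - F(s)F(t), the Brownian-bridge
# kernel evaluated at the standard normal CDF F.
ts = np.linspace(-2.5, 2.5, 21)
F = np.array([0.5 * (1.0 + erf(t / sqrt(2.0))) for t in ts])
K = np.minimum.outer(F, F) - np.outer(F, F)

# The kernel is positive semi-definite, so B_Q restricted to this grid
# can be simulated as a centered Gaussian vector with covariance K.
eigs = np.linalg.eigvalsh(K)
rng = np.random.default_rng(3)
B = rng.multivariate_normal(np.zeros(len(ts)), K)
print(float(eigs.min()), B.shape)
```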
Theorem 3.1. Suppose that the sequence $(P^{(q)})_{q \ge d}$ satisfies conditions (A1-2) of Theorem 2.1, and suppose that conditions (C1-3) are satisfied with $Q$ being the corresponding limit measure $\int N_d(0, \sigma^2 I)\, R(d\sigma^2)$. Define
$$B^{(q,n)} := \big( n^{1/2} (\Gamma^\top \hat P - \Gamma^\top P)(h) \big)_{h \in \mathcal{H}},$$
and let $F$ be a continuous functional on $\ell^\infty(\mathcal{H})$ such that $F(B^{(q,n)})$ is measurable for all $q \ge d$ and $n \ge 1$. Then, as $n$ and $q$ tend to infinity,
$$\mathcal{L}\big( F(B^{(q,n)}) \mid \hat P \big) \to_{w\mathcal{L}} \mathcal{L}\big( F(B_{Q,1} + B_{Q,2}) \mid B_{Q,2} \big),$$
where $B_{Q,1}$ and $B_{Q,2}$ are two independent centered Gaussian processes having uniformly continuous sample paths with respect to $\rho_Q$ and covariances
$$\begin{aligned}
\mathrm{IE}\, B_{Q,1}(g) B_{Q,1}(h) &= Q(gh) - \int N_d(0, \sigma^2 I)(g)\, N_d(0, \sigma^2 I)(h)\, R(d\sigma^2) \\
&= \int \big( N_d(0, \sigma^2 I)(gh) - N_d(0, \sigma^2 I)(g)\, N_d(0, \sigma^2 I)(h) \big)\, R(d\sigma^2), \\
\mathrm{IE}\, B_{Q,2}(g) B_{Q,2}(h) &= \int N_d(0, \sigma^2 I)(g)\, N_d(0, \sigma^2 I)(h)\, R(d\sigma^2) - Q(g) Q(h).
\end{aligned}$$
(Thus $B_{Q,1} + B_{Q,2}$ defines a version of $B_Q$.)

Corollary 3.2. Suppose that the sequence $(P^{(q)})_{q \ge d}$ satisfies condition (B) of Corollary 2.2, and suppose that conditions (C1-3) are satisfied for $Q = N_d(0, I)$. Let $F$ be as in Theorem 3.1. Then, as $n$ and $q$ tend to infinity,
$$\mathcal{L}\big( F(B^{(q,n)}) \mid \hat P \big) \to_{wp} \mathcal{L}\big( F(B_Q) \big). \qquad \Box$$

The measurability of $F(B^{(q,n)})$ can be dropped, provided that our definition of weak convergence of random distributions is suitably extended; see Remark 4.3 in Section 4.1.
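Corollary 3.2 can be probed by simulation in the simplest case $d = 1$, $\mathcal{H}$ the halfspaces, $Q = N(0, I)$: taking $F$ to be the supremum norm, $n^{1/2}\|\Gamma^\top \hat P - \Gamma^\top P\|_{KS}$ should be approximately Kolmogorov-distributed, with mean $\sqrt{\pi/2}\,\ln 2 \approx 0.87$. A minimal Monte Carlo sketch, again under the convenient assumption $P = N(0, I_q)$ (all sizes chosen for speed, not accuracy):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
q, n, reps = 200, 500, 200

stats = []
for _ in range(reps):
    # fresh data and a fresh "uniform" projector for every repetition;
    # P = N(0, I_q) satisfies condition (B) of Corollary 2.2
    gamma = rng.standard_normal(q)
    gamma /= np.linalg.norm(gamma)
    proj = np.sort(rng.standard_normal((n, q)) @ gamma)
    cdf = np.array([0.5 * (1.0 + erf(t / sqrt(2.0))) for t in proj])
    i = np.arange(1, n + 1)
    ks = max(float(np.max(i / n - cdf)), float(np.max(cdf - (i - 1) / n)))
    stats.append(sqrt(n) * ks)

# sqrt(n) * KS should be roughly Kolmogorov distributed (mean ~ 0.87)
print(round(float(np.mean(stats)), 2))
```

The Monte Carlo mean lands near $0.87$, consistent with the claimed fixed limit distribution.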
4 Proofs

4.1 Hoeffding's (1952) technique and a modification thereof

In connection with randomization tests, Hoeffding (1952) observed that weak convergence of conditional distributions of test statistics is equivalent to weak convergence of the unconditional distribution of suitable statistics in $\mathbb{R}^2$. His result can be extended straightforwardly as follows.

Lemma 4.1 (Hoeffding). For $k \ge 1$ let $X_k, \tilde X_k \in \mathbb{X}_k$ and $T_k \in \mathbb{T}_k$ be independent random variables, where $X_k$ and $\tilde X_k$ are identically distributed. Further let $\pi_k$ be some measurable mapping from $\mathbb{X}_k \times \mathbb{T}_k$ into the separable metric space $(M, \rho)$, and let $Q$ be a fixed Borel probability measure on $M$. Then, as $k \to \infty$, the following two assertions are equivalent:

(D1) $\mathcal{L}\big( \pi_k(X_k, T_k) \mid T_k \big) \to_{wp} Q$.

(D2) $\mathcal{L}\big( \pi_k(X_k, T_k), \pi_k(\tilde X_k, T_k) \big) \to_w Q \otimes Q$.

An application of this equivalence with non-Euclidean spaces $M$ is given by Romano (1989). We shall utilize Lemma 4.1 in order to prove Theorem 2.1. In connection with empirical measures we use the following modification of Lemma 4.1, which is of independent interest.

Lemma 4.2. For $k \in \{1, 2, 3, \ldots\} \cup \{\infty\}$ let $X_k, X_{k,1}, X_{k,2}, \ldots \in \mathbb{X}_k$ and $T_k \in \mathbb{T}_k$ be independent random variables, where $X_k, X_{k,1}, X_{k,2}, \ldots$ are identically distributed. Further let $\pi_k$ be some measurable mapping from $\mathbb{X}_k \times \mathbb{T}_k$ into $(M, \rho)$. Then, as $k \to \infty$, the following two assertions are equivalent:

(E1) $\mathcal{L}\big( \pi_k(X_k, T_k) \mid T_k \big) \to_{w\mathcal{L}} \mathcal{L}\big( \pi_\infty(X_\infty, T_\infty) \mid T_\infty \big)$.

(E2) For any integer $L \ge 1$,
$$\big( \pi_k(X_{k,\ell}, T_k) \big)_{1 \le \ell \le L} \to_{\mathcal{L}} \big( \pi_\infty(X_{\infty,\ell}, T_\infty) \big)_{1 \le \ell \le L}.$$
Remark 4.3 (Non-separability and non-measurability). Suppose that the metric space $(M, \rho)$ is possibly nonseparable, and that the mappings $\pi_k$, $1 \le k < \infty$, are possibly non-measurable. The implications "(D2) $\Rightarrow$ (D1)" and "(E2) $\Rightarrow$ (E1)" remain valid, provided that the limit distributions $Q$ in Lemma 4.1 and $\mathcal{L}(\pi_\infty(X_\infty, T_\infty))$ in Lemma 4.2 have separable support, if one uses Hoffmann-Jørgensen's notion of weak convergence (cf. van der Vaart and Wellner 1996, Chapter 1). The conditional distribution $\mathcal{L}\big( \pi_k(X_k, T_k) \mid T_k = t_k \big)$ stands for the outer measure $\mathrm{IP}^*\{ \pi_k(X_k, t_k) \in \cdot \}$ on $M$, and $\mathcal{L}\big( \pi_k(X_k, T_k) \mid T_k \big)$ is said to converge weakly to $Q$ in probability if for each fixed $f \in C_b(M)$, the real-valued random element $\mathrm{IE}^*\big( f(\pi_k(X_k, T_k)) \mid T_k \big)$ converges in outer probability to $Q(f)$. Analogously, $\mathcal{L}\big( \pi_k(X_k, T_k) \mid T_k \big)$ converges weakly in distribution to $\mathcal{L}\big( \pi_\infty(X_\infty, T_\infty) \mid T_\infty \big)$ if for any fixed $f \in C_b(M)$, $\mathrm{IE}^*\big( f(\pi_k(X_k, T_k)) \mid T_k \big)$ converges in distribution (in the sense of Hoffmann-Jørgensen) to the random variable $\mathrm{IE}^*\big( f(\pi_\infty(X_\infty, T_\infty)) \mid T_\infty \big)$.

In this framework the reverse implications "(D1) $\Rightarrow$ (D2)" and "(E1) $\Rightarrow$ (E2)" remain valid under some measurability. For instance, these conclusions are correct, provided that for each $k \in \{1, 2, 3, \ldots\}$ the mapping $\pi_k(X_k, T_k)$ is measurable with respect to the $\sigma$-field on $M$ generated by closed balls with respect to $\rho$. Given some familiarity with these concepts, one can easily adapt the subsequent proofs of Lemmas 4.1 and 4.2.
Proof of Lemma 4.1. Define $Y_k := \pi_k(X_k, T_k)$ and $\tilde Y_k := \pi_k(\tilde X_k, T_k)$. Suppose first that $\mathcal{L}(Y_k, \tilde Y_k) \to_w Q \otimes Q$. Then for any $f \in C_b(M)$,
$$\begin{aligned}
\mathrm{IE}\big( \mathrm{IE}(f(Y_k) \mid T_k) - Q(f) \big)^2
&= \mathrm{IE}\big( \mathrm{IE}(f(Y_k) \mid T_k)^2 \big) - 2 Q(f)\, \mathrm{IE}\big( \mathrm{IE}(f(Y_k) \mid T_k) \big) + Q(f)^2 \\
&= \mathrm{IE}\big( \mathrm{IE}(f(Y_k) f(\tilde Y_k) \mid T_k) \big) - 2 Q(f)\, \mathrm{IE}\big( \mathrm{IE}(f(Y_k) \mid T_k) \big) + Q(f)^2 \\
&= \mathrm{IE}\big( f(Y_k) f(\tilde Y_k) \big) - 2 Q(f)\, \mathrm{IE}\, f(Y_k) + Q(f)^2 \\
&\to \int f(y) f(\tilde y)\, Q(dy)\, Q(d\tilde y) - Q(f)^2 = 0.
\end{aligned}$$
Thus $\mathcal{L}(Y_k \mid T_k) \to_{wp} Q$.

On the other hand, suppose that $\mathcal{L}(Y_k \mid T_k) \to_{wp} Q$. Then for arbitrary $f, g \in C_b(M)$,
$$\mathrm{IE}\, f(Y_k) g(\tilde Y_k) = \mathrm{IE}\big( \mathrm{IE}( f(Y_k) g(\tilde Y_k) \mid T_k ) \big) = \mathrm{IE}\big( \mathrm{IE}(f(Y_k) \mid T_k)\, \mathrm{IE}(g(\tilde Y_k) \mid T_k) \big) \to Q(f) Q(g),$$
because $\mathrm{IE}(h(Y_k) \mid T_k) \to_p \int h \, dQ$ and $|\mathrm{IE}(h(Y_k) \mid T_k)| \le \|h\|_\infty < \infty$ for each $h \in C_b(M)$. Thus we know that $\mathrm{IE}\, F(Y_k, \tilde Y_k) \to \int F \, d(Q \otimes Q)$ for arbitrary functions $F(y, \tilde y) = f(y) g(\tilde y)$ with $f, g \in C_b(M)$. But this is known to be equivalent to weak convergence of $\mathcal{L}(Y_k, \tilde Y_k)$ to $Q \otimes Q$; see van der Vaart and Wellner (1996, Chapter 1.4). $\Box$
Proof of Lemma 4.2. Define $Y_k := \pi_k(X_k, T_k)$ and $Y_{k,\ell} := \pi_k(X_{k,\ell}, T_k)$. Suppose first that $(Y_{k,\ell})_{1 \le \ell \le L} \to_{\mathcal{L}} (Y_{\infty,\ell})_{1 \le \ell \le L}$ for any integer $L \ge 1$. For arbitrary fixed $f \in C_b(M)$,
$$\begin{aligned}
\mathrm{IE}\Big( \mathrm{IE}(f(Y_k) \mid T_k) - L^{-1} \sum_{\ell=1}^L f(Y_{k,\ell}) \Big)^2
&= \mathrm{IE}\, \mathrm{IE}\Big( \Big( \mathrm{IE}(f(Y_k) \mid T_k) - L^{-1} \sum_{\ell=1}^L f(Y_{k,\ell}) \Big)^2 \,\Big|\, T_k \Big) \\
&= \mathrm{IE}\, \mathrm{Var}\Big( L^{-1} \sum_{\ell=1}^L f(Y_{k,\ell}) \,\Big|\, T_k \Big) \\
&\le L^{-1} \|f\|_\infty^2.
\end{aligned}$$
Thus the sample mean $L^{-1} \sum_{\ell=1}^L f(Y_{k,\ell})$ approximates the conditional expectation $\mathrm{IE}(f(Y_k) \mid T_k)$ arbitrarily well in quadratic mean, provided that $L$ is sufficiently large. However, the variable $L^{-1} \sum_{\ell=1}^L f(Y_{k,\ell})$ converges in distribution to $L^{-1} \sum_{\ell=1}^L f(Y_{\infty,\ell})$, according to the Continuous Mapping Theorem. Consequently, $\mathrm{IE}(f(Y_k) \mid T_k)$ converges in distribution to $\mathrm{IE}(f(Y_\infty) \mid T_\infty)$, whence $\mathcal{L}(Y_k \mid T_k) \to_{w\mathcal{L}} \mathcal{L}(Y_\infty \mid T_\infty)$.

On the other hand, suppose that the conditional distribution $\mathcal{L}(Y_k \mid T_k)$ converges weakly in distribution to $\mathcal{L}(Y_\infty \mid T_\infty)$. In order to show that $(Y_{k,\ell})_{1 \le \ell \le L}$ converges in distribution to $(Y_{\infty,\ell})_{1 \le \ell \le L}$ one has to show that
$$\mathrm{IE} \prod_{\ell=1}^L f_\ell(Y_{k,\ell}) \to \mathrm{IE} \prod_{\ell=1}^L f_\ell(Y_{\infty,\ell})$$
for arbitrary functions $f_1, f_2, \ldots, f_L \in C_b(M)$ (cf. van der Vaart and Wellner, 1996, Chapter 1.4). But
$$\mathrm{IE} \prod_{\ell=1}^L f_\ell(Y_{k,\ell}) = \mathrm{IE}\, \mathrm{IE}\Big( \prod_{\ell=1}^L f_\ell(Y_{k,\ell}) \,\Big|\, T_k \Big) = \mathrm{IE} \prod_{\ell=1}^L \mathrm{IE}(f_\ell(Y_k) \mid T_k).$$
Thus it suffices to show that $\big( \mathrm{IE}(f_\ell(Y_k) \mid T_k) \big)_{1 \le \ell \le L}$ converges in distribution to $\big( \mathrm{IE}(f_\ell(Y_\infty) \mid T_\infty) \big)_{1 \le \ell \le L}$. This follows easily from our assumption on $\mathcal{L}(Y_k \mid T_k)$ via Fourier transformation, since for arbitrary $\lambda \in \mathbb{R}^L$,
$$\mathrm{IE} \exp\Big( \sqrt{-1}\, \sum_{\ell=1}^L \lambda_\ell\, \mathrm{IE}(f_\ell(Y_k) \mid T_k) \Big) = \mathrm{IE} \exp\big( \sqrt{-1}\, \mathrm{IE}(F(Y_k) \mid T_k) \big)$$
with $F := \sum_{\ell=1}^L \lambda_\ell f_\ell \in C_b(M)$. $\Box$
4.2 Proofs for Section 2

That $\Gamma = \Gamma^{(q)}$ is "uniformly" distributed on the set of column-wise orthonormal matrices in $\mathbb{R}^{q \times d}$ means that $\mathcal{L}(U \Gamma) = \mathcal{L}(\Gamma)$ for any fixed orthonormal matrix $U \in \mathbb{R}^{q \times q}$. For existence and uniqueness of the latter distribution we refer to Eaton (1989, Chapters 1 and 2). For the present purposes the following explicit construction of $\Gamma$, described in Eaton (1989, Chapter 7), is sufficient. Let $Z = Z^{(q)} := (Z_1, Z_2, \ldots, Z_d)$ be a random matrix in $\mathbb{R}^{q \times d}$ with independent, standard Gaussian column vectors $Z_j$ in $\mathbb{R}^q$. Then $\Gamma := Z (Z^\top Z)^{-1/2}$ has the desired distribution, and
$$\Gamma = q^{-1/2} Z \big( I + O_p(q^{-1/2}) \big) \quad \text{as } q \to \infty. \tag{4.1}$$
This equality can be viewed as an extension of Poincaré's (1912) Lemma.
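The construction $\Gamma = Z(Z^\top Z)^{-1/2}$ and the approximation (4.1) are easy to check numerically. The following sketch (with an arbitrary choice of $q$ and $d$) forms the symmetric inverse square root of $Z^\top Z$ via an eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(2)
q, d = 1000, 3

# Z has d independent standard Gaussian columns in R^q;
# Gamma = Z (Z'Z)^{-1/2}, using the symmetric inverse square root of Z'Z.
Z = rng.standard_normal((q, d))
w, V = np.linalg.eigh(Z.T @ Z)
Gamma = Z @ (V @ np.diag(w ** -0.5) @ V.T)

# Column-wise orthonormal: Gamma'Gamma = I_d ...
err_orth = float(np.abs(Gamma.T @ Gamma - np.eye(d)).max())
# ... and Gamma = q^{-1/2} Z (I + O_p(q^{-1/2})), cf. (4.1)
err_approx = float(np.abs(Gamma - Z / np.sqrt(q)).max())
print(err_orth, err_approx)
```

The first error is at machine precision, while the second is small but of the order $q^{-1/2}$ relative to the entries of $q^{-1/2}Z$, as (4.1) predicts.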
Proof of Theorem 2.1. Let $\Gamma = \Gamma(Z)$ be as above. Suppose that $Z = Z^{(q)}$, $X = X^{(q)}$ and $\tilde X = \tilde X^{(q)}$ are independent with $\mathcal{L}(X) = \mathcal{L}(\tilde X) = P$, and let $Y, \tilde Y$ be two independent random vectors in $\mathbb{R}^d$ with distribution $Q$. According to Lemma 4.1, condition (A1) is equivalent to

(A1$'$)
$$\begin{pmatrix} \Gamma^\top X \\ \Gamma^\top \tilde X \end{pmatrix} \to_{\mathcal{L}} \begin{pmatrix} Y \\ \tilde Y \end{pmatrix}.$$

Because of equation (4.1) this can be rephrased as

(A1$''$)
$$\begin{pmatrix} Y^{(q)} \\ \tilde Y^{(q)} \end{pmatrix} := \begin{pmatrix} q^{-1/2} Z^\top X \\ q^{-1/2} Z^\top \tilde X \end{pmatrix} \to_{\mathcal{L}} \begin{pmatrix} Y \\ \tilde Y \end{pmatrix}.$$

Now we prove the equivalence of (A1$''$) and (A2), starting from the observation that
$$\mathcal{L}\begin{pmatrix} Y^{(q)} \\ \tilde Y^{(q)} \end{pmatrix} = \mathrm{IE}\, \mathcal{L}\left( \begin{pmatrix} Y^{(q)} \\ \tilde Y^{(q)} \end{pmatrix} \,\middle|\, X, \tilde X \right) = \mathrm{IE}\, N_{2d}(0, \Sigma^{(q)}),$$
where
$$\Sigma^{(q)} := \begin{pmatrix} q^{-1}\|X\|^2\, I & q^{-1} X^\top \tilde X\, I \\ q^{-1} X^\top \tilde X\, I & q^{-1}\|\tilde X\|^2\, I \end{pmatrix} \in \mathbb{R}^{2d \times 2d}.$$

Suppose that condition (A2) holds. Then $\Sigma^{(q)}$ converges in distribution to a random diagonal matrix
$$\Sigma := \begin{pmatrix} S^2 I & 0 \\ 0 & \tilde S^2 I \end{pmatrix}$$
with independent random variables $S^2, \tilde S^2$ having distribution $R$. Clearly this implies that
$$\mathrm{IE}\, N_{2d}(0, \Sigma^{(q)}) \to_w \mathrm{IE}\, N_{2d}(0, \Sigma) = \mathcal{L}\begin{pmatrix} Y \\ \tilde Y \end{pmatrix}$$
with $Q = \mathrm{IE}\, N_d(0, S^2 I)$. Hence (A1$''$) holds.

On the other hand, suppose that (A1$''$) holds. For any $t = (t_1^\top, t_2^\top)^\top \in \mathbb{R}^{2d}$, the Fourier transform of $\mathcal{L}\big( (Y^{(q)\top}, \tilde Y^{(q)\top})^\top \big)$ at $t$ equals
$$\mathrm{IE} \exp\big( \sqrt{-1}\, (t_1^\top Y^{(q)} + t_2^\top \tilde Y^{(q)}) \big) = \mathrm{IE} \exp\big( -t^\top \Sigma^{(q)} t / 2 \big) = H^{(q)}(a(t)),$$
where $a(t) := \big( -\|t_1\|^2/2,\, -\|t_2\|^2/2,\, -t_1^\top t_2 \big) \in \mathbb{R}^3$, and
$$H^{(q)}(a) := \mathrm{IE} \exp\big( a_1 q^{-1}\|X\|^2 + a_2 q^{-1}\|\tilde X\|^2 + a_3 q^{-1} X^\top \tilde X \big)$$
denotes the Laplace transform of $\mathcal{L}\big( q^{-1}\|X\|^2, q^{-1}\|\tilde X\|^2, q^{-1} X^\top \tilde X \big)$ at $a \in \mathbb{R}^3$. By assumption, the Fourier transform at $t$ converges to
$$\mathrm{IE} \exp(\sqrt{-1}\, t_1^\top Y)\, \mathrm{IE} \exp(\sqrt{-1}\, t_2^\top \tilde Y).$$
Setting $t_2 = 0$ and varying $t_1$ shows that the Laplace transform of $\mathcal{L}(q^{-1}\|X\|^2)$ converges pointwise on $]-\infty, 0]$ to a continuous function. Hence $q^{-1}\|X\|^2$ converges in distribution to some random variable $S^2 \ge 0$, and $Q = \mathrm{IE}\, N_d(0, S^2 I)$. Therefore, if $\tilde S^2$ denotes an independent copy of $S^2$, we know that $H^{(q)}(a(t))$ converges to
$$\mathrm{IE} \exp(a_1(t) S^2)\, \mathrm{IE} \exp(a_2(t) \tilde S^2) = \mathrm{IE} \exp\big( a_1(t) S^2 + a_2(t) \tilde S^2 + a_3(t) \cdot 0 \big).$$
A problem at this point is that for dimension $d = 1$ the set $\{ a(t) : t \in \mathbb{R}^{2d} \} \subset \mathbb{R}^3$ has empty interior. Thus we cannot apply the standard argument about weak convergence and convergence of Laplace transforms. However, letting $t_2 = \pm t_1$ with $\|t_1\|^2/2 = 1$, one may conclude that for arbitrary $r, \varepsilon > 0$,
$$\begin{aligned}
0 &= \lim_{q \to \infty} \Big( H^{(q)}(-1, -1, -2) + H^{(q)}(-1, -1, 2) - 2 H^{(q)}(-1, 0, 0)^2 \Big) \\
&= \lim_{q \to \infty} \Big( H^{(q)}(-1, -1, -2) + H^{(q)}(-1, -1, 2) - 2\, \mathrm{IE} \exp\big( -q^{-1}\|X\|^2 - q^{-1}\|\tilde X\|^2 \big) \Big) \\
&= 2 \lim_{q \to \infty} \mathrm{IE}\Big[ \exp\big( -q^{-1}\|X\|^2 - q^{-1}\|\tilde X\|^2 \big) \big( \cosh(2 q^{-1} X^\top \tilde X) - 1 \big) \Big] \\
&\ge 2 \exp(-2r) \big( \cosh(2\varepsilon) - 1 \big) \limsup_{q \to \infty} \mathrm{IP}\big\{ q^{-1}\|X\|^2 < r,\ q^{-1}\|\tilde X\|^2 < r,\ |q^{-1} X^\top \tilde X| \ge \varepsilon \big\} \\
&\ge 2 \exp(-2r) \big( \cosh(2\varepsilon) - 1 \big) \limsup_{q \to \infty} \Big( \mathrm{IP}\big\{ |q^{-1} X^\top \tilde X| \ge \varepsilon \big\} - 2\, \mathrm{IP}\big\{ q^{-1}\|X\|^2 \ge r \big\} \Big) \\
&\ge 2 \exp(-2r) \big( \cosh(2\varepsilon) - 1 \big) \Big( \limsup_{q \to \infty} \mathrm{IP}\big\{ |q^{-1} X^\top \tilde X| \ge \varepsilon \big\} - 2\, \mathrm{IP}\{ S^2 \ge r \} \Big),
\end{aligned}$$
whence
$$\limsup_{q \to \infty} \mathrm{IP}\big\{ |q^{-1} X^\top \tilde X| \ge \varepsilon \big\} \le 2\, \mathrm{IP}\{ S^2 \ge r \}.$$
Since $r > 0$ was arbitrary, $q^{-1} X^\top \tilde X \to_p 0$. $\Box$
Proof of the equivalence of (A1-2) and (A3). Proving that (A3) implies (A1-2) is elementary. In order to show that (A1-2) implies (A3), note first that conditions (A1-2) for the distributions $P^{(q)}$ imply the same conditions for the symmetrized distributions
$$P_o = P_o^{(q)} := \mathcal{L}_{(X, \tilde X) \sim P \otimes P}(X - \tilde X) = \mathcal{L}\big( (\sigma_k (Z_k - Z_{q+k}))_{1 \le k \le q} \big).$$
Condition (A2) for these distributions reads as follows:
$$\mathcal{L}\Big( q^{-1} \sum_{k=1}^q \sigma_k^2 (Z_k - Z_{q+k})^2 \Big) \to_w R_o = R \ast R \quad\text{and} \tag{4.2}$$
$$q^{-1} \sum_{k=1}^q \sigma_k^2 (Z_k - Z_{q+k})(Z_{2q+k} - Z_{3q+k}) \to_p 0. \tag{4.3}$$
The summands $q^{-1} \sigma_k^2 (Z_k - Z_{q+k})(Z_{2q+k} - Z_{3q+k})$, $1 \le k \le q$, in (4.3) are independent and symmetrically distributed. Therefore one can easily deduce from (4.3) that $q^{-1} \max_{1 \le k \le q} \sigma_k^2 \to 0$. But then
$$q^{-1} \sum_{k=1}^q \sigma_k^2 (Z_k - Z_{q+k})^2 = 2 q^{-1} \|\sigma\|^2 + o_p\big( 1 + q^{-1}\|\sigma\|^2 \big),$$
and one can deduce from (4.2) that $q^{-1} \|\sigma^{(q)}\|^2$ converges to some fixed number $r$; in particular, $R = \delta_r$. Now we return to the original distributions $P$. Here the second half of (A2) means that
$$q^{-1} \sum_{k=1}^q (\mu_k + \sigma_k Z_k)(\mu_k + \sigma_k Z_{q+k}) = q^{-1}\|\mu\|^2 + q^{-1} \sum_{k=1}^q \mu_k \sigma_k (Z_k + Z_{q+k}) + q^{-1} \sum_{k=1}^q \sigma_k^2 Z_k Z_{q+k} = o_p(1).$$
Since
$$\mathrm{IE}\Big( q^{-1} \sum_{k=1}^q \mu_k \sigma_k (Z_k + Z_{q+k}) \Big)^2 = 2 q^{-2} \sum_{k=1}^q \mu_k^2 \sigma_k^2 = o\big( q^{-1}\|\mu\|^2 \big) \quad\text{and}\quad \mathrm{IE}\Big( q^{-1} \sum_{k=1}^q \sigma_k^2 Z_k Z_{q+k} \Big)^2 = q^{-2} \sum_{k=1}^q \sigma_k^4 \to 0,$$
it follows that $q^{-1}\|\mu\|^2 \to 0$. $\Box$
4.3 Proof of Theorem 3.1

Let $(\Gamma^{(q,\ell)})_{\ell \ge 1}$ be a sequence of independent copies of $\Gamma$ which is stochastically independent from $\hat P$. Define
$$B^{(q,n,\ell)} := \big( n^{1/2} (\Gamma^{(q,\ell)\top} \hat P - \Gamma^{(q,\ell)\top} P)(h) \big)_{h \in \mathcal{H}}.$$
The $B^{(q,n,\ell)}$, $\ell \ge 1$, are dependent copies of $B^{(q,n)}$. Further consider independent processes $B_{Q,1}^{(1)}, B_{Q,1}^{(2)}, B_{Q,1}^{(3)}, \ldots$ and $B_{Q,2}$ with $\mathcal{L}(B_{Q,1}^{(\ell)}) = \mathcal{L}(B_{Q,1})$ and $\mathcal{L}(B_{Q,2})$ as described in Theorem 3.1. According to Lemma 4.2 it suffices to show that for any fixed integer $L \ge 1$ and $\omega := \{1, 2, \ldots, L\}$, the random elements
$$\tilde B^{(q,n)} := \big( B^{(q,n,\ell)}(h) \big)_{(\ell,h) \in \omega \times \mathcal{H}}$$
converge in distribution in $\ell^\infty(\omega \times \mathcal{H})$ to
$$\tilde B := \big( (B_{Q,1}^{(\ell)} + B_{Q,2})(h) \big)_{(\ell,h) \in \omega \times \mathcal{H}}$$
as $q \to \infty$ and $n \to \infty$. For that purpose it suffices to verify the following two claims:

(F1) As $q \to \infty$ and $n \to \infty$, the finite-dimensional marginal distributions of the process $\tilde B^{(q,n)}$ converge to the corresponding finite-dimensional distributions of $\tilde B$.

(F2) As $q \to \infty$, $n \to \infty$ and $\delta \downarrow 0$,
$$\max_{\ell \in \omega} \sup_{g, h \in \mathcal{H} :\, \rho_Q(g,h) < \delta} \big| B^{(q,n,\ell)}(g) - B^{(q,n,\ell)}(h) \big| \to_p 0.$$

In order to verify assertions (F1-2) we consider the conditional distribution of $\tilde B^{(q,n)}$ given the random matrix
$$\tilde \Gamma = \tilde \Gamma^{(q)} := \big( \Gamma^{(q,1)}, \Gamma^{(q,2)}, \ldots, \Gamma^{(q,L)} \big) \in \mathbb{R}^{q \times Ld}.$$
In fact, if we define $\tilde f_{\ell,h}(v) := h(v_\ell)$ for $v = (v_1^\top, \ldots, v_L^\top)^\top \in \mathbb{R}^{Ld}$, then
$$B^{(q,n,\ell)}(h) = n^{1/2} (\tilde \Gamma^\top \hat P - \tilde \Gamma^\top P)(\tilde f_{\ell,h}).$$
Thus $\mathcal{L}(\tilde B^{(q,n)} \mid \tilde \Gamma)$ is essentially the distribution of an empirical process based on $n$ independent random vectors with distribution $\tilde \Gamma^\top P$ on $\mathbb{R}^{Ld}$ and indexed by the family $\tilde{\mathcal{H}} := \{ \tilde f_{\ell,h} : \ell \in \omega,\ h \in \mathcal{H} \}$.

The multivariate version of Lindeberg's Central Limit Theorem entails that for large $q$ and $n$ the finite-dimensional marginal distributions of $\tilde B^{(q,n)}$, conditional on $\tilde \Gamma$, can be approximated by the corresponding finite-dimensional distributions of a centered Gaussian process on $\omega \times \mathcal{H}$ with the same covariance function, namely
$$\Sigma^{(q)}\big( (\ell, g), (m, h) \big) := \mathrm{Cov}\big( B^{(q,n,\ell)}(g), B^{(q,n,m)}(h) \,\big|\, \tilde \Gamma \big) = \tilde \Gamma^\top P(\tilde f_{\ell,g}\, \tilde f_{m,h}) - \tilde \Gamma^\top P(\tilde f_{\ell,g})\, \tilde \Gamma^\top P(\tilde f_{m,h}).$$