6.2 Chaining in reverse and stationary processes

In the previous section we made a first step towards proving lower bounds for the suprema of Gaussian processes: we showed how one can make precise the intuition that well-separated points behave like independent variables. This allows us to obtain a lower bound in terms of the covering number at a single scale. However, in the upper bound we obtained by chaining, we necessarily must deal with infinitely many scales in order to eliminate the remainder term in the chaining method. In order to close the gap between our upper and lower bounds, our second challenge is therefore to show how to obtain a multiscale lower bound. We will presently show how this can be done.

Let us recall the basic step in the chaining method: if diam(T) ≤ ε and if N ⊆ T is an ε/2-net, then we have for some universal constant c₁

\[
\mathbf{E}\sup_{t\in T} X_t \le c_1\varepsilon\sqrt{\log|N|} + \mathbf{E}\sup_{t\in T}\{X_t - X_{\pi(t)}\},
\]

where π(t) denotes a point of N closest to t.

This yields the contribution at a single scale ε, plus a remainder term. By iterating this bound, we can eliminate the remainder term and obtain a sum over infinitely many scales. To obtain a matching lower bound, we would like to mimic this procedure in the reverse direction. To do so, we would like to have an inequality of the following form: if N ⊆ T is an ε-packing, then

\[
\mathbf{E}\sup_{t\in T} X_t \ge c_2\varepsilon\sqrt{\log|N|} + \text{a remainder term}
\]

for some universal constant c₂. In the absence of the remainder term, this is precisely Sudakov's inequality proved in the previous section. However, without the remainder term, our lower bound necessarily terminates at a single scale. On the other hand, if we could prove an improvement of Sudakov's inequality that includes a remainder term (hopefully of a form similar to the one that appears in the chaining upper bound), then it becomes possible to iterate this inequality to obtain a multiscale lower bound. In essence, our aim is to develop an improved version of Sudakov's inequality that will allow us to run the chaining argument in reverse! This is the idea of the following result.

Theorem 6.11 (Super-Sudakov). Let {X_t}_{t∈T} be a separable Gaussian process and let N be an ε-packing of (T, d). Then we can estimate

\[
\mathbf{E}\sup_{t\in T} X_t \ge c\varepsilon\sqrt{\log|N|} + \min_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t,
\]

where c and α < 1/2 are universal constants and B(s, ε) := {t ∈ T : d(t, s) ≤ ε}.

The geometry of Theorem 6.11 is illustrated in the following figure:

[figure: points at mutual distance ε in T, each surrounded by a ball of radius αε]

The set T (large circle) is packed with points at distance ε; around each point in the packing, we consider the set of parameters in a ball with radius αε (small circles). The supremum of the process over the entire set is estimated from below by the lower bound obtained by applying Sudakov's inequality to the ε-packing, plus a remainder term which corresponds to the smallest expected supremum of the process over one of the disjoint balls.
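To get a feel for the statement, the following Python sketch simulates the toy process X_t = ⟨g, t⟩ on a small finite index set in ℝ² (whose natural metric is the Euclidean one) and estimates the relevant quantities by Monte Carlo. The packing `centers`, the scales `eps` and `alpha`, and the sample sizes are arbitrary illustrative choices, not the universal constants of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Gaussian process X_t = <g, t> on a finite T in R^2, whose natural
# metric is d(t, s) = ||t - s||_2.  All scales below are illustrative
# choices, not the universal constants of Theorem 6.11.
eps, alpha = 2.0, 0.25
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])  # an eps-packing N

# Sample 20 index points inside each ball B(s, alpha * eps).
raw = rng.standard_normal((len(centers), 20, 2))
raw *= alpha * eps * rng.uniform(0, 1, (len(centers), 20, 1)) \
    / np.linalg.norm(raw, axis=-1, keepdims=True)
clusters = centers[:, None, :] + raw              # T is the union of the clusters

g = rng.standard_normal((20000, 2))               # Monte Carlo draws of g
X = np.einsum('md,knd->mkn', g, clusters)         # process values on T

E_sup_T = X.max(axis=(1, 2)).mean()               # E sup_{t in T} X_t
E_sup_balls = X.max(axis=2).mean(axis=0)          # E sup over each ball B(s, alpha*eps)

print(f"E sup_T X_t        ~ {E_sup_T:.3f}")
print(f"eps*sqrt(log|N|)   = {eps * np.sqrt(np.log(len(centers))):.3f}")
print(f"min/max ball term  ~ {E_sup_balls.min():.3f} / {E_sup_balls.max():.3f}")
# Theorem 6.11: E sup_T >= c*eps*sqrt(log|N|) + (min ball term) for universal c.
```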

The proof of Theorem 6.11 is not difficult. It will be deduced directly from Sudakov’s inequality, together with the following basic consequence of the Gaussian concentration principle (Theorem 3.25).

Lemma 6.12 (Concentration of suprema). Let {X_t}_{t∈T} be a separable Gaussian process. Then sup_{t∈T} X_t is sup_{t∈T} Var[X_t]-subgaussian.

Proof. By separability, we can approximate the supremum over T by the supremum over a finite set (cf. the proof of Theorem 5.24). It therefore suffices to prove the result for the maximum max_{i≤n} X_i of an n-dimensional Gaussian vector X ∼ N(0, Σ). It is convenient to write X = Σ^{1/2} Z for Z ∼ N(0, I). It then follows from Theorem 3.25 that max_{i≤n} X_i is ‖∇f‖²-subgaussian, where we have defined the function f(z) := max_{i≤n} (Σ^{1/2} z)_i. Note that

\[
|f(z) - f(z')| \le \max_{i\le n} |(\Sigma^{1/2}(z - z'))_i| \le \max_{i\le n} \|(\Sigma^{1/2})_i\|\,\|z - z'\|,
\]

where (Σ^{1/2})_i denotes the i-th row of Σ^{1/2}. As ‖(Σ^{1/2})_i‖² = Σ_ii = Var[X_i], the function f is Lipschitz and ‖∇f‖² ≤ max_{i≤n} Var[X_i] wherever the gradient exists, and the claim follows. □
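The lemma is easy to probe numerically. The sketch below draws the maximum of a correlated Gaussian vector X = AZ repeatedly (the matrix A is an arbitrary illustrative choice) and compares the upper tail of the maximum around its empirical mean with the subgaussian bound exp(−u²/2σ²), σ² = maxᵢ Var[Xᵢ].

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical look at Lemma 6.12 for a finite Gaussian vector X = A Z:
# max_i X_i should have subgaussian fluctuations around its mean with
# variance proxy sigma^2 = max_i Var[X_i] = max_i ||A_i||^2.
# The matrix A is an arbitrary illustrative choice.
n, k, n_mc = 200, 5, 50000
A = rng.standard_normal((n, k)) / np.sqrt(k)
sigma2 = (A**2).sum(axis=1).max()                 # max_i Var[X_i]

Z = rng.standard_normal((n_mc, k))
M = (Z @ A.T).max(axis=1)                         # samples of max_i X_i
M -= M.mean()                                     # center at the empirical mean

for u in [0.5, 1.0, 1.5, 2.0]:
    emp = (M >= u).mean()
    bound = np.exp(-u**2 / (2 * sigma2))
    print(f"u={u:.1f}: P(max - E max >= u) ~ {emp:.2e} <= exp(-u^2/2s^2) = {bound:.2e}")
```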

We now complete the proof of Theorem 6.11.

Proof (Theorem 6.11). We can evidently estimate

\[
\mathbf{E}\sup_{t\in T} X_t \ge \mathbf{E}\max_{s\in N}\sup_{t\in B(s,\alpha\varepsilon)} X_t \ge \mathbf{E}\max_{s\in N}\Big\{\sup_{t\in B(s,\alpha\varepsilon)} X_t - \mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t\Big\} + \min_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t.
\]

To bound the first term on the right, we write

\[
\mathbf{E}\max_{s\in N}\Big\{\sup_{t\in B(s,\alpha\varepsilon)} X_t - \mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t\Big\} \ge \mathbf{E}\max_{s\in N} X_s - \mathbf{E}\max_{s\in N}\Big\{X_s - \sup_{t\in B(s,\alpha\varepsilon)} X_t + \mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t\Big\} \ge c\varepsilon\sqrt{\log|N|} - \sqrt{2}\,\alpha\varepsilon\sqrt{\log|N|}
\]

for some universal constant c, bounding the first term using Theorem 6.5 and the last term using Lemma 5.1 (each variable inside the last maximum is centered and (αε)²-subgaussian, as follows from Lemma 6.12 applied to the process {X_t − X_s}_{t∈B(s,αε)}). Choosing α = c/2√2 completes the proof. □

Let us compare the lower bound of Theorem 6.11 to the chaining upper bound. An immediate difference between the two bounds is that the former is stated in terms of an ε-packing, while the latter is in terms of an ε-net. This will be taken care of using the duality between covering and packing, however, so that this difference is not a major concern at this stage. A more pressing concern is the minimum in the bound of Theorem 6.11. To emphasize this issue, let us reformulate the chaining upper bound to bring out the similarity between the two bounds: if diam(T) ≤ ε and N ⊆ T is an αε-net, then

\[
\mathbf{E}\sup_{t\in T} X_t \le c_1\varepsilon\sqrt{\log|N|} + \mathbf{E}\max_{s\in N}\sup_{t\in B(s,\alpha\varepsilon)}\{X_t - X_s\} \le c\varepsilon\sqrt{\log|N|} + \max_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t.
\]

The first inequality follows trivially from the chaining upper bound as stated at the beginning of this section, while the second bound is readily obtained by using Gaussian concentration as in the proof of Theorem 6.11. In contrast, the bound of Theorem 6.11 states that if N is an ε-packing, then

\[
\mathbf{E}\sup_{t\in T} X_t \ge c\varepsilon\sqrt{\log|N|} + \min_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t.
\]

When phrased in this manner, the two bounds appear to be very similar, with one crucial difference: in the chaining upper bound, the remainder term is the largest expected supremum of the Gaussian process over a ball centered at one of the points in N, while the remainder term in Theorem 6.11 is the smallest expected supremum over such a ball. There is no reason why the suprema of the Gaussian process over two balls of the same radius should be of the same order: in general, the remainder terms in our upper and lower bounds can be of very different orders of magnitude. The major remaining question, to be addressed in the next section, is how to overcome this problem.

For the time being, however, we would like to illustrate the idea of chaining in reverse without having to cope with the complications arising from the above problem. To this end, we will investigate in the remainder of this section a special class of Gaussian processes for which this problem does not arise.

Definition 6.13 (Stationary Gaussian process). The Gaussian process {X_t}_{t∈T} is called stationary if there exists a group G acting on T such that:

1. d(g(t), g(s)) = d(t, s) for all t, s ∈ T, g ∈ G (translation invariance).

2. For every t, s ∈ T, there exists g ∈ G such that t = g(s) (transitivity).

Of course, the key point of this definition is that for a stationary Gaussian process all balls are created equal: indeed, we have equality in distribution

\[
\{X_t - X_s : t\in B(s,\varepsilon)\} \stackrel{d}{=} \{X_t - X_{s'} : t\in B(s',\varepsilon)\} \quad\text{for all } s, s'\in T.
\]

To see this, recall that the law of the increments of a Gaussian process is entirely determined by the natural metric d, and note that if g ∈ G is such that s′ = g(s), then g maps B(s, ε) isometrically onto B(s′, ε). Thus

\[
\mathbf{E}\sup_{t\in B(s,\varepsilon)} X_t = \mathbf{E}\sup_{t\in B(s,\varepsilon)}\{X_t - X_s\} = \mathbf{E}\sup_{t\in B(s',\varepsilon)}\{X_t - X_{s'}\} = \mathbf{E}\sup_{t\in B(s',\varepsilon)} X_t \quad\text{for all } s, s'\in T,
\]

so our upper and lower bounds are of the same order in this case.

Example 6.14 (Brownian motion). Let {B_t}_{t∈ℝ} be two-sided Brownian motion (that is, B_t = B′_t for t ≥ 0 and B_t = B″_{−t} for t < 0, where {B′_t}_{t≥0} and {B″_t}_{t≥0} are independent standard Brownian motions). We can view the index set ℝ itself as a group G = (ℝ, +) under addition. It is now easily seen that Brownian motion is a stationary Gaussian process: transitivity is obvious, while translation invariance can be read off from d(t, s) = √|t − s|.
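A short simulation on a discrete time grid makes this invariance visible: the increment variance E|B_t − B_s|² matches |t − s| regardless of where the pair (t, s) sits, even across the two half-lines. The grid resolution and sample size below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-sided Brownian motion on a grid: B_t = B'_t for t >= 0 and
# B_t = B''_{-t} for t < 0, with B', B'' independent standard BMs.
dt, n, n_mc = 2e-3, 1000, 4000                    # grid covers (-2, 0) and (0, 2)
Bpos = np.sqrt(dt) * rng.standard_normal((n_mc, n)).cumsum(axis=1)
Bneg = np.sqrt(dt) * rng.standard_normal((n_mc, n)).cumsum(axis=1)

def B(t):
    """Sampled path values at time t (vectorized over the Monte Carlo runs)."""
    idx = round(abs(t) / dt) - 1
    return Bpos[:, idx] if t > 0 else Bneg[:, idx]

# The natural metric d(t, s) = sqrt(E|B_t - B_s|^2) = sqrt(|t - s|) depends
# on (t, s) only through t - s: translation invariance.
for t, s in [(1.2, 0.4), (0.5, -0.3), (-1.5, -0.7)]:
    d2 = np.mean((B(t) - B(s))**2)
    print(f"t={t:+.1f}, s={s:+.1f}:  E|B_t - B_s|^2 ~ {d2:.3f}  vs |t-s| = {abs(t - s):.3f}")
```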


Example 6.15 (Random Fourier series). A classical application of stationary Gaussian processes is to develop an understanding of Fourier series with random coefficients. Let g_k and g′_k be i.i.d. N(0,1) random variables, and let c_k be coefficients such that ∑_k c_k² < ∞. Define for t ∈ S¹ = [0, 2π[ the process

\[
X_t = \sum_{k=0}^{\infty} c_k\{g_k \sin kt + g'_k \cos kt\}.
\]

Then {X_t}_{t∈S¹} is a stationary Gaussian process for the group of rotations of the circle S¹. Indeed, transitivity is obvious, and it is not difficult to compute d(t, s)² = 2∑_k c_k²{1 − cos(k(t − s))}, which is evidently translation-invariant.
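The metric formula is straightforward to verify numerically with a truncated series; in the sketch below, the truncation level K and the coefficients c_k = 1/(k + 1) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Truncated random Fourier series X_t = sum_k c_k (g_k sin(kt) + g'_k cos(kt))
# with illustrative square-summable coefficients c_k = 1/(k+1).
K, n_mc = 200, 20000
k = np.arange(K)
c = 1.0 / (k + 1)

def d2(t, s):
    # d(t, s)^2 = 2 sum_k c_k^2 (1 - cos(k(t - s))): a function of t - s only.
    return 2 * np.sum(c**2 * (1 - np.cos(k * (t - s))))

g, gp = rng.standard_normal((2, n_mc, K))

def X(t):
    return (c * (g * np.sin(k * t) + gp * np.cos(k * t))).sum(axis=1)

t, s = 1.0, 0.2
emp = np.mean((X(t) - X(s))**2)
print(f"empirical E|X_t - X_s|^2 ~ {emp:.4f}, formula d(t,s)^2 = {d2(t, s):.4f}")
print(f"after rotating both by 1.5: d^2 = {d2(t + 1.5, s + 1.5):.4f} (unchanged)")
```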

Under the stationarity assumption, we have seen that the upper bound we have used in a single iteration of the chaining argument is matched by an essentially equivalent lower bound. Therefore, in this setting, we expect that the chaining bound obtained in the previous chapter is tight. To prove this, little remains but to run the chaining argument in reverse.

Theorem 6.16 (Fernique). Let {X_t}_{t∈T} be a stationary separable Gaussian process. Then we can estimate, for some universal constants c₁, c₂,

\[
c_1\int_0^\infty \sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon \le \mathbf{E}\sup_{t\in T} X_t \le c_2\int_0^\infty \sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon.
\]

Proof. As the Gaussian process is stationary, all balls behave in the same way.

Thus we will lighten our notation by defining B(ε) := B(t₀, ε) for some fixed but arbitrary point t₀ ∈ T. This will play the role of our "representative ball".

Let us begin by applying Theorem 6.11 at the scale α^n. Choose N_n to be a maximal α^{n+2}-packing of the ball B(α^{n+1}). Then we have

\[
\bigcup_{s\in N_n} B(s,\alpha^{n+3}) \subseteq B(\alpha^n),
\]

as d(t₀, t) ≤ d(t₀, s) + d(s, t) ≤ α^{n+1} + α^{n+3} ≤ α^n for every s ∈ N_n and t ∈ B(s, α^{n+3}). This situation is illustrated in the following figure:

[figure: nested balls of radii α^n and α^{n+1}; inside B(α^{n+1}), packing points at mutual distance α^{n+2}, each surrounded by a disjoint ball of radius α^{n+3}]

By the maximality of the packing N_n, the duality between packing and covering numbers yields |N_n| ≥ N(B(α^{n+1}), d, α^{n+2}). Thus Theorem 6.11, applied to the process indexed by B(α^n), yields

\[
\mathbf{E}\sup_{t\in B(\alpha^n)} X_t \ge c\,\alpha^{n+2}\sqrt{\log N(B(\alpha^{n+1}),d,\alpha^{n+2})} + \mathbf{E}\sup_{t\in B(\alpha^{n+3})} X_t
\]

(here we used stationarity to replace the minimum over the balls B(s, α^{n+3}), s ∈ N_n, that arises in Theorem 6.11 by the expected supremum over the representative ball B(α^{n+3})).

We now iterate this bound. Let k₀ be the largest integer such that α^{k₀} ≥ diam(T), so that B(α^n) = T for every n ≤ k₀. If we start the iteration at any n ≤ k₀, then we obtain

\[
\mathbf{E}\sup_{t\in T} X_t \ge c\sum_{j=0}^{\infty} \alpha^{n+3j+2}\sqrt{\log N(B(\alpha^{n+3j+1}),d,\alpha^{n+3j+2})}.
\]

This completes the core part of the proof of Theorem 6.16: we have obtained a multiscale lower bound on the supremum of the Gaussian process by “chaining in reverse”. However, at first sight the lower bound looks a little different than the upper bound of Theorem 5.24. The difference proves to be cosmetic, and we will presently “fix” the discrepancy between the two bounds.

First, note that the terms in the above sum "skip" from scale α^k to α^{k+3}, rather than summing over all k ∈ ℤ. As the starting point n is arbitrary, however, we can fix this by averaging over n = k₀, k₀ − 1, k₀ − 2. This yields

\[
\mathbf{E}\sup_{t\in T} X_t \ge \frac{c}{3}\sum_{k\in\mathbb{Z}} \alpha^{k}\sqrt{\log N(B(\alpha^{k-1}),d,\alpha^{k})},
\]

where the sum may be taken over all k ∈ ℤ because the terms with k ≤ k₀ vanish.

The remaining problem with this lower bound is that it contains covering numbers of the form N(B(α^k), d, α^{k+1}), while our upper bound is phrased in terms of covering numbers N(T, d, α^{k+1}) of the entire set. To fix this, let us do some covering number gymnastics. Suppose we can cover T by m balls of radius α^k, and that each ball of radius α^k can be covered by m′ balls of radius α^{k+1}. Then clearly T can be covered by mm′ balls of radius α^{k+1}. We can choose m = N(T, d, α^k) and m′ = N(B(α^k), d, α^{k+1}) (using stationarity to argue that the covering number of any ball B(s, α^k) is equal to that of our representative ball B(α^k)). A moment's reflection will show that we have proved

\[
N(T,d,\alpha^{k+1}) \le N(T,d,\alpha^k)\,N(B(\alpha^k),d,\alpha^{k+1}).
\]

This sort of reasoning is useful in many problems involving covering numbers.
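The following sketch illustrates this kind of covering-number gymnastics on a random finite set, using a greedy covering heuristic (a proxy for the true covering numbers) and the scales ε and ε/2 in place of α^k and α^{k+1}; the set, the scales, and the heuristic are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

def greedy_cover(points, radius):
    """Greedy covering: repeatedly declare the first uncovered point a center.
    The number of centers returned is a proxy for N(points, d, radius)."""
    centers, uncovered = [], np.ones(len(points), dtype=bool)
    while uncovered.any():
        c = points[np.argmax(uncovered)]          # first still-uncovered point
        centers.append(c)
        uncovered &= np.linalg.norm(points - c, axis=1) > radius
    return np.array(centers)

# A finite "T": random points in the unit square with the Euclidean metric.
T = rng.uniform(0, 1, (2000, 2))
eps = 0.2

coarse = greedy_cover(T, eps)                     # coarse cover at scale eps
per_ball = [len(greedy_cover(T[np.linalg.norm(T - c, axis=1) <= eps], eps / 2))
            for c in coarse]                      # fine cover of each coarse ball
direct = len(greedy_cover(T, eps / 2))            # direct fine cover of all of T

print(f"N(T, eps) ~ {len(coarse)}, max ball N(., eps/2) ~ {max(per_ball)}")
print(f"product = {len(coarse) * max(per_ball)} vs direct N(T, eps/2) ~ {direct}")
# In line with the displayed inequality, the direct count stays below the product.
```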

In the present setting, plugging this identity into the above bound yields

\[
\mathbf{E}\sup_{t\in T} X_t \ge \frac{c}{3}\sum_{k\in\mathbb{Z}} \alpha^{k}\Big\{\sqrt{\log N(T,d,\alpha^{k})} - \sqrt{\log N(T,d,\alpha^{k-1})}\Big\} = \frac{c(1-\alpha)}{3}\sum_{k\in\mathbb{Z}} \alpha^{k}\sqrt{\log N(T,d,\alpha^{k})} \ge c'\int_0^\infty \sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon
\]


for some universal constant c′, where we estimated the sum by an integral in the usual manner (cf. Problem 5.9). Note that in order to prove that the two terms in the first inequality are of the same order, we used the fact that the sum runs over all k ∈ ℤ and not just over multiples of three. This minor annoyance in the proof therefore does serve a purpose.

We have now proved the lower bound. The corresponding upper bound follows immediately from the previous chapter (Corollary 5.25). □
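As a numerical sanity check of Fernique's theorem, one can compare a Monte Carlo estimate of E sup_t X_t with a crude discretization of the entropy integral for a concrete stationary process. Below we use a truncated random Fourier series with c_k = (k + 1)^{−1.2}, a greedy covering heuristic for the covering numbers, and arbitrary grid and truncation choices, so only the rough agreement of the two numbers is meaningful.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stationary process: truncated random Fourier series with c_k = (k+1)^(-1.2).
K = 50
k = np.arange(K)
c = (k + 1.0) ** -1.2
grid = np.linspace(0, 2 * np.pi, 256, endpoint=False)     # discretized circle

# Monte Carlo estimate of E sup_t X_t on the grid.
g, gp = rng.standard_normal((2, 5000, K))
X = g @ (c[:, None] * np.sin(np.outer(k, grid))) + gp @ (c[:, None] * np.cos(np.outer(k, grid)))
E_sup = X.max(axis=1).mean()

# Pairwise natural metric d(t, s)^2 = 2 sum_k c_k^2 (1 - cos(k(t - s))).
lag = grid[None, :] - grid[:, None]
D = np.sqrt(2 * ((1 - np.cos(np.multiply.outer(lag, k))) @ c**2))

def cover_count(radius):
    """Greedy covering count on the grid: a proxy for N(S^1, d, radius)."""
    uncovered, count = np.ones(len(grid), dtype=bool), 0
    while uncovered.any():
        count += 1
        uncovered &= D[np.argmax(uncovered)] > radius
    return count

# Crude Riemann sum for the entropy integral over a range of scales.
scales = np.geomspace(D.max(), D.max() / 64, 13)
entropy = sum((e1 - e2) * np.sqrt(np.log(cover_count(e2)))
              for e1, e2 in zip(scales[:-1], scales[1:]))

print(f"E sup X_t ~ {E_sup:.3f}, entropy-integral proxy ~ {entropy:.3f}")
# Fernique's theorem: the two quantities agree up to universal constants.
```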

Problems

6.4 (An alternative proof of super-Sudakov). We deduced the super-Sudakov inequality from the ordinary Sudakov inequality together with Gaussian concentration. It is also possible, however, to obtain Theorem 6.11 directly from the Slepian-Fernique inequality by modifying the proof of the Sudakov inequality. The advantage of this is that it yields somewhat sharper constants.

The aim of this problem is to develop this alternative proof.

For simplicity, let {X_t}_{t∈T} be a Gaussian process on a finite index set T (the extension to the case of a separable Gaussian process follows readily as in the proof of Theorem 5.24). Let N be an ε-packing of (T, d).

a. For every s ∈ N, let T_s := {t ∈ T : d(t, s) ≤ ε/4} and

\[
Z_t = X_t^{(s)} - X_s^{(s)} + \tfrac{1}{4}\varepsilon g_s \quad\text{for } t\in T_s,\ s\in N,
\]

where {X_t^{(s)}}_{t∈T} are independent copies of {X_t}_{t∈T} and the g_s are independent N(0,1) random variables for s ∈ N. Show that we have

\[
\mathbf{E}|X_t - X_{t'}|^2 \ge \mathbf{E}|Z_t - Z_{t'}|^2 \quad\text{for all } t, t' \in \bigcup_{s\in N} T_s.
\]

b. Conclude from Theorem 6.8 that

\[
\mathbf{E}\sup_{t\in T} X_t \ge \mathbf{E}\max_{s\in N}\Big\{\tfrac{1}{4}\varepsilon g_s + \sup_{t\in T_s}\{X_t^{(s)} - X_s^{(s)}\}\Big\}.
\]

c. Use Jensen's inequality conditionally on {g_s}_{s∈N} to conclude that

\[
\mathbf{E}\sup_{t\in T} X_t \ge \tfrac{1}{4}\varepsilon\,\mathbf{E}\max_{s\in N} g_s + \min_{s\in N}\mathbf{E}\sup_{t\in T_s} X_t.
\]


6.5 (Rectangles). Consider the Gaussian process {X_t}_{t∈{−1,1}ⁿ} of the form

\[
X_t = \sum_{k=1}^{n} g_k t_k a_k,
\]

where a₁ > ⋯ > a_n > 0 are given constants and g₁, …, g_n are i.i.d. N(0,1). Such a process is called a rectangle (as the index set ({−1,1}ⁿ, d) has the same geometry as the corners of a rectangle in (ℝⁿ, ‖·‖)).

a. Show that

\[
\mathbf{E}\sup_{t\in\{-1,1\}^n} X_t = \sqrt{\frac{2}{\pi}}\sum_{k=1}^{n} a_k.
\]

b. Argue that {X_t}_{t∈{−1,1}ⁿ} is a stationary Gaussian process, so that

\[
\int_0^\infty \sqrt{\log N(\{-1,1\}^n, d, \varepsilon)}\,d\varepsilon \asymp \sum_{k=1}^{n} a_k.
\]

c. Attempt to verify this conclusion by estimating covering numbers and computing the entropy integral directly. (This is surprisingly hard!)

d. Let a_k = 1/k. Show that for every n ≥ 1

\[
\sup_{\varepsilon>0}\,\varepsilon\sqrt{\log N(\{-1,1\}^n, d, \varepsilon)} \le c \qquad\text{and}\qquad \sum_{k=1}^{n} a_k \gtrsim \log n
\]

for some universal constant c. Therefore, while the chaining bound of Theorem 5.24 is sharp, Sudakov's inequality is far from sharp in this example.
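For small n, the identity in part a can be sanity-checked by brute force, enumerating all of {−1,1}ⁿ; the coefficients in the sketch below are random draws satisfying a₁ > ⋯ > a_n > 0 rather than any specific sequence.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)

# Brute-force check of part a for small n: enumerate all of {-1,1}^n.
n, n_mc = 8, 20000
a = np.sort(rng.uniform(0.1, 1.0, n))[::-1]           # a_1 > ... > a_n > 0
corners = np.array(list(product([-1, 1], repeat=n)))  # all 2^n index points

g = rng.standard_normal((n_mc, n))
sup = (g @ (corners * a).T).max(axis=1)               # sup_t sum_k g_k t_k a_k
print(f"Monte Carlo E sup     ~ {sup.mean():.4f}")
print(f"sqrt(2/pi) * sum(a_k) = {np.sqrt(2 / np.pi) * a.sum():.4f}")
```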

6.6 (A nonstationary process). Consider the Gaussian process {X_n}_{n∈ℕ} given by

\[
X_n = \frac{g_n}{\sqrt{1 + \log n}},
\]

where {g_n}_{n∈ℕ} are i.i.d. N(0,1). This process is most definitely not stationary.

a. Show that

\[
\mathbf{E}\sup_{n\in\mathbb{N}} X_n < \infty.
\]

b. Show that

\[
\int_0^\infty \sqrt{\log N(\mathbb{N}, d, \varepsilon)}\,d\varepsilon = \infty,
\]

so the conclusion of Theorem 6.16 can indeed fail in the nonstationary case.


c. To gain some insight into the problem, compute the quantity

\[
\mathbf{E}\sup_{n:\,d(n,m)\le\varepsilon} X_n
\]

for different m ∈ ℕ. Conclude that while one needs N(ℕ, d, ε) balls of radius ε to cover ℕ (and N(ℕ, d, ε) ↑ ∞ as ε ↓ 0), the expected supremum of the Gaussian process over all but one of these balls vanishes. Thus the remainder terms in our chaining upper and lower bounds are not comparable (in fact, in this case it is clearly the upper bound that is inefficient).
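A quick simulation, truncating ℕ at a finite level N, is consistent with part a: the expected maximum grows very slowly with N, as one would expect of a finite supremum. The truncation levels and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def mc_sup(N, n_mc=500, chunk=20000):
    """Monte Carlo estimate of E[max_{n <= N} g_n / sqrt(1 + log n)], in chunks."""
    sups = np.full(n_mc, -np.inf)
    for lo in range(0, N, chunk):
        idx = np.arange(lo + 1, min(lo + chunk, N) + 1)
        block = rng.standard_normal((n_mc, len(idx))) / np.sqrt(1 + np.log(idx))
        sups = np.maximum(sups, block.max(axis=1))
    return sups.mean()

for N in [10**3, 10**4, 10**5]:
    print(f"N={N:>6}: E max_(n<=N) X_n ~ {mc_sup(N):.3f}")
```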

6.7 (An improved chaining argument). Let {X_t}_{t∈T} be a (nonstationary) Gaussian process. In order to compare the super-Sudakov inequality to the chaining upper bound, we used Gaussian concentration to reformulate the upper bound as follows: if diam(T) ≤ ε and N ⊆ T is an αε-net, then

\[
\mathbf{E}\sup_{t\in T} X_t \le c\varepsilon\sqrt{\log|N|} + \max_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t.
\]

The goal of this problem is to note that chaining using this improved inequality will in fact yield a slightly improved version of Corollary 5.25:

\[
\mathbf{E}\sup_{t\in T} X_t \le c_1\sup_{t\in T}\int_0^\infty \sqrt{\log N(B(t, c_2\varepsilon), d, \varepsilon)}\,d\varepsilon
\]

for universal constants c₁, c₂ > 1.

a. Prove the above inequality.

b. Find an example where this inequality is sharp, but Corollary 5.25 is not.

Hint: let 𝕋 be a (not necessarily regular) finite rooted tree with root t₀ ∈ 𝕋 and leaves T ⊆ 𝕋. Assume that all leaves have the same depth n. For every leaf t ∈ T, denote by π₀(t), π₁(t), …, π_n(t) the unique path in the tree from the root π₀(t) = t₀ to the leaf π_n(t) = t. Attach to each vertex s ∈ 𝕋 an i.i.d. N(0,1) random variable ξ_s, and define {X_t}_{t∈T} as

\[
X_t = \sum_{k=0}^{n} \beta^k \xi_{\pi_k(t)}.
\]

Choose β < 1 and an irregular tree 𝕋 carefully to construct the example.

c. Find an example where also the present inequality is not sharp.

Hint: consider Problem 6.6.
