6.2 Chaining in reverse and stationary processes

In the previous section we made a first step towards proving lower bounds for the suprema of Gaussian processes: we showed how one can make precise the intuition that well-separated points behave like independent variables. This allows us to obtain a lower bound in terms of the covering number at a single scale. However, in the upper bound we obtained by chaining, we necessarily must deal with infinitely many scales in order to eliminate the remainder term in the chaining method. In order to close the gap between our upper and lower bounds, our second challenge is therefore to show how to obtain a multiscale lower bound. We will presently show how this can be done.

Let us recall the basic step in the chaining method: if diam(T) ≤ ε and if N ⊆ T is an ε/2-net, then we have for some universal constant c₁

\[
\mathbf{E}\sup_{t\in T} X_t \le c_1\varepsilon\sqrt{\log|N|} + \mathbf{E}\sup_{t\in T}\{X_t - X_{\pi(t)}\},
\]

where π(t) denotes a point of N closest to t.

This yields the contribution at a single scale ε, plus a remainder term. By iterating this bound, we can eliminate the remainder term and obtain a sum over infinitely many scales. To obtain a matching lower bound, we would like to mimic this procedure in the reverse direction. To do so, we would like to have an inequality of the following form: if N ⊆ T is an ε-packing, then

\[
\mathbf{E}\sup_{t\in T} X_t \ge c_2\varepsilon\sqrt{\log|N|} + \text{a remainder term}
\]

for some universal constant c₂. In the absence of the remainder term, this is precisely Sudakov's inequality proved in the previous section. However, without the remainder term, our lower bound necessarily terminates at a single scale. On the other hand, if we could prove an improvement of Sudakov's inequality that includes a remainder term (hopefully of a form similar to the one that appears in the chaining upper bound), then it becomes possible to iterate this inequality to obtain a multiscale lower bound. In essence, our aim is to develop an improved version of Sudakov's inequality that will allow us to run the chaining argument in reverse! This is the idea of the following result.

Theorem 6.11 (Super-Sudakov). Let {X_t}_{t∈T} be a separable Gaussian process and let N be an ε-packing of (T, d). Then we can estimate

\[
\mathbf{E}\sup_{t\in T} X_t \ge c\varepsilon\sqrt{\log|N|} + \min_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t,
\]

where c and α < 1/2 are universal constants and B(s, ε) := {t ∈ T : d(t, s) ≤ ε}.

The geometry of Theorem 6.11 is illustrated in the following figure:

[figure: points at mutual distance ε in T, each surrounded by a ball of radius αε]

The set T (large circle) is packed with points at distance ε; around each point in the packing, we consider the set of parameters in a ball with radius αε (small circles). The supremum of the process over the entire set is estimated from below by the lower bound obtained by applying Sudakov's inequality to the ε-packing, plus a remainder term which corresponds to the smallest expected supremum of the process over one of the disjoint balls.
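To get a feel for the statement, the following Python sketch simulates the toy process X_t = ⟨g, t⟩ on a small finite index set in ℝ² (whose natural metric is the Euclidean one) and estimates the relevant quantities by Monte Carlo. The packing `centers`, the scales `eps` and `alpha`, and the sample sizes are arbitrary illustrative choices, not the universal constants of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Gaussian process X_t = <g, t> on a finite T in R^2, whose natural
# metric is d(t, s) = ||t - s||_2.  All scales below are illustrative
# choices, not the universal constants of Theorem 6.11.
eps, alpha = 2.0, 0.25
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])  # an eps-packing N

# Sample 20 index points inside each ball B(s, alpha * eps).
raw = rng.standard_normal((len(centers), 20, 2))
raw *= alpha * eps * rng.uniform(0, 1, (len(centers), 20, 1)) \
    / np.linalg.norm(raw, axis=-1, keepdims=True)
clusters = centers[:, None, :] + raw              # T is the union of the clusters

g = rng.standard_normal((20000, 2))               # Monte Carlo draws of g
X = np.einsum('md,knd->mkn', g, clusters)         # process values on T

E_sup_T = X.max(axis=(1, 2)).mean()               # E sup_{t in T} X_t
E_sup_balls = X.max(axis=2).mean(axis=0)          # E sup over each ball B(s, alpha*eps)

print(f"E sup_T X_t        ~ {E_sup_T:.3f}")
print(f"eps*sqrt(log|N|)   = {eps * np.sqrt(np.log(len(centers))):.3f}")
print(f"min/max ball term  ~ {E_sup_balls.min():.3f} / {E_sup_balls.max():.3f}")
# Theorem 6.11: E sup_T >= c*eps*sqrt(log|N|) + (min ball term) for universal c.
```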

The proof of Theorem 6.11 is not difficult. It will be deduced directly from Sudakov’s inequality, together with the following basic consequence of the Gaussian concentration principle (Theorem 3.25).

Lemma 6.12 (Concentration of suprema). Let {X_t}_{t∈T} be a separable Gaussian process. Then sup_{t∈T} X_t is sup_{t∈T} Var[X_t]-subgaussian.

Proof. By separability, we can approximate the supremum over T by the supremum over a finite set (cf. the proof of Theorem 5.24). It therefore suffices to prove the result for the maximum max_{i≤n} X_i of an n-dimensional Gaussian vector X ∼ N(0, Σ). It is convenient to write X = Σ^{1/2} Z for Z ∼ N(0, I). It then follows from Theorem 3.25 that max_{i≤n} X_i is ‖∇f‖²-subgaussian, where we have defined the function f(z) := max_{i≤n} (Σ^{1/2} z)_i. Note that

\[
|f(z) - f(z')| \le \max_{i\le n} |(\Sigma^{1/2}(z - z'))_i| \le \max_{i\le n} \|(\Sigma^{1/2})_i\|\,\|z - z'\|,
\]

where (Σ^{1/2})_i denotes the i-th row of Σ^{1/2}. As ‖(Σ^{1/2})_i‖² = Σ_ii = Var[X_i], the function f is Lipschitz and ‖∇f‖² ≤ max_{i≤n} Var[X_i] wherever the gradient exists, and the claim follows. □
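The lemma is easy to probe numerically. The sketch below draws the maximum of a correlated Gaussian vector X = AZ repeatedly (the matrix A is an arbitrary illustrative choice) and compares the upper tail of the maximum around its empirical mean with the subgaussian bound exp(−u²/2σ²), σ² = maxᵢ Var[Xᵢ].

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical look at Lemma 6.12 for a finite Gaussian vector X = A Z:
# max_i X_i should have subgaussian fluctuations around its mean with
# variance proxy sigma^2 = max_i Var[X_i] = max_i ||A_i||^2.
# The matrix A is an arbitrary illustrative choice.
n, k, n_mc = 200, 5, 50000
A = rng.standard_normal((n, k)) / np.sqrt(k)
sigma2 = (A**2).sum(axis=1).max()                 # max_i Var[X_i]

Z = rng.standard_normal((n_mc, k))
M = (Z @ A.T).max(axis=1)                         # samples of max_i X_i
M -= M.mean()                                     # center at the empirical mean

for u in [0.5, 1.0, 1.5, 2.0]:
    emp = (M >= u).mean()
    bound = np.exp(-u**2 / (2 * sigma2))
    print(f"u={u:.1f}: P(max - E max >= u) ~ {emp:.2e} <= exp(-u^2/2s^2) = {bound:.2e}")
```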

We now complete the proof of Theorem 6.11.

Proof (Theorem 6.11). We can evidently estimate

\[
\mathbf{E}\sup_{t\in T} X_t \ge \mathbf{E}\max_{s\in N}\sup_{t\in B(s,\alpha\varepsilon)} X_t \ge \mathbf{E}\max_{s\in N}\Big\{\sup_{t\in B(s,\alpha\varepsilon)} X_t - \mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t\Big\} + \min_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t.
\]

To bound the first term on the right, we write

\[
\mathbf{E}\max_{s\in N}\Big\{\sup_{t\in B(s,\alpha\varepsilon)} X_t - \mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t\Big\} \ge \mathbf{E}\max_{s\in N} X_s - \mathbf{E}\max_{s\in N}\Big\{X_s - \sup_{t\in B(s,\alpha\varepsilon)} X_t + \mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t\Big\} \ge c\varepsilon\sqrt{\log|N|} - \sqrt{2}\,\alpha\varepsilon\sqrt{\log|N|}
\]

for some universal constant c, bounding the first term using Theorem 6.5 and the last term using Lemma 5.1 (each variable inside the last maximum is centered and (αε)²-subgaussian, as follows from Lemma 6.12 applied to the process {X_t − X_s}_{t∈B(s,αε)}). Choosing α = c/2√2 completes the proof. □

Let us compare the lower bound of Theorem 6.11 to the chaining upper bound. An immediate difference between the two bounds is that the former is stated in terms of an ε-packing, while the latter is in terms of an ε-net. This will be taken care of using the duality between covering and packing, however, so that this difference is not a major concern at this stage. A more pressing concern is the minimum in the bound of Theorem 6.11. To emphasize this issue, let us reformulate the chaining upper bound to bring out the similarity between the two bounds: if diam(T) ≤ ε and N ⊆ T is an αε-net, then

\[
\mathbf{E}\sup_{t\in T} X_t \le c_1\varepsilon\sqrt{\log|N|} + \mathbf{E}\max_{s\in N}\sup_{t\in B(s,\alpha\varepsilon)}\{X_t - X_s\} \le c\varepsilon\sqrt{\log|N|} + \max_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t.
\]

The first inequality follows trivially from the chaining upper bound as stated at the beginning of this section, while the second bound is readily obtained by using Gaussian concentration as in the proof of Theorem 6.11. In contrast, the bound of Theorem 6.11 states that if N is an ε-packing, then

\[
\mathbf{E}\sup_{t\in T} X_t \ge c\varepsilon\sqrt{\log|N|} + \min_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t.
\]

When phrased in this manner, the two bounds appear to be very similar, with one crucial difference: in the chaining upper bound, the remainder term is the largest expected supremum of the Gaussian process over a ball centered at one of the points in N, while the remainder term in Theorem 6.11 is the smallest expected supremum over such a ball. There is no reason why the suprema of the Gaussian process over two balls of the same radius should be of the same order: in general, the remainder terms in our upper and lower bounds can be of very different orders of magnitude. The major remaining question, to be addressed in the next section, is how to overcome this problem.

For the time being, however, we would like to illustrate the idea of chaining in reverse without having to cope with the complications arising from the above problem. To this end, we will investigate in the remainder of this section a special class of Gaussian processes for which this problem does not arise.

Definition 6.13 (Stationary Gaussian process). The Gaussian process {X_t}_{t∈T} is called stationary if there exists a group G acting on T such that:

1. d(g(t), g(s)) = d(t, s) for all t, s ∈ T, g ∈ G (translation invariance).

2. For every t, s ∈ T, there exists g ∈ G such that t = g(s) (transitivity).

Of course, the key point of this definition is that for a stationary Gaussian process all balls are created equal: indeed, we have equality in distribution

\[
\{X_t - X_s : t\in B(s,\varepsilon)\} \stackrel{d}{=} \{X_t - X_{s'} : t\in B(s',\varepsilon)\} \quad\text{for all } s, s'\in T.
\]

To see this, recall that the law of the increments of a Gaussian process is entirely determined by the natural metric d, and note that if g ∈ G is such that s′ = g(s), then g maps B(s, ε) isometrically onto B(s′, ε). Thus

\[
\mathbf{E}\sup_{t\in B(s,\varepsilon)} X_t = \mathbf{E}\sup_{t\in B(s,\varepsilon)}\{X_t - X_s\} = \mathbf{E}\sup_{t\in B(s',\varepsilon)}\{X_t - X_{s'}\} = \mathbf{E}\sup_{t\in B(s',\varepsilon)} X_t \quad\text{for all } s, s'\in T,
\]

so our upper and lower bounds are of the same order in this case.

Example 6.14 (Brownian motion). Let {B_t}_{t∈ℝ} be two-sided Brownian motion (that is, B_t = B′_t for t ≥ 0 and B_t = B″_{−t} for t < 0, where {B′_t}_{t≥0} and {B″_t}_{t≥0} are independent standard Brownian motions). We can view the index set ℝ itself as a group G = (ℝ, +) under addition. It is now easily seen that Brownian motion is a stationary Gaussian process: transitivity is obvious, while translation invariance can be read off from d(t, s) = √|t − s|.
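A short simulation on a discrete time grid makes this invariance visible: the increment variance E|B_t − B_s|² matches |t − s| regardless of where the pair (t, s) sits, even across the two half-lines. The grid resolution and sample size below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-sided Brownian motion on a grid: B_t = B'_t for t >= 0 and
# B_t = B''_{-t} for t < 0, with B', B'' independent standard BMs.
dt, n, n_mc = 2e-3, 1000, 4000                    # grid covers (-2, 0) and (0, 2)
Bpos = np.sqrt(dt) * rng.standard_normal((n_mc, n)).cumsum(axis=1)
Bneg = np.sqrt(dt) * rng.standard_normal((n_mc, n)).cumsum(axis=1)

def B(t):
    """Sampled path values at time t (vectorized over the Monte Carlo runs)."""
    idx = round(abs(t) / dt) - 1
    return Bpos[:, idx] if t > 0 else Bneg[:, idx]

# The natural metric d(t, s) = sqrt(E|B_t - B_s|^2) = sqrt(|t - s|) depends
# on (t, s) only through t - s: translation invariance.
for t, s in [(1.2, 0.4), (0.5, -0.3), (-1.5, -0.7)]:
    d2 = np.mean((B(t) - B(s))**2)
    print(f"t={t:+.1f}, s={s:+.1f}:  E|B_t - B_s|^2 ~ {d2:.3f}  vs |t-s| = {abs(t - s):.3f}")
```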


Example 6.15 (Random Fourier series). A classical application of stationary Gaussian processes is to develop an understanding of Fourier series with random coefficients. Let g_k and g′_k be i.i.d. N(0,1) random variables, and let c_k be coefficients such that ∑_k c_k² < ∞. Define for t ∈ S¹ = [0, 2π[ the process

\[
X_t = \sum_{k=0}^{\infty} c_k\{g_k \sin kt + g'_k \cos kt\}.
\]

Then {X_t}_{t∈S¹} is a stationary Gaussian process for the group of rotations of the circle S¹. Indeed, transitivity is obvious, and it is not difficult to compute d(t, s)² = 2∑_k c_k²{1 − cos(k(t − s))}, which is evidently translation-invariant.
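The metric formula is straightforward to verify numerically with a truncated series; in the sketch below, the truncation level K and the coefficients c_k = 1/(k + 1) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Truncated random Fourier series X_t = sum_k c_k (g_k sin(kt) + g'_k cos(kt))
# with illustrative square-summable coefficients c_k = 1/(k+1).
K, n_mc = 200, 20000
k = np.arange(K)
c = 1.0 / (k + 1)

def d2(t, s):
    # d(t, s)^2 = 2 sum_k c_k^2 (1 - cos(k(t - s))): a function of t - s only.
    return 2 * np.sum(c**2 * (1 - np.cos(k * (t - s))))

g, gp = rng.standard_normal((2, n_mc, K))

def X(t):
    return (c * (g * np.sin(k * t) + gp * np.cos(k * t))).sum(axis=1)

t, s = 1.0, 0.2
emp = np.mean((X(t) - X(s))**2)
print(f"empirical E|X_t - X_s|^2 ~ {emp:.4f}, formula d(t,s)^2 = {d2(t, s):.4f}")
print(f"after rotating both by 1.5: d^2 = {d2(t + 1.5, s + 1.5):.4f} (unchanged)")
```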

Under the stationarity assumption, we have seen that the upper bound we have used in a single iteration of the chaining argument is matched by an essentially equivalent lower bound. Therefore, in this setting, we expect that the chaining bound obtained in the previous chapter is tight. To prove this, little remains but to run the chaining argument in reverse.

Theorem 6.16 (Fernique). Let {X_t}_{t∈T} be a stationary separable Gaussian process. Then we can estimate, for some universal constants c₁, c₂,

\[
c_1\int_0^\infty \sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon \le \mathbf{E}\sup_{t\in T} X_t \le c_2\int_0^\infty \sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon.
\]

Proof. As the Gaussian process is stationary, all balls behave in the same way.

Thus we will lighten our notation by defining B(ε) := B(t₀, ε) for some fixed but arbitrary point t₀ ∈ T. This will play the role of our "representative ball".

Let us begin by applying Theorem 6.11 at the scale α^n. Choose N_n to be a maximal α^{n+2}-packing of the ball B(α^{n+1}). Then we have

\[
\bigcup_{s\in N_n} B(s,\alpha^{n+3}) \subseteq B(\alpha^n),
\]

as d(t₀, t) ≤ d(t₀, s) + d(s, t) ≤ α^{n+1} + α^{n+3} ≤ α^n for every s ∈ N_n and t ∈ B(s, α^{n+3}). This situation is illustrated in the following figure:

[figure: nested balls of radii α^n and α^{n+1}; inside B(α^{n+1}), packing points at mutual distance α^{n+2}, each surrounded by a disjoint ball of radius α^{n+3}]

By the maximality of the packing N_n, the duality between packing and covering numbers yields |N_n| ≥ N(B(α^{n+1}), d, α^{n+2}). Thus Theorem 6.11, applied to the process indexed by B(α^n), yields

\[
\mathbf{E}\sup_{t\in B(\alpha^n)} X_t \ge c\,\alpha^{n+2}\sqrt{\log N(B(\alpha^{n+1}),d,\alpha^{n+2})} + \mathbf{E}\sup_{t\in B(\alpha^{n+3})} X_t
\]

(here we used stationarity to replace the minimum over the balls B(s, α^{n+3}), s ∈ N_n, that arises in Theorem 6.11 by the expected supremum over the representative ball B(α^{n+3})).

We now iterate this bound. Let k₀ be the largest integer such that α^{k₀} ≥ diam(T), so that B(α^n) = T for every n ≤ k₀. If we start the iteration at any n ≤ k₀, then we obtain

\[
\mathbf{E}\sup_{t\in T} X_t \ge c\sum_{j=0}^{\infty} \alpha^{n+3j+2}\sqrt{\log N(B(\alpha^{n+3j+1}),d,\alpha^{n+3j+2})}.
\]

This completes the core part of the proof of Theorem 6.16: we have obtained a multiscale lower bound on the supremum of the Gaussian process by “chaining in reverse”. However, at first sight the lower bound looks a little different than the upper bound of Theorem 5.24. The difference proves to be cosmetic, and we will presently “fix” the discrepancy between the two bounds.

First, note that the terms in the above sum "skip" from scale α^k to α^{k+3}, rather than summing over all k ∈ ℤ. As the starting point n is arbitrary, however, we can fix this by averaging over n = k₀, k₀ − 1, k₀ − 2. This yields

\[
\mathbf{E}\sup_{t\in T} X_t \ge \frac{c}{3}\sum_{k\in\mathbb{Z}} \alpha^{k}\sqrt{\log N(B(\alpha^{k-1}),d,\alpha^{k})},
\]

where the sum may be taken over all k ∈ ℤ because the terms with k ≤ k₀ vanish.

The remaining problem with this lower bound is that it contains covering numbers of the form N(B(α^k), d, α^{k+1}), while our upper bound is phrased in terms of covering numbers N(T, d, α^{k+1}) of the entire set. To fix this, let us do some covering number gymnastics. Suppose we can cover T by m balls of radius α^k, and that each ball of radius α^k can be covered by m′ balls of radius α^{k+1}. Then clearly T can be covered by mm′ balls of radius α^{k+1}. We can choose m = N(T, d, α^k) and m′ = N(B(α^k), d, α^{k+1}) (using stationarity to argue that the covering number of any ball B(s, α^k) is equal to that of our representative ball B(α^k)). A moment's reflection will show that we have proved

\[
N(T,d,\alpha^{k+1}) \le N(T,d,\alpha^k)\,N(B(\alpha^k),d,\alpha^{k+1}).
\]

This sort of reasoning is useful in many problems involving covering numbers.
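The following sketch illustrates this kind of covering-number gymnastics on a random finite set, using a greedy covering heuristic (a proxy for the true covering numbers) and the scales ε and ε/2 in place of α^k and α^{k+1}; the set, the scales, and the heuristic are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

def greedy_cover(points, radius):
    """Greedy covering: repeatedly declare the first uncovered point a center.
    The number of centers returned is a proxy for N(points, d, radius)."""
    centers, uncovered = [], np.ones(len(points), dtype=bool)
    while uncovered.any():
        c = points[np.argmax(uncovered)]          # first still-uncovered point
        centers.append(c)
        uncovered &= np.linalg.norm(points - c, axis=1) > radius
    return np.array(centers)

# A finite "T": random points in the unit square with the Euclidean metric.
T = rng.uniform(0, 1, (2000, 2))
eps = 0.2

coarse = greedy_cover(T, eps)                     # coarse cover at scale eps
per_ball = [len(greedy_cover(T[np.linalg.norm(T - c, axis=1) <= eps], eps / 2))
            for c in coarse]                      # fine cover of each coarse ball
direct = len(greedy_cover(T, eps / 2))            # direct fine cover of all of T

print(f"N(T, eps) ~ {len(coarse)}, max ball N(., eps/2) ~ {max(per_ball)}")
print(f"product = {len(coarse) * max(per_ball)} vs direct N(T, eps/2) ~ {direct}")
# In line with the displayed inequality, the direct count stays below the product.
```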

In the present setting, plugging this identity into the above bound yields

\[
\mathbf{E}\sup_{t\in T} X_t \ge \frac{c}{3}\sum_{k\in\mathbb{Z}} \alpha^{k}\Big\{\sqrt{\log N(T,d,\alpha^{k})} - \sqrt{\log N(T,d,\alpha^{k-1})}\Big\} = \frac{c(1-\alpha)}{3}\sum_{k\in\mathbb{Z}} \alpha^{k}\sqrt{\log N(T,d,\alpha^{k})} \ge c'\int_0^\infty \sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon
\]


for some universal constant c′, where we estimated the sum by an integral in the usual manner (cf. Problem 5.9). Note that in order to prove that the two terms in the first inequality are of the same order, we used the fact that the sum runs over all k ∈ ℤ and not just over multiples of three. This minor annoyance in the proof therefore does serve a purpose.

We have now proved the lower bound. The corresponding upper bound follows immediately from the previous chapter (Corollary 5.25). □
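As a numerical sanity check of Fernique's theorem, one can compare a Monte Carlo estimate of E sup_t X_t with a crude discretization of the entropy integral for a concrete stationary process. Below we use a truncated random Fourier series with c_k = (k + 1)^{−1.2}, a greedy covering heuristic for the covering numbers, and arbitrary grid and truncation choices, so only the rough agreement of the two numbers is meaningful.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stationary process: truncated random Fourier series with c_k = (k+1)^(-1.2).
K = 50
k = np.arange(K)
c = (k + 1.0) ** -1.2
grid = np.linspace(0, 2 * np.pi, 256, endpoint=False)     # discretized circle

# Monte Carlo estimate of E sup_t X_t on the grid.
g, gp = rng.standard_normal((2, 5000, K))
X = g @ (c[:, None] * np.sin(np.outer(k, grid))) + gp @ (c[:, None] * np.cos(np.outer(k, grid)))
E_sup = X.max(axis=1).mean()

# Pairwise natural metric d(t, s)^2 = 2 sum_k c_k^2 (1 - cos(k(t - s))).
lag = grid[None, :] - grid[:, None]
D = np.sqrt(2 * ((1 - np.cos(np.multiply.outer(lag, k))) @ c**2))

def cover_count(radius):
    """Greedy covering count on the grid: a proxy for N(S^1, d, radius)."""
    uncovered, count = np.ones(len(grid), dtype=bool), 0
    while uncovered.any():
        count += 1
        uncovered &= D[np.argmax(uncovered)] > radius
    return count

# Crude Riemann sum for the entropy integral over a range of scales.
scales = np.geomspace(D.max(), D.max() / 64, 13)
entropy = sum((e1 - e2) * np.sqrt(np.log(cover_count(e2)))
              for e1, e2 in zip(scales[:-1], scales[1:]))

print(f"E sup X_t ~ {E_sup:.3f}, entropy-integral proxy ~ {entropy:.3f}")
# Fernique's theorem: the two quantities agree up to universal constants.
```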

Problems

6.4 (An alternative proof of super-Sudakov). We deduced the super-Sudakov inequality from the ordinary Sudakov inequality together with Gaussian concentration. It is also possible, however, to obtain Theorem 6.11 directly from the Slepian-Fernique inequality by modifying the proof of the Sudakov inequality. The advantage of this is that it yields somewhat sharper constants.

The aim of this problem is to develop this alternative proof.

For simplicity, let {X_t}_{t∈T} be a Gaussian process on a finite index set T (the extension to the case of a separable Gaussian process follows readily as in the proof of Theorem 5.24). Let N be an ε-packing of (T, d).

a. For every s ∈ N, let T_s := {t ∈ T : d(t, s) ≤ ε/4} and

\[
Z_t = X_t^{(s)} - X_s^{(s)} + \tfrac{1}{4}\varepsilon g_s \quad\text{for } t\in T_s,\ s\in N,
\]

where {X_t^{(s)}}_{t∈T} are independent copies of {X_t}_{t∈T} and the g_s are independent N(0,1) random variables for s ∈ N. Show that we have

\[
\mathbf{E}|X_t - X_{t'}|^2 \ge \mathbf{E}|Z_t - Z_{t'}|^2 \quad\text{for all } t, t' \in \bigcup_{s\in N} T_s.
\]

b. Conclude from Theorem 6.8 that

\[
\mathbf{E}\sup_{t\in T} X_t \ge \mathbf{E}\max_{s\in N}\Big\{\tfrac{1}{4}\varepsilon g_s + \sup_{t\in T_s}\{X_t^{(s)} - X_s^{(s)}\}\Big\}.
\]

c. Use Jensen's inequality conditionally on {g_s}_{s∈N} to conclude that

\[
\mathbf{E}\sup_{t\in T} X_t \ge \tfrac{1}{4}\varepsilon\,\mathbf{E}\max_{s\in N} g_s + \min_{s\in N}\mathbf{E}\sup_{t\in T_s} X_t.
\]


6.5 (Rectangles). Consider the Gaussian process {X_t}_{t∈{−1,1}ⁿ} of the form

\[
X_t = \sum_{k=1}^{n} g_k t_k a_k,
\]

where a₁ > ⋯ > a_n > 0 are given constants and g₁, …, g_n are i.i.d. N(0,1). Such a process is called a rectangle (as the index set ({−1,1}ⁿ, d) has the same geometry as the corners of a rectangle in (ℝⁿ, ‖·‖)).

a. Show that

\[
\mathbf{E}\sup_{t\in\{-1,1\}^n} X_t = \sqrt{\frac{2}{\pi}}\sum_{k=1}^{n} a_k.
\]

b. Argue that {X_t}_{t∈{−1,1}ⁿ} is a stationary Gaussian process, so that

\[
\int_0^\infty \sqrt{\log N(\{-1,1\}^n, d, \varepsilon)}\,d\varepsilon \asymp \sum_{k=1}^{n} a_k.
\]

c. Attempt to verify this conclusion by estimating covering numbers and computing the entropy integral directly. (This is surprisingly hard!)

d. Let a_k = 1/k. Show that for every n ≥ 1

\[
\sup_{\varepsilon>0}\,\varepsilon\sqrt{\log N(\{-1,1\}^n, d, \varepsilon)} \le c \qquad\text{and}\qquad \sum_{k=1}^{n} a_k \gtrsim \log n
\]

for some universal constant c. Therefore, while the chaining bound of Theorem 5.24 is sharp, Sudakov's inequality is far from sharp in this example.
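For small n, the identity in part a can be sanity-checked by brute force, enumerating all of {−1,1}ⁿ; the coefficients in the sketch below are random draws satisfying a₁ > ⋯ > a_n > 0 rather than any specific sequence.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)

# Brute-force check of part a for small n: enumerate all of {-1,1}^n.
n, n_mc = 8, 20000
a = np.sort(rng.uniform(0.1, 1.0, n))[::-1]           # a_1 > ... > a_n > 0
corners = np.array(list(product([-1, 1], repeat=n)))  # all 2^n index points

g = rng.standard_normal((n_mc, n))
sup = (g @ (corners * a).T).max(axis=1)               # sup_t sum_k g_k t_k a_k
print(f"Monte Carlo E sup     ~ {sup.mean():.4f}")
print(f"sqrt(2/pi) * sum(a_k) = {np.sqrt(2 / np.pi) * a.sum():.4f}")
```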

6.6 (A nonstationary process). Consider the Gaussian process {X_n}_{n∈ℕ} given by

\[
X_n = \frac{g_n}{\sqrt{1 + \log n}},
\]

where {g_n}_{n∈ℕ} are i.i.d. N(0,1). This process is most definitely not stationary.

a. Show that

\[
\mathbf{E}\sup_{n\in\mathbb{N}} X_n < \infty.
\]

b. Show that

\[
\int_0^\infty \sqrt{\log N(\mathbb{N}, d, \varepsilon)}\,d\varepsilon = \infty,
\]

so the conclusion of Theorem 6.16 can indeed fail in the nonstationary case.


c. To gain some insight into the problem, compute the quantity

\[
\mathbf{E}\sup_{n:\,d(n,m)\le\varepsilon} X_n
\]

for different m ∈ ℕ. Conclude that while one needs N(ℕ, d, ε) balls of radius ε to cover ℕ (and N(ℕ, d, ε) ↑ ∞ as ε ↓ 0), the expected supremum of the Gaussian process over all but one of these balls vanishes. Thus the remainder terms in our chaining upper and lower bounds are not comparable (in fact, in this case it is clearly the upper bound that is inefficient).
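A quick simulation, truncating ℕ at a finite level N, is consistent with part a: the expected maximum grows very slowly with N, as one would expect of a finite supremum. The truncation levels and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def mc_sup(N, n_mc=500, chunk=20000):
    """Monte Carlo estimate of E[max_{n <= N} g_n / sqrt(1 + log n)], in chunks."""
    sups = np.full(n_mc, -np.inf)
    for lo in range(0, N, chunk):
        idx = np.arange(lo + 1, min(lo + chunk, N) + 1)
        block = rng.standard_normal((n_mc, len(idx))) / np.sqrt(1 + np.log(idx))
        sups = np.maximum(sups, block.max(axis=1))
    return sups.mean()

for N in [10**3, 10**4, 10**5]:
    print(f"N={N:>6}: E max_(n<=N) X_n ~ {mc_sup(N):.3f}")
```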

6.7 (An improved chaining argument). Let {X_t}_{t∈T} be a (nonstationary) Gaussian process. In order to compare the super-Sudakov inequality to the chaining upper bound, we used Gaussian concentration to reformulate the upper bound as follows: if diam(T) ≤ ε and N ⊆ T is an αε-net, then

\[
\mathbf{E}\sup_{t\in T} X_t \le c\varepsilon\sqrt{\log|N|} + \max_{s\in N}\mathbf{E}\sup_{t\in B(s,\alpha\varepsilon)} X_t.
\]

The goal of this problem is to note that chaining using this improved inequality will in fact yield a slightly improved version of Corollary 5.25:

\[
\mathbf{E}\sup_{t\in T} X_t \le c_1\sup_{t\in T}\int_0^\infty \sqrt{\log N(B(t, c_2\varepsilon), d, \varepsilon)}\,d\varepsilon
\]

for universal constants c₁, c₂ > 1.

a. Prove the above inequality.

b. Find an example where this inequality is sharp, but Corollary 5.25 is not.

Hint: let 𝕋 be a (not necessarily regular) finite rooted tree with root t₀ ∈ 𝕋 and leaves T ⊆ 𝕋. Assume that all leaves have the same depth n. For every leaf t ∈ T, denote by π₀(t), π₁(t), …, π_n(t) the unique path in the tree from the root π₀(t) = t₀ to the leaf π_n(t) = t. Attach to each vertex s ∈ 𝕋 an i.i.d. N(0,1) random variable ξ_s, and define {X_t}_{t∈T} as

\[
X_t = \sum_{k=0}^{n} \beta^k \xi_{\pi_k(t)}.
\]

Choose β < 1 and an irregular tree 𝕋 carefully to construct the example.

c. Find an example where also the present inequality is not sharp.

Hint: consider Problem 6.6.
