3.1 Subgaussian variables and Chernoff bounds

Before we can prove any concentration inequalities, we must first consider how one might go about proving that a random variable satisfies a Gaussian tail bound. Most tail bounds in probability theory are proved using some form of Markov’s inequality. For example, if we have a bound on the variance as in the previous chapter, we immediately obtain a tail bound of the form

$$
\mathbf{P}\bigl[|X - \mathbf{E}[X]| \ge t\bigr] \le \frac{\mathrm{Var}[X]}{t^2}.
$$

However, this bound only decays as t^{−2}, and we cannot obtain Gaussian tail bounds from Poincaré inequalities in this manner. To obtain Gaussian tail bounds, we must use Markov's inequality in a more sophisticated manner.
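As a rough numerical illustration, the following sketch compares, for a standard normal variable, the Chebyshev bound Var[X]/t² with the Gaussian-type bound 2e^{−t²/2} (the two-sided version of the bound derived in Example 3.4 below), using a Monte Carlo estimate of the true tail probability as a reference; the choice X ∼ N(0, 1) is of course just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)  # X ~ N(0, 1), so E[X] = 0 and Var[X] = 1

for t in [1.0, 2.0, 3.0, 4.0]:
    empirical = np.mean(np.abs(samples) >= t)  # Monte Carlo estimate of P[|X - E[X]| >= t]
    chebyshev = 1.0 / t**2                     # Var[X] / t^2
    gaussian = 2 * np.exp(-t**2 / 2)           # two-sided Gaussian-type bound 2 exp(-t^2 / 2)
    print(f"t={t:.0f}  empirical={empirical:.2e}  chebyshev={chebyshev:.2e}  gaussian={gaussian:.2e}")
```

Already at t = 4 the Chebyshev bound overestimates the tail probability by several orders of magnitude, while the Gaussian-type bound tracks the true decay.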

The basic method is known as the Chernoff bound.

Lemma 3.1 (Chernoff bound). Define the log-moment generating function ψ of a random variable X and its Legendre dual ψ* as

$$
\psi(\lambda) := \log \mathbf{E}[e^{\lambda(X - \mathbf{E}X)}], \qquad \psi^*(t) = \sup_{\lambda \ge 0}\,\{\lambda t - \psi(\lambda)\}.
$$

Then P[X − EX ≥ t] ≤ e^{−ψ*(t)} for all t ≥ 0.

Proof. The idea is strikingly simple: we simply exponentiate inside the probability before applying Markov's inequality. For any λ ≥ 0, we have

$$
\mathbf{P}[X - \mathbf{E}X \ge t] = \mathbf{P}[e^{\lambda(X - \mathbf{E}X)} \ge e^{\lambda t}] \le e^{-\lambda t}\,\mathbf{E}[e^{\lambda(X - \mathbf{E}X)}] = e^{-\{\lambda t - \psi(\lambda)\}}
$$

using Markov's inequality and that x ↦ e^{λx} is increasing. As the left-hand side does not depend on the choice of λ ≥ 0, we can optimize the right-hand side over λ to obtain the statement of the lemma. ⊓⊔

Remark 3.2. Note that the Chernoff bound only gives the upper tail, that is, the probability P[X ≥ EX + t] that the random variable X exceeds its mean EX by a fixed amount. However, we can obtain an inequality for the lower tail by applying the Chernoff bound to the random variable −X, as

$$
\mathbf{P}[X \le \mathbf{E}X - t] = \mathbf{P}[-X \ge \mathbf{E}[-X] + t].
$$

In particular, given an upper and lower tail bound, we can obtain a bound on the magnitude of the fluctuations using the union bound

$$
\mathbf{P}[|X - \mathbf{E}X| \ge t] = \mathbf{P}[X \ge \mathbf{E}X + t \text{ or } X \le \mathbf{E}X - t] \le \mathbf{P}[X \ge \mathbf{E}X + t] + \mathbf{P}[-X \ge \mathbf{E}[-X] + t].
$$

In many cases, proving an upper tail bound will immediately imply a lower tail bound and a two-sided bound in this manner. On the other hand, sometimes upper or lower tail bounds will be proved under assumptions that are not invariant under negation. For example, if we prove an upper tail bound for convex functions f(X), this does not automatically imply a lower tail bound, as −f(X) is concave and not convex; in such cases, a lower tail bound must be proved separately. One should therefore be careful when interpreting tail bounds to check separately the validity of upper and lower tail bounds.

Remark 3.3. The utility of the Chernoff bound is by no means restricted to proving Gaussian tails as we will do below. One can obtain many different tail behaviors in this manner. However, the method clearly only works if ψ(λ) is finite at least for λ in a neighborhood of 0. Therefore, to apply the Chernoff bound, the random variable X should have at least exponential tails. For random variables with heavier tails an alternative method is needed; for example, one could take powers rather than exponentials in Markov's inequality:

$$
\mathbf{P}[X - \mathbf{E}X \ge t] \le \inf_{p \in \mathbb{N}} \frac{\mathbf{E}[(X - \mathbf{E}X)_+^p]}{t^p}.
$$

In fact, even when the Chernoff bound is applicable, it is not difficult to show that this moment bound is at least as good as the Chernoff bound.

Why are Chernoff bounds so useful? There are some simple examples, such as the case of sums of random variables, where the Chernoff bound proves to be easy to manipulate (we will exploit this in the next section). However, the real power of the Chernoff bound is that the log-moment generating function λ ↦ ψ(λ) is a continuous object, and can therefore be investigated using calculus. We will repeatedly exploit this approach in the sequel.
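For instance, the optimization over λ in Lemma 3.1 can be carried out numerically once ψ is known. The following minimal sketch takes, purely as an illustration, X ∼ Exp(1), for which ψ(λ) = −λ − log(1 − λ) for λ < 1, maximizes λt − ψ(λ) over a grid of λ, and compares the resulting Chernoff bound e^{−ψ*(t)} with a Monte Carlo estimate of the tail probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(lam):
    # log-moment generating function of X - EX for X ~ Exp(1): psi(lambda) = -lambda - log(1 - lambda)
    return -lam - np.log1p(-lam)

lams = np.linspace(0.0, 0.999, 2000)                  # grid of lambda in [0, 1)
samples = rng.exponential(scale=1.0, size=1_000_000)  # X ~ Exp(1), so EX = 1

for t in [1.0, 2.0, 4.0]:
    psi_star = np.max(lams * t - psi(lams))    # Legendre dual psi*(t) = sup_lambda {lambda t - psi(lambda)}
    chernoff = np.exp(-psi_star)               # Chernoff bound on P[X - EX >= t]
    empirical = np.mean(samples - 1.0 >= t)    # Monte Carlo estimate of the tail probability
    print(f"t={t:.0f}  chernoff bound={chernoff:.3e}  empirical tail={empirical:.3e}")
```

Since the exponential distribution has exponential rather than Gaussian tails, the resulting bound decays only exponentially in t; it is nonetheless valid and captures the correct exponential rate.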

To show how the Chernoff bound can give rise to Gaussian tail bounds, let us first consider the case of an actual Gaussian random variable.

Example 3.4. Let X ∼ N(µ, σ²). Then E[e^{λ(X−EX)}] = e^{λ²σ²/2}, so

$$
\psi(\lambda) = \frac{\lambda^2\sigma^2}{2}, \qquad \psi^*(t) = \frac{t^2}{2\sigma^2}.
$$

In particular, we have P[X − EX ≥ t] ≤ e^{−t²/2σ²}.

Observe that in order to get the tail bound in Example 3.4, the fact that X is Gaussian was not actually important: it would suffice to assume that the log-moment generating function is bounded from above by that of a Gaussian: ψ(λ) ≤ λ²σ²/2. Random variables that satisfy this condition play a central role in the investigation of Gaussian tail bounds.


Definition 3.5 (Subgaussian random variables). A random variable is called σ²-subgaussian if its log-moment generating function satisfies ψ(λ) ≤ λ²σ²/2 for all λ ∈ ℝ (and the constant σ² is called the variance proxy).

Note that if ψ(λ) is the log-moment generating function of a random variable X, then ψ(−λ) is the log-moment generating function of the random variable −X. For a σ²-subgaussian random variable X, we can therefore apply the Chernoff bound to both the upper and lower tails to obtain

$$
\mathbf{P}[X \ge \mathbf{E}X + t] \le e^{-t^2/2\sigma^2}, \qquad \mathbf{P}[X \le \mathbf{E}X - t] \le e^{-t^2/2\sigma^2}.
$$

As moment generating functions will prove to be much easier to manipulate than the tail probabilities themselves, we will almost always study Gaussian tail behavior of random variables in terms of the subgaussian property. Fortunately, it turns out that little is lost in making this simplification: any random variable that satisfies Gaussian tail bounds must necessarily be subgaussian (albeit for a slightly larger variance proxy), cf. Problem 3.1 below.

So far, the only examples of subgaussian random variables that we have encountered are Gaussians, which is not terribly interesting. One of the most basic results on subgaussian random variables is that every bounded random variable is subgaussian. This statement is made precise by Hoeffding's lemma, which could be viewed as a far-reaching generalization of the trivial Lemma 2.1. Even in this simple setting, the proof provides a nontrivial illustration of the important role of calculus in bounding moment generating functions.

Lemma 3.6 (Hoeffding lemma). Let a ≤ X ≤ b a.s. for some a, b ∈ ℝ. Then E[e^{λ(X−EX)}] ≤ e^{λ²(b−a)²/8}, i.e., X is (b−a)²/4-subgaussian.

Proof. We can assume without loss of generality that EX = 0. In this case we have ψ(λ) = log E[e^{λX}], and we can readily compute

$$
\psi'(\lambda) = \frac{\mathbf{E}[Xe^{\lambda X}]}{\mathbf{E}[e^{\lambda X}]}, \qquad
\psi''(\lambda) = \frac{\mathbf{E}[X^2 e^{\lambda X}]}{\mathbf{E}[e^{\lambda X}]} - \left(\frac{\mathbf{E}[Xe^{\lambda X}]}{\mathbf{E}[e^{\lambda X}]}\right)^{2}.
$$

Thus ψ''(λ) can be interpreted as the variance of the random variable X under the twisted probability measure dQ = (e^{λX}/E[e^{λX}]) dP. But then Lemma 2.1 yields ψ''(λ) ≤ (b−a)²/4, and the fundamental theorem of calculus yields

$$
\psi(\lambda) = \int_0^\lambda \int_0^\mu \psi''(\rho)\, d\rho\, d\mu \le \frac{\lambda^2(b-a)^2}{8}
$$

using ψ(0) = log 1 = 0 and ψ'(0) = EX = 0. ⊓⊔
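As a rough numerical sanity check of Lemma 3.6, the following sketch computes ψ(λ) exactly for a two-point random variable (the particular values a, b and probability p are arbitrary choices for illustration) and verifies the bound λ²(b − a)²/8 on a grid of λ.

```python
import numpy as np

# Illustrative example: X = a with probability 1 - p and X = b with probability p
a, b, p = -1.0, 2.0, 0.3
mean = (1 - p) * a + p * b

def psi(lam):
    # psi(lambda) = log E[exp(lambda (X - EX))], computed exactly for the two-point law
    return np.log((1 - p) * np.exp(lam * (a - mean)) + p * np.exp(lam * (b - mean)))

lams = np.linspace(-5.0, 5.0, 1001)
hoeffding = lams**2 * (b - a) ** 2 / 8          # the bound of Hoeffding's lemma
assert np.all(psi(lams) <= hoeffding + 1e-12)   # the bound holds on the whole grid
print("psi(lambda) <= lambda^2 (b - a)^2 / 8 for all lambda on the grid")
```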

Problems

3.1 (Subgaussian variables). There are several different notions of random variables with a Gaussian tail that are all essentially equivalent up to constants. The aim of this problem is to obtain some insight into these notions.

a. As a warmup exercise, show that if X is σ²-subgaussian, then Var[X] ≤ σ².

b. Show that for any increasing and differentiable function Φ

$$
\mathbf{E}[\Phi(|X|)] = \Phi(0) + \int_0^\infty \Phi'(t)\,\mathbf{P}[|X| \ge t]\,dt.
$$

This elementary identity will be needed below.

In the following, we will assume for simplicity that EX = 0. We now prove that the following three properties are equivalent for suitable constants σ, b, c:

(1) X is σ²-subgaussian; (2) P[|X| ≥ t] ≤ 2e^{−bt²}; and (3) E[e^{cX²}] ≤ 2.

c. Show that if X is σ²-subgaussian, then P[|X| ≥ t] ≤ 2e^{−t²/2σ²}.

d. Show that if P[|X| ≥ t] ≤ 2e^{−t²/2σ²}, then E[e^{X²/6σ²}] ≤ 2.

Hint: use part b.

e. Show that if E[e^{X²/6σ²}] ≤ 2, then X is 18σ²-subgaussian.

Hint: for large values of λ, use Young's inequality |λX| ≤ λ²a²/2 + X²/2a² for a suitable choice of a; for small values of λ, use Young's inequality together with E[e^{λX}] ≤ 1 + (λ²/2) E[X² e^{|λX|}] by Taylor's theorem.

In addition, the subgaussian property of X is equivalent to the fact that the moments of X scale as is the case for the Gaussian distribution.

f. Show that if X is σ²-subgaussian, then E[X^{2q}] ≤ (4σ²)^q q! for all q ∈ ℕ.

Hint: use part b.

g. Show that if E[X^{2q}] ≤ (4σ²)^q q! for all q ∈ ℕ, then E[e^{X²/8σ²}] ≤ 2.

Hint: expand in a power series.

Note: the numerical constants in this problem are not intended to be sharp.

3.2 (Tightness of Hoeffding's lemma). Show that the bound of Hoeffding's lemma is the best possible by considering P[X = a] = P[X = b] = 1/2.

3.3 (Chernoff bound vs. moments). Show that for t ≥ 0

$$
\mathbf{P}[X - \mathbf{E}X \ge t] \le \inf_{p \ge 0} \frac{\mathbf{E}[(X - \mathbf{E}X)_+^p]}{t^p} \le \inf_{\lambda \ge 0} e^{-\lambda t}\,\mathbf{E}[e^{\lambda(X - \mathbf{E}X)}].
$$

Thus the moment bound of Remark 3.3 is at least as good as the Chernoff bound. However, the former is much harder to use than the latter.

Hint: use E[e^{λ(X−EX)}] ≥ E[1_{X−EX>0} e^{λ(X−EX)}] and expand in a power series.

3.4 (Chernoff bound exercises). Compute the explicit form of the Chernoff bound for Poisson and Bernoulli random variables.


3.5 (Maxima of subgaussian variables). Let X₁, X₂, . . . be (not necessarily independent) σ²-subgaussian random variables. Show that

$$
\mathbf{P}\Bigl[\max_{i \le n}\,\{X_i - \mathbf{E}X_i\} \ge (1+\varepsilon)\,\sigma\sqrt{2\log n}\Bigr] \xrightarrow{\;n\to\infty\;} 0 \quad \text{for all } \varepsilon > 0.
$$

Hint: use the union bound

$$
\mathbf{P}[X \vee Y \ge t] = \mathbf{P}[X \ge t \text{ or } Y \ge t] \le \mathbf{P}[X \ge t] + \mathbf{P}[Y \ge t].
$$

This problem shows that the maximum max_{i≤n} {Xᵢ − EXᵢ} of σ²-subgaussian random variables is at most of order σ√(2 log n). This is the simplest example of the crucial role played by tail bounds in estimating the size of maxima of random variables. The second part of this course will be entirely devoted to the investigation of such problems (using much deeper ideas).
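The √(2 log n) scaling is easy to observe in simulation; the following minimal sketch (using independent standard Gaussians, one convenient family of σ² = 1 subgaussian variables, purely as an illustration) compares the average maximum of n such variables with √(2 log n).

```python
import numpy as np

rng = np.random.default_rng(0)

# Average the maximum of n i.i.d. N(0,1) variables over 50 repetitions and compare with sqrt(2 log n)
for n in [10, 100, 1_000, 10_000, 100_000]:
    maxima = rng.standard_normal((50, n)).max(axis=1)
    print(f"n={n:>6}  mean max = {maxima.mean():.3f}   sqrt(2 log n) = {np.sqrt(2 * np.log(n)):.3f}")
```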
