2 Strong Law of Large Numbers


Definition 1. $(X_n)_{n\in\mathbb{N}}$ is called independent and identically distributed (i.i.d.) iff $(X_n)_{n\in\mathbb{N}}$ is independent and $X_n \stackrel{d}{=} X_k$ for all $n,k\in\mathbb{N}$.

Throughout this section, $(X_n)_{n\in\mathbb{N}}$ is independent (but i.i.d. only if explicitly stated).

Consider the event $C = \{(S_n)_{n\in\mathbb{N}} \text{ converges in } \mathbb{R}\}$, where $S_n = \sum_{i=1}^n X_i$. By Remark 1.1, $P(C)\in\{0,1\}$.

First we provide sufficient conditions for $P(C)=1$ to hold.

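Theorem 1 (Kolmogorov’s inequality). Assume that $X_i\in L^2$ and $E(X_i)=0$ for all $i$. Then, for every $\varepsilon>0$ and $n\in\mathbb{N}$,
$$P\Big(\Big\{\sup_{1\le k\le n}|S_k|\ge\varepsilon\Big\}\Big) \le \frac{1}{\varepsilon^2}\cdot\operatorname{Var}(S_n).$$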

Proof. Let $1\le k\le n$. We show that
$$\forall B\in\sigma(\{X_1,\dots,X_k\}):\quad \int_B S_k^2\,dP \le \int_B S_n^2\,dP. \tag{1}$$

Let $B\in\sigma(\{X_1,\dots,X_k\})$. We start with $S_n^2 = (S_k + S_n - S_k)^2$, which implies
$$E(1_B S_n^2) = E(1_B S_k^2) + 2\,E[(1_B S_k)\cdot(S_n-S_k)] + E(1_B (S_n-S_k)^2) \ge E(1_B S_k^2) + 2\,E[(1_B S_k)\cdot(S_n-S_k)].$$

Moreover, it follows easily from Theorem III.5.4 that $1_B S_k$ and $S_n - S_k$ are independent. Hence Theorem III.5.6 yields
$$E[(1_B S_k)\cdot(S_n-S_k)] = E(1_B S_k)\cdot E(S_n-S_k) = 0,$$
and thereby
$$E(1_B S_n^2) \ge E(1_B S_k^2).$$

This completes the proof of (1). For $k\le n$, define
$$A_k = \{|S_\ell| < \varepsilon \text{ for all } \ell<k,\ |S_k|\ge\varepsilon\}.$$
Then $A_k\in\sigma(\{X_1,\dots,X_k\})$, the sets $A_k$ are pairwise disjoint, and $\{\sup_{1\le k\le n}|S_k|\ge\varepsilon\} = \bigcup_{k=1}^n A_k$; hence, with the help of (1), we have

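$$\varepsilon^2\cdot P\Big(\Big\{\sup_{1\le k\le n}|S_k|\ge\varepsilon\Big\}\Big) = \varepsilon^2\cdot\sum_{k=1}^n P(A_k) \le \sum_{k=1}^n\int_{A_k} S_k^2\,dP \le \sum_{k=1}^n\int_{A_k} S_n^2\,dP \le \int_\Omega S_n^2\,dP = \operatorname{Var}(S_n),$$
where the first inequality uses $S_k^2\ge\varepsilon^2$ on $A_k$, the second uses (1), and the last equality holds because $E(S_n)=0$.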
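As an informal sanity check of Theorem 1 (not part of the proof), the following Python sketch estimates the left-hand side by Monte Carlo for a hypothetical choice of distribution: $X_i=\pm1$ with probability $1/2$ each, so that $E(X_i)=0$ and $\operatorname{Var}(S_n)=n$. The function `kolmogorov_check` and its default parameters are illustrative assumptions.

```python
import random

def kolmogorov_check(n=100, eps=20.0, trials=5_000, seed=0):
    """Monte Carlo estimate of P(max_{1<=k<=n} |S_k| >= eps) for a walk with
    X_i = +1 or -1 with probability 1/2 each, together with Kolmogorov's bound
    Var(S_n) / eps^2 = n / eps^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0
        running_max = 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            running_max = max(running_max, abs(s))  # track max_{k<=n} |S_k|
        if running_max >= eps:
            hits += 1
    return hits / trials, n / eps**2

est, bound = kolmogorov_check()
print(f"estimate = {est:.4f}, Kolmogorov bound = {bound:.4f}")
```

With $n=100$ and $\varepsilon=20$ the bound equals $0.25$; the simulated probability should come out noticeably smaller, which is consistent with the inequality being an upper bound that is not sharp in general.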


Theorem 2. If $X_n\in L^2$ and $E(X_n)=0$ for all $n$, and $\sum_{i=1}^\infty \operatorname{Var}(X_i)<\infty$, then $(S_n)_{n\in\mathbb{N}}$ converges $P$-a.s.

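Proof. $(S_n)_{n\in\mathbb{N}}$ converges iff it is Cauchy; hence, for
$$M := \inf_{n\in\mathbb{N}}\sup_{k\in\mathbb{N}}|S_{n+k}-S_n|,$$
$(S_n)_{n\in\mathbb{N}}$ converges iff $M=0$. Fix $\varepsilon>0$ and $n\in\mathbb{N}$. Then $M>\varepsilon$ implies that for some $r\in\mathbb{N}$ we have $\sup_{1\le k\le r}|S_{n+k}-S_n|>\varepsilon$. Since these events increase in $r$,
$$P(\{M>\varepsilon\}) \le \sup_{r\in\mathbb{N}} P\Big(\Big\{\sup_{1\le k\le r}|S_{n+k}-S_n|>\varepsilon\Big\}\Big),$$
and Kolmogorov’s inequality, applied to $X_{n+1},\dots,X_{n+r}$, yields
$$P\Big(\Big\{\sup_{1\le k\le r}|S_{n+k}-S_n|>\varepsilon\Big\}\Big) \le \frac{1}{\varepsilon^2}\cdot\sum_{i=n+1}^{n+r}\operatorname{Var}(X_i) \le \frac{1}{\varepsilon^2}\cdot\sum_{i=n+1}^{\infty}\operatorname{Var}(X_i).$$
Since $n$ was arbitrary and the tail sums on the right tend to $0$ as $n\to\infty$, we get $P(\{M>\varepsilon\})=0$ for every $\varepsilon>0$, i.e., $M=0$ $P$-a.s.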

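Example 1. Let $(Y_n)_{n\in\mathbb{N}}$ be i.i.d. with $E(Y_n)=0$ and $E(Y_n^2)<\infty$, and let $b_n\neq 0$ be such that $\sum_{n=1}^\infty 1/b_n^2<\infty$. Then
$$\sum_{n=1}^\infty\operatorname{Var}(Y_n/b_n) = E(Y_1^2)\cdot\sum_{n=1}^\infty \frac{1}{b_n^2} <\infty,$$
hence $\sum_n Y_n/b_n$ converges $P$-a.s. by Theorem 2.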
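A minimal simulation sketch of Example 1, assuming Rademacher variables ($Y_n=\pm1$ with probability $1/2$ each) and $b_n=n$, i.e. the random harmonic series $\sum_n Y_n/n$; the helper name is hypothetical. A finite simulation cannot prove almost sure convergence; it only shows the partial sums settling down along one sample path.

```python
import random

def random_harmonic_partial_sums(n_max, checkpoints, seed=0):
    """Partial sums of sum_n Y_n / b_n with b_n = n and Y_n i.i.d. Rademacher
    (Y_n = +1 or -1 with probability 1/2 each, so E Y_n = 0, E Y_n^2 = 1).
    Returns {n: partial sum up to n} for n in `checkpoints`."""
    rng = random.Random(seed)
    total = 0.0
    out = {}
    for n in range(1, n_max + 1):
        y = 1 if rng.random() < 0.5 else -1
        total += y / n
        if n in checkpoints:
            out[n] = total
    return out

sums = random_harmonic_partial_sums(100_000, {10, 1_000, 10_000, 100_000})
for n in sorted(sums):
    print(n, round(sums[n], 5))
```

The variance of the difference between the last two printed partial sums is $\sum_{10^4<n\le 10^5}1/n^2<10^{-4}$, so under these assumptions the two values should agree to roughly two decimal places.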

In the sequel, $0<a_n\uparrow\infty$. We now study almost sure convergence of $(S_n/a_n)_{n\in\mathbb{N}}$.

Lemma 1 (Kronecker’s Lemma). Let $(x_n)_{n\in\mathbb{N}}$ be a sequence in $\mathbb{R}$. If $\sum_{i=1}^\infty x_i/a_i$ converges, then
$$\frac{1}{a_n}\cdot\sum_{i=1}^n x_i \to 0.$$

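Proof. If $\sum_{i=1}^\infty |x_i|/a_i<\infty$, the claim follows from Lebesgue’s theorem: consider $\mathbb{N}$ with the counting measure $\gamma$ and define $f_n(i) := \frac{x_i}{a_i}\cdot\frac{a_i}{a_n}\cdot 1_{\{i\le n\}}$. Then $f_n\to 0$ pointwise and, since $(a_n)$ is increasing, $|f_n(i)|\le |x_i|/a_i$, which is $\gamma$-integrable; hence
$$\frac{1}{a_n}\cdot\sum_{i\le n}x_i = \int_{\mathbb{N}} f_n\,d\gamma \to 0.$$
In general, however, the series converges only conditionally (as in the proof of Theorem 3), and we use summation by parts instead. Put $r_i := \sum_{j>i} x_j/a_j$, so that $r_i\to 0$ and $x_i = a_i\,(r_{i-1}-r_i)$. Then
$$\sum_{i=1}^n x_i = a_1 r_0 - a_n r_n + \sum_{i=1}^{n-1}(a_{i+1}-a_i)\,r_i .$$
After division by $a_n$, the first two terms tend to $0$. Given $\varepsilon>0$, choose $N$ with $|r_i|\le\varepsilon$ for $i\ge N$; since $a_{i+1}-a_i\ge 0$,
$$\Big|\frac{1}{a_n}\sum_{i=1}^{n-1}(a_{i+1}-a_i)\,r_i\Big| \le \frac{1}{a_n}\sum_{i=1}^{N-1}(a_{i+1}-a_i)\,|r_i| + \varepsilon\cdot\frac{a_n-a_N}{a_n},$$
whose $\limsup$ as $n\to\infty$ is at most $\varepsilon$. As $\varepsilon>0$ was arbitrary, the claim follows.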
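The following Python sketch illustrates Lemma 1 in the conditionally convergent case, using the hypothetical choice $x_i = (-1)^{i+1}\sqrt{i}$ and $a_i = i$. Here $\sum_i x_i/a_i = \sum_i (-1)^{i+1}/\sqrt{i}$ converges by the alternating series test, while $\sum_i |x_i|/a_i = \sum_i 1/\sqrt{i}$ diverges, so Lebesgue’s theorem with the majorant $|x_i|/a_i$ is not available. The helper `kronecker_demo` is illustrative only.

```python
import math

def kronecker_demo(n):
    """x_i = (-1)**(i+1) * sqrt(i) and a_i = i.

    Returns the partial sum of x_i / a_i (a convergent alternating series)
    and the normalised sum (1 / a_n) * (x_1 + ... + x_n)."""
    series = 0.0  # sum_{i <= n} x_i / a_i
    raw = 0.0     # sum_{i <= n} x_i
    for i in range(1, n + 1):
        x = math.sqrt(i) if i % 2 == 1 else -math.sqrt(i)
        series += x / i
        raw += x
    return series, raw / n

for n in (10**2, 10**4, 10**6):
    s, avg = kronecker_demo(n)
    print(n, round(s, 6), avg)
```

The first column of values stabilises near $0.605$, while the normalised sums shrink roughly like $1/(2\sqrt{n})$ in absolute value, as Kronecker’s Lemma predicts.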


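Theorem 3 (Strong Law of Large Numbers, $L^2$ Case). If $X_n\in L^2$ for all $n\in\mathbb{N}$ and
$$\sum_{i=1}^\infty \frac{1}{a_i^2}\cdot\operatorname{Var}(X_i) < \infty, \tag{2}$$
then
$$\frac{1}{a_n}\cdot\sum_{i=1}^n \big(X_i - E(X_i)\big) \xrightarrow{P\text{-a.s.}} 0.$$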

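Proof. Put $Y_n = \frac{1}{a_n}\cdot(X_n - E(X_n))$. Then $E(Y_n)=0$ and $(Y_n)_{n\in\mathbb{N}}$ is independent. Moreover,
$$\sum_{i=1}^\infty \operatorname{Var}(Y_i) = \sum_{i=1}^\infty \frac{1}{a_i^2}\cdot\operatorname{Var}(X_i)<\infty.$$
Thus $\sum_{i=1}^\infty Y_i$ converges $P$-a.s. due to Theorem 2. Applying Lemma 1 for each such $\omega$, with $x_i = X_i(\omega)-E(X_i)$, completes the proof.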

Remark 1. 1. Assume that the variances $\operatorname{Var}(X_n)$ are bounded and that $\varepsilon>0$. Then it follows in particular (with $a_n = n^{1/2}(\log n)^{1/2+\varepsilon}$ for $n\ge 2$ and, e.g., $a_1 := a_2$) that
$$n^{-1/2}(\log n)^{-1/2-\varepsilon}\cdot\Big[\sum_{i\le n}X_i - E\Big(\sum_{i\le n}X_i\Big)\Big] \xrightarrow{P\text{-a.s.}} 0.$$
This means that for the ‘cumulative effect’ $\sum_{i\le n}X_i$ the deviation from the mean ‘typically’ grows more slowly than $n^{1/2}(\log n)^{1/2+\varepsilon}$. (This will be refined by the CLT.) The independence of the $X_n$ is of course crucial here; if $X_1 = X_2 = \cdots$, we have a growth rate of $n$.
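The summability condition (2) for this choice of $a_n$ can be checked by an integral comparison. Here $C := \sup_n\operatorname{Var}(X_n)$, $a_i^2 = i(\log i)^{1+2\varepsilon}$ for $i\ge 2$, and $x\mapsto 1/(x(\log x)^{1+2\varepsilon})$ is decreasing on $[2,\infty[$; the single term $i=1$ is finite and does not affect convergence:
$$\sum_{i\ge 2}\frac{\operatorname{Var}(X_i)}{a_i^2} \le C\sum_{i\ge 2}\frac{1}{i(\log i)^{1+2\varepsilon}} \le \frac{C}{2(\log 2)^{1+2\varepsilon}} + C\int_2^\infty \frac{dx}{x(\log x)^{1+2\varepsilon}} = \frac{C}{2(\log 2)^{1+2\varepsilon}} + \frac{C}{2\varepsilon(\log 2)^{2\varepsilon}} < \infty.$$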

2. If, additionally, $(X_n)_{n\in\mathbb{N}}$ is i.i.d. with $X_1\in L^2$, we may choose $a_n = n$ and derive that
$$\frac{1}{n}\cdot\sum_{i=1}^n X_i \xrightarrow{P\text{-a.s.}} E(X_1).$$
In fact, this conclusion already holds if $X_1\in L^1$; see Theorem 4 below.

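Example 2. Let $(X_n)_{n\in\mathbb{N}}$ be i.i.d. with $P_{X_1} = p\cdot\delta_1 + (1-p)\cdot\delta_{-1}$. By the Strong Law of Large Numbers,
$$\frac{1}{n}\cdot S_n \xrightarrow{P\text{-a.s.}} 2p-1.$$
Moreover, if $p=1/2$, then for every $\varepsilon>0$
$$\frac{1}{\sqrt{n}\cdot(\log n)^{1/2+\varepsilon}}\cdot S_n \xrightarrow{P\text{-a.s.}} 0.$$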
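Both statements of Example 2 can be illustrated along a single simulated path. The Python sketch below is an assumption-laden illustration (hypothetical helper `walk_statistics`, fixed seed, $n=10^5$, $\varepsilon=0.1$); one finite path is consistent with, but of course does not prove, almost sure convergence.

```python
import math
import random

def walk_statistics(n, p, eps=0.1, seed=0):
    """Simulate S_n = X_1 + ... + X_n with P(X_i = 1) = p, P(X_i = -1) = 1 - p.
    Returns (S_n / n, S_n / (sqrt(n) * (log n) ** (1/2 + eps)))."""
    rng = random.Random(seed)
    s = 0
    for _ in range(n):
        s += 1 if rng.random() < p else -1
    return s / n, s / (math.sqrt(n) * math.log(n) ** (0.5 + eps))

mean_07, _ = walk_statistics(100_000, p=0.7)    # SLLN: should be close to 2p - 1 = 0.4
_, scaled_05 = walk_statistics(100_000, p=0.5)  # p = 1/2: the rescaled sum should be small
print(round(mean_07, 4), round(scaled_05, 4))
```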

A precise description of the fluctuations of $S_n(\omega)$ for $P$-a.e. $\omega\in\Omega$ is given by the Law of the Iterated Logarithm.

Lemma 2. Let $U_i, V_i, W\in Z(\Omega,\mathcal{A})$ such that
$$\sum_{i=1}^\infty P(\{U_i\neq V_i\})<\infty.$$


Then
$$\frac{1}{n}\cdot\sum_{i=1}^n U_i \xrightarrow{P\text{-a.s.}} W \iff \frac{1}{n}\cdot\sum_{i=1}^n V_i \xrightarrow{P\text{-a.s.}} W.$$

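Proof. The Borel-Cantelli Lemma implies $P(\limsup_{i\to\infty}\{U_i\neq V_i\})=0$. Hence, for $P$-a.e. $\omega$, we have $U_i(\omega)=V_i(\omega)$ for all but finitely many $i$, so that $\frac{1}{n}\sum_{i=1}^n \big(U_i(\omega)-V_i(\omega)\big)\to 0$.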

Lemma 3. For $X\in Z_+(\Omega,\mathcal{A})$,
$$E(X)\le\sum_{k=0}^\infty P(\{X>k\})\le E(X)+1.$$

(Cf. Corollary II.8.2.)

Proof. We have
$$E(X) = \sum_{k=1}^\infty\int_{\{k-1<X\le k\}} X\,dP,$$
and therefore
$$E(X)\le\sum_{k=1}^\infty k\cdot P(\{k-1<X\le k\}) = \sum_{k=0}^\infty P(\{X>k\})$$
as well as
$$E(X)\ge\sum_{k=1}^\infty (k-1)\cdot P(\{k-1<X\le k\})\ge\sum_{k=0}^\infty P(\{X>k\}) - 1.$$
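Lemma 3 can be checked in closed form for a hypothetical example: if $X$ is exponentially distributed with rate $\lambda>0$, then $E(X)=1/\lambda$ and $\sum_{k\ge0}P(\{X>k\}) = \sum_{k\ge0}e^{-\lambda k} = 1/(1-e^{-\lambda})$. The Python sketch below (with the hypothetical helper `tail_sum`, which truncates the series) compares these numbers for a few rates.

```python
import math

def tail_sum(survival, k_max=10_000):
    """Truncation of sum_{k >= 0} P(X > k), given the survival function k -> P(X > k)."""
    return sum(survival(k) for k in range(k_max + 1))

results = []
for rate in (0.1, 1.0, 5.0):
    mean = 1.0 / rate                            # E(X) for X ~ Exp(rate)
    s = tail_sum(lambda k: math.exp(-rate * k))  # P(X > k) = exp(-rate * k)
    closed_form = 1.0 / (1.0 - math.exp(-rate))  # geometric series
    results.append((rate, mean, s, closed_form))
    print(rate, mean, round(s, 6), round(closed_form, 6))
```

For every rate the tail sum lies between $E(X)$ and $E(X)+1$, as Lemma 3 asserts.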

Theorem 4 (Strong Law of Large Numbers, i.i.d. Case). Let $(X_n)_{n\in\mathbb{N}}$ be i.i.d. Then
$$\exists Z\in Z(\Omega,\mathcal{A}):\ \frac{1}{n}\cdot S_n\xrightarrow{P\text{-a.s.}} Z \iff X_1\in L^1,$$
in which case $Z = E(X_1)$ $P$-a.s.

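Proof. ‘⇒’: From the assumption we derive
$$\frac{1}{n}\cdot X_n = \frac{1}{n}\cdot S_n - \frac{n-1}{n}\cdot\frac{1}{n-1}\cdot S_{n-1}\xrightarrow{P\text{-a.s.}} 0.$$
Hence, for the independent events $A_n = \{|X_n|>n\}$ we have $P(\limsup_{n\to\infty}A_n)=0$. The Borel-Cantelli Lemma (for independent events) implies
$$\sum_{n=1}^\infty \underbrace{P(A_n)}_{=P(\{|X_1|>n\})}<\infty.$$
Use Lemma 3 to obtain $E(|X_1|)<\infty$.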


‘⇐’: Consider the truncated random variables
$$Y_n = \begin{cases} X_n & \text{if } |X_n|<n,\\ 0 & \text{otherwise.}\end{cases}$$

We will first show that
$$\sum_{i=1}^\infty\frac{1}{i^2}\cdot\operatorname{Var}(Y_i)<\infty. \tag{3}$$

To this end, observe that
$$\operatorname{Var}(Y_i)\le E(Y_i^2) = \sum_{k=1}^i E\big[Y_i^2\cdot 1_{[k-1,k[}(|Y_i|)\big] = \sum_{k=1}^i E\big[X_i^2\cdot 1_{[k-1,k[}(|X_i|)\big] \le \sum_{k=1}^i k^2\cdot P(\{k-1\le|X_1|<k\}).$$

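Thus, using $\sum_{i=k}^\infty 1/i^2 \le 1/k^2 + \int_k^\infty x^{-2}\,dx \le 2/k$,
$$\sum_{i=1}^\infty\frac{1}{i^2}\cdot\operatorname{Var}(Y_i) \le \sum_{k=1}^\infty k^2\cdot P(\{k-1\le|X_1|<k\})\cdot\sum_{i=k}^\infty\frac{1}{i^2} \le 2\cdot\sum_{k=1}^\infty k\cdot P(\{k-1\le|X_1|<k\}) \le 2\cdot\big(E(|X_1|)+1\big)<\infty,$$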

cf. the proof of Lemma 3. (3) follows. Since the $Y_n$ are independent, Theorem 3 (with $a_n = n$) now asserts that
$$\frac{1}{n}\cdot\sum_{i=1}^n\big(Y_i - E(Y_i)\big)\xrightarrow{P\text{-a.s.}} 0.$$

Furthermore, $E(Y_n) = E(X_1\cdot 1_{\{|X_1|<n\}})$, since $X_n\stackrel{d}{=}X_1$; as $|X_1|$ is an integrable majorant, Lebesgue’s theorem yields
$$\lim_{n\to\infty}E(Y_n) = E(X_1). \tag{4}$$

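Due to (4), the Cesàro means $\frac{1}{n}\sum_{i=1}^n E(Y_i)$ converge to $E(X_1)$ as well, so that
$$\frac{1}{n}\cdot\sum_{i=1}^n Y_i\xrightarrow{P\text{-a.s.}} E(X_1).$$
Moreover,
$$\sum_{i=1}^\infty P(\{X_i\neq Y_i\})<\infty, \tag{5}$$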

since, by Lemma 3 and $P(\{|X_1|\ge i\})\le P(\{|X_1|>i-1\})$,
$$\sum_{i=1}^\infty P(\{X_i\neq Y_i\}) = \sum_{i=1}^\infty P(\{|X_i|\ge i\}) \le \sum_{i=0}^\infty P(\{|X_1|>i\}) \le E(|X_1|)+1<\infty.$$


Finally, by Lemma 2 and (5),
$$\frac{1}{n}\cdot\sum_{i=1}^n X_i\xrightarrow{P\text{-a.s.}} E(X_1).$$
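Theorem 4 covers integrable distributions with infinite variance, where Theorem 3 with $a_n=n$ does not apply directly. The Python sketch below assumes a Pareto distribution with shape $\alpha=1.5$, generated with the standard library’s `random.paretovariate` (so $E(X_1)=3$ and $E(X_1^2)=\infty$); the helper name is hypothetical. Convergence is slow for such heavy tails, so the tolerance used in the check is deliberately generous.

```python
import random

def sample_mean_pareto(alpha, n, seed=0):
    """Mean of n i.i.d. samples from random.paretovariate(alpha).
    These samples satisfy P(X > x) = x ** (-alpha) for x >= 1, so that
    E X = alpha / (alpha - 1) if alpha > 1 and E X^2 = infinity if alpha <= 2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += rng.paretovariate(alpha)
    return total / n

alpha = 1.5                     # X_1 in L^1 but not in L^2
expected = alpha / (alpha - 1)  # = 3.0
estimate = sample_mean_pareto(alpha, 1_000_000)
print(estimate, expected)
```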

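What happens if $X_1$ is not integrable?

Theorem 5. Let $(X_n)_{n\in\mathbb{N}}$ be i.i.d.

(i) If $E(X_1^-)<\infty$ and $E(X_1^+)=\infty$, then $\frac{1}{n}\cdot S_n\xrightarrow{P\text{-a.s.}}\infty$.

(ii) If $E(|X_1|)=\infty$, then
$$\limsup_{n\to\infty}\Big|\frac{1}{n}\cdot S_n\Big| = \infty\quad P\text{-a.s.}$$

Proof. (i) follows from Theorem 4: for $c\in\mathbb{N}$, the variables $\min(X_n,c)$ are i.i.d. and integrable, so $\liminf_{n\to\infty}\frac{1}{n}S_n \ge E(\min(X_1,c))$ $P$-a.s., and the right-hand side tends to $\infty$ as $c\to\infty$ by monotone convergence. (ii) is an application of the Borel-Cantelli Lemma; see Gänssler, Stute (1977, p. 131).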
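For Theorem 5 (i), a hypothetical example is a Pareto distribution with shape $\alpha=0.5$ (again via `random.paretovariate`): then $X_1\ge 1$, so $E(X_1^-)=0$, while $E(X_1^+)=E(X_1)=\infty$. The sketch below prints sample means for growing $n$; under these assumptions they should become large, although any single run only shows finitely many values and need not be monotone.

```python
import random

def sample_mean_pareto(alpha, n, seed=0):
    """Mean of n i.i.d. samples from random.paretovariate(alpha) (all samples are >= 1)."""
    rng = random.Random(seed)
    return sum(rng.paretovariate(alpha) for _ in range(n)) / n

# alpha = 0.5: X_1 >= 1, so E(X_1^-) = 0, while E(X_1^+) = E(X_1) = infinity.
means = [sample_mean_pareto(0.5, n) for n in (10**3, 10**4, 10**5)]
print(means)
```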

Remark 2. Let $(X_n)_{n\in\mathbb{N}}$ be i.i.d. with $\mu = P_{X_1}$ and corresponding distribution function $F = F_{X_1}$. Suppose that $\mu$ is unknown, but observations $X_1(\omega),\dots,X_n(\omega)$ are available for ‘estimation of $\mu$’.

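Fix $x\in\mathbb{R}$. Due to Theorem 4, applied to the i.i.d. indicators $1_{]-\infty,x]}(X_i)$, we have
$$F_n(x,\omega) := \frac{\#\{i\le n: X_i(\omega)\le x\}}{n}\xrightarrow{P\text{-a.s.}} F(x).$$
$F_n(\cdot,\omega)$ is called the empirical distribution function; analogously, one can define the empirical distribution
$$\mu_n(A,\omega) := \frac{\#\{i\le n: X_i(\omega)\in A\}}{n}.$$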

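To be precise, we know about the empirical distribution function that
$$\forall x\in\mathbb{R}\ \exists A\in\mathcal{A}: P(A)=1 \wedge \forall\omega\in A: \lim_{n\to\infty}F_n(x,\omega) = F(x).$$
Therefore, since $\mathbb{Q}$ is countable,
$$\exists A\in\mathcal{A}: P(A)=1\wedge\forall q\in\mathbb{Q}\ \forall\omega\in A: \lim_{n\to\infty}F_n(q,\omega) = F(q),$$
which easily implies
$$\exists A\in\mathcal{A}: P(A)=1\wedge\forall\omega\in A: \mu_n(\cdot,\omega)\xrightarrow{w}\mu;$$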

see p. 63 and Theorem III.3.2. This result can be strengthened to the Glivenko-Cantelli Theorem
$$\exists A\in\mathcal{A}: P(A)=1\wedge\forall\omega\in A: \lim_{n\to\infty}\sup_{x\in\mathbb{R}}|F_n(x,\omega)-F(x)| = 0,$$
see Billingsley (1979, Theorem 20.6). (By Exercise 9.2, this result follows immediately for continuous $F$.)
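For the uniform distribution on $[0,1]$ (an assumption made only for this illustration), $F(x)=x$ on $[0,1]$ and the Glivenko-Cantelli supremum can be computed exactly from the order statistics $u_{(1)}\le\dots\le u_{(n)}$ of the sample. The Python sketch below (helper name hypothetical) does this; under these assumptions the printed distances should shrink as $n$ grows.

```python
import random

def ks_distance_uniform(sample):
    """sup_x |F_n(x) - F(x)| for the uniform distribution on [0, 1], F(x) = x.

    F_n only jumps at the order statistics u_(1) <= ... <= u_(n), so the supremum
    is max_i max(i/n - u_(i), u_(i) - (i-1)/n)."""
    u = sorted(sample)
    n = len(u)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(u, start=1))

rng = random.Random(0)
for n in (100, 10_000, 100_000):
    sample = [rng.random() for _ in range(n)]
    print(n, ks_distance_uniform(sample))
```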