Probability Theory

Wilhelm Stannat

Technische Universität Darmstadt, Winter Term 2007/08

Second part - corrected version

This text is a summary of the lecture on Probability Theory held at the TU Darmstadt in Winter Term 2007/08.

Please email all misprints and mistakes to

stannat@mathematik.tu-darmstadt.de


Bibliography

1. Bauer, H., Probability Theory, de Gruyter, 1996.

2. Bauer, H., Maß- und Integrationstheorie, de Gruyter, 1996.

3. Billingsley, P., Probability and Measure, Wiley, 1995.

4. Billingsley, P., Convergence of Probability Measures, Wiley, 1999.

5. Dudley, R. M., Real Analysis and Probability, Cambridge University Press, 2002.

6. Elstrodt, J., Maß- und Integrationstheorie, Springer, 2005.

7. Feller, W., An Introduction to Probability Theory and Its Applications, Vol. 1 & 2, Wiley, 1950.

8. Halmos, P. R., Measure Theory, Springer, 1974.

9. Klenke, A., Wahrscheinlichkeitstheorie, Springer, 2006.

10. Shiryaev, A. N., Probability, Springer, 1996.


1 Basic Notions

9 Distribution of random variables

Let $(\Omega,\mathcal A,P)$ be a probability space, and let $X\colon\Omega\to\bar{\mathbb R}$ be a r.v.

Let $\mu$ be the distribution of $X$ (under $P$), i.e., $\mu(A)=P[X\in A]$ for all $A\in\mathcal B(\bar{\mathbb R})$.

Assume that $P[X\in\mathbb R]=1$ (in particular, $X$ is $P$-a.s. finite, and $\mu$ is a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$).

Definition 9.1. The function $F\colon\mathbb R\to[0,1]$, defined by
$$F(b):=P[X\le b]=\mu\big(\,]-\infty,b]\,\big),\quad b\in\mathbb R,\tag{1.1}$$
is called the distribution function of $X$ resp. $\mu$.

Proposition 9.2. (i) $F$ is
monotone increasing: $a\le b\ \Rightarrow\ F(a)\le F(b)$,
right continuous: $F(a)=\lim_{b\searrow a}F(b)$,
normalized: $\lim_{a\searrow-\infty}F(a)=0$, $\lim_{b\nearrow+\infty}F(b)=1$.

(ii) To any such function $F$ there exists a unique probability measure $\mu$ on $(\mathbb R,\mathcal B(\mathbb R))$ with (1.1).

Proof. (i) Monotonicity is obvious.

Right continuity: if $b\searrow a$ then $]-\infty,b]\searrow\,]-\infty,a]$, hence by continuity of $\mu$ from above (cf. Proposition 1.9):
$$F(a)=\mu\big(\,]-\infty,a]\,\big)\overset{1.9}{=}\lim_{b\searrow a}\mu\big(\,]-\infty,b]\,\big)=\lim_{b\searrow a}F(b).$$
Similarly, $]-\infty,a]\searrow\emptyset$ if $a\searrow-\infty$ (resp. $]-\infty,b]\nearrow\mathbb R$ if $b\nearrow\infty$), and thus
$$\lim_{a\searrow-\infty}F(a)=\lim_{a\searrow-\infty}\mu\big(\,]-\infty,a]\,\big)=0\qquad\Big(\text{resp. }\lim_{b\nearrow\infty}F(b)=\lim_{b\nearrow\infty}\mu\big(\,]-\infty,b]\,\big)=1\Big).$$

(ii) Existence: Let $\lambda$ be the Lebesgue measure on $]0,1[$. Define the "inverse function" $G$ of $F\colon\mathbb R\to[0,1]$ by
$$G\colon\,]0,1[\,\to\mathbb R,\qquad G(y):=\inf\big\{x\in\mathbb R\;\big|\;F(x)\ge y\big\}.$$
Note that $y<F(x)\ \Rightarrow\ G(y)\le x$ implies
$$\big]0,F(x)\big[\;\subset\;\{G\le x\},$$
and $G(y)\le x\ \Rightarrow\ \exists\,x_n\searrow x$ with $F(x_n)\ge y$, hence $F(x)\ge y$, so that
$$\{G\le x\}\;\subset\;\big]0,F(x)\big].$$
Combining both inclusions we obtain that
$$\big]0,F(x)\big[\;\subset\;\{G\le x\}\;\subset\;\big]0,F(x)\big],$$
so that $G$ is measurable.

Let $\mu:=G(\lambda)=\lambda\circ G^{-1}$ (a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$). Then
$$\mu\big(\,]-\infty,x]\,\big)=\lambda(\{G\le x\})=\lambda\big(\,\big]0,F(x)\big]\,\big)=F(x)\quad\forall x\in\mathbb R.$$
Uniqueness: later.

Remark 9.3. (i) Let $Y$ be a r.v. with uniform distribution on $[0,1]$; then $X=G(Y)$ has distribution $\mu$. In particular: simulating the uniform distribution on $[0,1]$ gives, by transformation with $G$, a simulation of $\mu$ (see the sketch following this remark).

(ii) Some authors define the distribution function $F$ by $F(x):=\mu\big(\,]-\infty,x[\,\big)$. In this case $F$ is left continuous, not right continuous.
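The simulation recipe in (i) is known as inverse transform sampling. Below is a minimal sketch in Python, using the exponential distribution from Example 9.7(ii) below, for which the generalized inverse has the closed form $G(y)=-\log(1-y)/\alpha$; the function names and the choice $\alpha=1$ are illustrative and not part of the lecture.

    import math
    import random

    def G(y, alpha=1.0):
        # Generalized inverse of F(x) = 1 - exp(-alpha*x), x >= 0:
        # G(y) = inf{x : F(x) >= y} = -log(1 - y)/alpha for 0 <= y < 1.
        return -math.log(1.0 - y) / alpha

    # Simulate the uniform distribution on [0,1[ and transform with G.
    random.seed(0)
    sample = [G(random.random()) for _ in range(100_000)]

    # Sanity check: the exponential distribution with alpha = 1 has mean 1.
    print(sum(sample) / len(sample))  # approximately 1.0

The same pattern works for any distribution function $F$; if no closed form for $G$ is available, a numerical search for $\inf\{x\mid F(x)\ge y\}$ can stand in.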

Remark 9.4. (i) Let $F$ be a distribution function and let $x\in\mathbb R$. Then
$$F(x)-F(x-)=\lim_{n\nearrow\infty}\mu\big(\,\big]x-\tfrac1n,x\big]\,\big)=\mu(\{x\})$$
is called the step height of $F$ in $x$. In particular:
$$F\ \text{continuous}\iff\forall x\in\mathbb R\colon\ \mu(\{x\})=0\quad(\text{``}\mu\text{ is continuous''}).$$

(ii) Let $F$ be monotone increasing and bounded. Then $F$ has at most countably many points of discontinuity.

Definition 9.5. (i) $F$ (resp. $\mu$) is called discrete if there exists a countable set $S\subset\mathbb R$ with $\mu(S)=1$. In this case, $\mu$ is uniquely determined by the weights $\mu(\{x\})$, $x\in S$, and $F$ is a step function of the following type:
$$F(x)=\sum_{y\in S,\;y\le x}\mu(\{y\}).$$

(ii) $F$ (resp. $\mu$) is called absolutely continuous if there exists a measurable function $f\ge0$ (called the "density"), such that
$$F(x)=\int_{-\infty}^x f(t)\,dt,\tag{1.2}$$
resp., for all $A\in\mathcal B(\mathbb R)$:
$$\mu(A)=\int_A f(t)\,dt=\int_{-\infty}^{+\infty}\mathbf 1_A\cdot f\,dt.\tag{1.3}$$
In particular
$$\int_{-\infty}^{+\infty}f(t)\,dt=1.$$

Remark 9.6. (i) Every measurable function $f\ge0$ with $\int_{-\infty}^{+\infty}f(t)\,dt=1$ defines a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$ by $A\mapsto\int_A f(t)\,dt$.

(ii) In the previous definition "(1.2) $\Rightarrow$ (1.3)", because $A\mapsto\int_A f(t)\,dt$ defines a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$ with distribution function $F$. Uniqueness in 9.2 implies the assertion.

Example 9.7. (i) Uniform distribution on $[a,b]$. Let $f:=\frac1{b-a}\cdot\mathbf 1_{[a,b]}$. The associated distribution function is given by
$$F(x):=\begin{cases}0 & \text{if }x\le a\\[2pt] \frac{x-a}{b-a} & \text{if }x\in[a,b]\\[2pt] 1 & \text{if }x\ge b\end{cases}$$
(continuous analogue of the discrete uniform distribution on a finite set).

(ii) Exponential distribution with parameter $\alpha>0$:
$$f(x):=\begin{cases}\alpha e^{-\alpha x} & \text{if }x\ge0\\ 0 & \text{if }x<0,\end{cases}\qquad F(x):=\begin{cases}1-e^{-\alpha x} & \text{if }x\ge0\\ 0 & \text{if }x<0\end{cases}$$
(continuous analogue of the geometric distribution:
$$\int_k^{k+1}f(x)\,dx=F(k+1)-F(k)=e^{-\alpha k}(1-e^{-\alpha})=(1-p)^k\,p\quad\text{with }p=1-e^{-\alpha}).$$

(iii) Normal distribution $N(m,\sigma^2)$, $m\in\mathbb R$, $\sigma^2>0$:
$$f_{m,\sigma^2}(x)=\frac1{\sqrt{2\pi\sigma^2}}\cdot e^{-\frac{(x-m)^2}{2\sigma^2}}.$$

The associated distribution function is given by
$$F_{m,\sigma^2}(x)=\frac1{\sqrt{2\pi\sigma^2}}\int_{-\infty}^x e^{-\frac{(y-m)^2}{2\sigma^2}}\,dy\overset{z=\frac{y-m}\sigma}{=}\frac1{\sqrt{2\pi}}\int_{-\infty}^{\frac{x-m}\sigma}e^{-\frac{z^2}2}\,dz=F_{0,1}\Big(\frac{x-m}\sigma\Big).$$
$\Phi:=F_{0,1}$ is called the distribution function of the standard normal distribution $N(0,1)$.

The expectation $E[X]$ (or more generally $E[h(X)]$) can be calculated with the help of the distribution $\mu$ of $X$:

Proposition 9.8. Let $h\ge0$ be measurable. Then
$$E\big[h(X)\big]=\int_{-\infty}^{+\infty}h(x)\,\mu(dx)=\begin{cases}\displaystyle\int_{-\infty}^{+\infty}h(x)\cdot f(x)\,dx & \text{if $\mu$ absolutely continuous with density $f$,}\\[8pt] \displaystyle\sum_{x\in S}h(x)\cdot\mu(\{x\}) & \text{if $\mu$ discrete, $\mu(S)=1$ and $S$ countable.}\end{cases}$$

Proof. See exercises.
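Both cases of Proposition 9.8 translate directly into code. The following sketch approximates $E[h(X)]$ by a weighted sum in the discrete case and by a plain midpoint-rule quadrature in the absolutely continuous case; the truncations (interval $[-50,50]$, Poisson weights cut at $k=40$) are my simplifications, not part of the notes.

    import math

    def expect_discrete(h, weights):
        # E[h(X)] = sum over x in S of h(x) * mu({x})
        return sum(h(x) * w for x, w in weights.items())

    def expect_density(h, f, a=-50.0, b=50.0, n=200_000):
        # E[h(X)] = integral of h(x) f(x) dx, midpoint rule on [a, b]
        dx = (b - a) / n
        return sum(h(a + (i + 0.5) * dx) * f(a + (i + 0.5) * dx)
                   for i in range(n)) * dx

    # Discrete case: Poisson(2) weights mu({k}) = e^(-2) 2^k / k!, E[X] = 2.
    pois = {k: math.exp(-2) * 2**k / math.factorial(k) for k in range(40)}
    print(expect_discrete(lambda x: x, pois))       # approximately 2

    # Absolutely continuous case: standard normal density, h(x) = x^2.
    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    print(expect_density(lambda x: x * x, phi))     # approximately 1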

Example 9.9. Let $X$ be $N(m,\sigma^2)$-distributed. Then
$$E[X]=\int x\cdot f_{m,\sigma^2}(x)\,dx=m+\underbrace{\int(x-m)\cdot f_{m,\sigma^2}(x)\,dx}_{=0}=m.$$

The $p$-th central moment of $X$ is given by
$$E\big[|X-m|^p\big]=\int|x-m|^p\cdot f_{m,\sigma^2}(x)\,dx=\int|x|^p\cdot f_{0,\sigma^2}(x)\,dx=2\int_0^\infty x^p\cdot\frac1{\sqrt{2\pi\sigma^2}}\cdot e^{-\frac{x^2}{2\sigma^2}}\,dx\overset{y=\frac{x^2}{2\sigma^2}}{=}\frac1{\sqrt\pi}\cdot2^{\frac p2}\cdot\sigma^p\underbrace{\int_0^\infty y^{\frac{p+1}2-1}e^{-y}\,dy}_{=\Gamma\left(\frac{p+1}2\right)}.$$

In particular:
$$p=1:\quad E\big[|X-m|\big]=\sigma\cdot\sqrt{\tfrac2\pi},$$
$$p=2:\quad E\big[|X-m|^2\big]=\sigma^2,$$
$$p=3:\quad E\big[|X-m|^3\big]=2^{\frac32}\cdot\frac{\sigma^3}{\sqrt\pi},$$
$$p=4:\quad E\big[|X-m|^4\big]=3\sigma^4.$$
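The closed form $E[|X-m|^p]=\frac1{\sqrt\pi}\cdot2^{p/2}\cdot\sigma^p\cdot\Gamma(\frac{p+1}2)$ is easy to check against a Monte Carlo estimate; the following sketch does so for $p=1,\dots,4$ (sample size, seed and the parameters $m=1$, $\sigma=2$ are arbitrary choices).

    import math
    import random

    def central_moment(p, sigma):
        # E|X - m|^p = 2^(p/2) * sigma^p * Gamma((p+1)/2) / sqrt(pi)
        return 2**(p / 2) * sigma**p * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

    random.seed(0)
    m, sigma, n = 1.0, 2.0, 200_000
    xs = [random.gauss(m, sigma) for _ in range(n)]
    for p in (1, 2, 3, 4):
        mc = sum(abs(x - m)**p for x in xs) / n
        print(p, central_moment(p, sigma), mc)  # the two columns should agree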

10 Weak convergence of probability measures

Let $S$ be a topological space and $\mathcal S$ be the Borel $\sigma$-algebra on $S$.

Let $\mu$, $\mu_n$, $n\in\mathbb N$, be probability measures on $(S,\mathcal S)$.

What is a reasonable notion of convergence of the sequence $\mu_n$ towards $\mu$? The notion of "pointwise convergence" in the sense that $\mu_n(A)\xrightarrow{n\to\infty}\mu(A)$ for all $A\in\mathcal S$ is too strong for many applications.

Definition 10.1. Let $\mu$ and $\mu_n$, $n\in\mathbb N$, be probability measures on $(S,\mathcal S)$. The sequence $(\mu_n)$ converges to $\mu$ weakly if for all $f\in C_b(S)$ (= the space of bounded continuous functions on $S$) it follows that
$$\int f\,d\mu_n\xrightarrow{n\to\infty}\int f\,d\mu.$$

Example 10.2. (i) $x_n\xrightarrow{n\to\infty}x$ in $S$ implies $\delta_{x_n}\xrightarrow{n\to\infty}\delta_x$ weakly.

(ii) Let $S:=\mathbb R^1$ and $\mu_n:=N\big(0,\frac1n\big)$. Then $\mu_n\to\delta_0$ weakly, since for all $f\in C_b(\mathbb R)$
$$\int f\,d\mu_n=\int f(x)\cdot\frac1{\sqrt{2\pi\cdot\frac1n}}\cdot e^{-\frac{x^2}{2\cdot\frac1n}}\,dx\overset{x=\frac y{\sqrt n}}{=}\int f\Big(\frac y{\sqrt n}\Big)\cdot\frac1{\sqrt{2\pi}}\cdot e^{-\frac{y^2}2}\,dy\xrightarrow[\text{Lebesgue}]{n\to\infty}f(0)=\int f\,d\delta_0.$$
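Example 10.2(ii) can also be watched numerically: for a fixed bounded continuous test function the integrals $\int f\,d\mu_n$ settle at $f(0)$. A sketch with $f=\cos$ (my choice of test function) and Monte Carlo integration, since a sample of $N(0,\frac1n)$ is just $\frac1{\sqrt n}$ times a standard normal sample:

    import math
    import random

    random.seed(1)
    f = math.cos                        # a bounded continuous test function
    for n in (1, 10, 100, 1000):
        # Monte Carlo approximation of the integral of f w.r.t. N(0, 1/n)
        est = sum(f(random.gauss(0.0, 1.0 / math.sqrt(n)))
                  for _ in range(50_000)) / 50_000
        print(n, est)                   # tends to f(0) = 1 as n grows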

Proposition 10.3 (Portmanteau theorem). Let $S$ be a metric space with metric $d$. Then the following statements are equivalent:

(i) $\mu_n\to\mu$ weakly

(ii) $\int f\,d\mu_n\xrightarrow{n\to\infty}\int f\,d\mu$ for all $f$ bounded and uniformly continuous (w.r.t. $d$)

(iii) $\limsup_{n\to\infty}\mu_n(F)\le\mu(F)$ for all $F\subset S$ closed

(iv) $\liminf_{n\to\infty}\mu_n(G)\ge\mu(G)$ for all $G\subset S$ open

(v) $\lim_{n\to\infty}\mu_n(A)=\mu(A)$ for all $A\in\mathcal S$ with $\mu(\bar A\setminus\mathring A)=0$.

Proof. (iii)$\Leftrightarrow$(iv): Obvious by considering the complement.

(i)$\Rightarrow$(ii): Trivial.

(ii)$\Rightarrow$(iii): Let $F\subset S$ be closed, and let
$$G_m:=\Big\{x\in S\;\Big|\;d(x,F)<\frac1m\Big\},\quad m\in\mathbb N\quad(\text{open!}).$$
Then $G_m\searrow F$, hence $\mu(G_m)\searrow\mu(F)$.

If $\varepsilon>0$ there exists some $m\in\mathbb N$ with $\mu(G_m)<\mu(F)+\varepsilon$. Define
$$\varphi(x):=\begin{cases}1 & \text{if }x\le0\\ 1-x & \text{if }x\in[0,1]\\ 0 & \text{if }x\ge1\end{cases}$$
and let $f:=\varphi\big(m\cdot d(\cdot,F)\big)$.

$f$ is Lipschitz, in particular uniformly continuous, $f=0$ on $G_m^c$ and $f=1$ on $F$, and thus
$$\limsup_{n\to\infty}\mu_n(F)\le\limsup_{n\to\infty}\int f\,d\mu_n\overset{(ii)}{=}\int f\,d\mu\le\mu(G_m)<\mu(F)+\varepsilon.$$

(iii)$\Rightarrow$(v): Let $A$ be such that $\mu(\bar A\setminus\mathring A)=0$. Then
$$\mu(A)=\mu(\mathring A)\overset{(iv)}{\le}\liminf_{n\to\infty}\mu_n(\mathring A)\le\liminf_{n\to\infty}\mu_n(A)\le\limsup_{n\to\infty}\mu_n(A)\le\limsup_{n\to\infty}\mu_n(\bar A)\overset{(iii)}{\le}\mu(\bar A)=\mu(A).$$

(v)$\Rightarrow$(iii): Let $F\subset S$ be closed. For all $\delta>0$ we have that
$$\partial\{d(\cdot,F)\le\delta\}\subset\{d(\cdot,F)=\delta\}.$$
Note that the set
$$D:=\Big\{\delta>0\;\Big|\;\mu\big(\{d(\cdot,F)=\delta\}\big)>0\Big\}$$
is countable, since for every $n\in\mathbb N$ the set
$$D_n:=\Big\{\delta>0\;\Big|\;\mu\big(\underbrace{\{d(\cdot,F)=\delta\}}_{\text{disjoint!}}\big)>\frac1n\Big\}$$
is finite. In particular, there exists a sequence $\delta_k\in\,]0,\infty[\,\setminus D$, $\delta_k\downarrow0$, such that the sets
$$F_k:=\{d(\cdot,F)\le\delta_k\}$$
satisfy $\mu(\bar F_k\setminus\mathring F_k)=0$. $F_k\searrow F$ now implies that
$$\limsup_{n\to\infty}\mu_n(F)\le\limsup_{n\to\infty}\mu_n(F_k)\overset{(v)}{=}\mu(F_k)\xrightarrow{k\to\infty}\mu(F).$$

(iii)$\Rightarrow$(i): Let $f\in C_b(S)$. It suffices to prove that
$$\limsup_{n\to\infty}\int f\,d\mu_n\le\int f\,d\mu$$
(since then $-\liminf\int f\,d\mu_n\le\int(-f)\,d\mu$, hence $\liminf\int f\,d\mu_n\ge\int f\,d\mu$). W.l.o.g. $0\le f\le1$.

Fix $k\in\mathbb N$ and let $F_j:=\{f\ge\frac jk\}$, $j\in\mathbb N$ ($F_j$ closed!). Then
$$\frac1k\sum_{i=1}^k\mathbf 1_{F_i}\le f\le\frac1k+\frac1k\sum_{i=1}^k\mathbf 1_{F_i}.$$
Hence for all probability measures $\nu$ on $(S,\mathcal S)$:
$$\frac1k\sum_{i=1}^k\nu(F_i)\le\int f\,d\nu\le\frac1k+\frac1k\sum_{i=1}^k\nu(F_i),$$
and
$$\limsup_{n\to\infty}\int f\,d\mu_n-\frac1k\le\frac1k\cdot\limsup_{n\to\infty}\sum_{i=1}^k\mu_n(F_i)\le\frac1k\sum_{i=1}^k\limsup_{n\to\infty}\mu_n(F_i)\overset{(iii)}{\le}\frac1k\sum_{i=1}^k\mu(F_i)\le\int f\,d\mu.$$
Since $k\in\mathbb N$ was arbitrary, the assertion follows.

Corollary 10.4. Let $X$, $X_n$, $n\in\mathbb N$, be measurable mappings from $(\Omega,\mathcal A,P)$ to $(S,\mathcal S)$ with distributions $\mu$, $\mu_n$, $n\in\mathbb N$. Then:
$$X_n\xrightarrow{n\to\infty}X\ \text{in probability}\quad\Rightarrow\quad\mu_n\xrightarrow{n\to\infty}\mu\ \text{weakly}.$$
Here, $\lim_{n\to\infty}X_n=X$ in probability if $\lim_{n\to\infty}P\big(d(X,X_n)>\delta\big)=0$ for all $\delta>0$.

Proof. Let $f\in C_b(S)$ be uniformly continuous and $\varepsilon>0$. Then there exists a $\delta=\delta(\varepsilon)>0$ such that $x,y\in S$ with $d(x,y)\le\delta$ implies $|f(x)-f(y)|<\varepsilon$. Hence
$$\Big|\int f\,d\mu-\int f\,d\mu_n\Big|=\Big|E\big[f(X)\big]-E\big[f(X_n)\big]\Big|\le\int_{\{d(X,X_n)\le\delta\}}\big|f(X)-f(X_n)\big|\,dP+\int_{\{d(X,X_n)>\delta\}}\big|f(X)-f(X_n)\big|\,dP\le\varepsilon+2\|f\|_\infty\cdot\underbrace{P\big(d(X_n,X)>\delta\big)}_{\xrightarrow{n\to\infty}0}.$$

Corollary 10.5. Let $S=\mathbb R^1$ and let $\mu$, $\mu_n$, $n\in\mathbb N$, be probability measures on $(\mathbb R,\mathcal B(\mathbb R))$ with distribution functions $F$, $F_n$. Then the following statements are equivalent:

(i) $\mu_n\xrightarrow{n\to\infty}\mu$ vaguely, i.e. $\lim_{n\to\infty}\int f\,d\mu_n=\int f\,d\mu$ for all $f\in C_0(\mathbb R^1)$ (= the space of continuous functions with compact support)

(ii) $\mu_n\xrightarrow{n\to\infty}\mu$ weakly

(iii) $F_n(x)\xrightarrow{n\to\infty}F(x)$ for all $x$ where $F$ is continuous.

(iv) $\mu_n\big(\,]a,b]\,\big)\xrightarrow{n\to\infty}\mu\big(\,]a,b]\,\big)$ for all $]a,b]$ with $\mu(\{a\})=\mu(\{b\})=0$.

Proof. (i)$\Rightarrow$(ii): Exercise.

(ii)$\Rightarrow$(iii): Let $x$ be such that $F$ is continuous in $x$. Then $\mu(\{x\})=0$, which implies by the Portmanteau theorem:
$$F_n(x)=\mu_n\big(\,]-\infty,x]\,\big)\xrightarrow{n\to\infty}\mu\big(\,]-\infty,x]\,\big)=F(x).$$

(iii)$\Rightarrow$(iv): Let $]a,b]$ be such that $\mu(\{a\})=\mu(\{b\})=0$. Then $F$ is continuous in $a$ and $b$, and thus
$$\mu\big(\,]a,b]\,\big)=F(b)-F(a)\overset{(iii)}{=}\lim_{n\to\infty}F_n(b)-\lim_{n\to\infty}F_n(a)=\lim_{n\to\infty}\mu_n\big(\,]a,b]\,\big).$$

(iv)$\Rightarrow$(i): Let $D:=\{x\in\mathbb R\mid\mu(\{x\})=0\}$. Then $\mathbb R\setminus D$ is countable, hence $D\subset\mathbb R$ is dense. Let $f\in C_0(\mathbb R)$; then $f$ is uniformly continuous, hence for $\varepsilon>0$ we find $c_0<\dots<c_m\in D$ such that
$$\Big\|f-\underbrace{\sum_{k=1}^m f(c_{k-1})\cdot\mathbf 1_{]c_{k-1},c_k]}}_{=:g}\Big\|\le\sup_k\sup_{x\in[c_{k-1},c_k]}\big|f(x)-f(c_{k-1})\big|<\varepsilon.$$
Then
$$\Big|\int f\,d\mu-\int f\,d\mu_n\Big|\le\underbrace{\int|f-g|\,d\mu}_{\le\varepsilon}+\Big|\int g\,d\mu-\int g\,d\mu_n\Big|+\underbrace{\int|f-g|\,d\mu_n}_{\le\varepsilon}\le2\varepsilon+\sum_{k=1}^m\big|f(c_{k-1})\big|\cdot\Big|\mu\big(\,]c_{k-1},c_k]\,\big)-\mu_n\big(\,]c_{k-1},c_k]\,\big)\Big|\xrightarrow[n\to\infty]{(iv)}2\varepsilon.$$
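The restriction to continuity points of $F$ in (iii) is not cosmetic. A tiny illustration, with $\mu_n:=\delta_{1/n}$ and $\mu:=\delta_0$ (my example, in the spirit of Example 10.2(i)): $\mu_n\to\mu$ weakly, yet $F_n(0)=0$ for every $n$ while $F(0)=1$, so pointwise convergence of the distribution functions fails exactly at the single discontinuity point of $F$.

    def F_n(x, n):
        # distribution function of the Dirac measure at 1/n
        return 1.0 if x >= 1.0 / n else 0.0

    def F(x):
        # distribution function of the Dirac measure at 0
        return 1.0 if x >= 0.0 else 0.0

    for x in (-0.5, 0.0, 0.5):
        print(x, [F_n(x, n) for n in (1, 10, 100, 1000)], F(x))
    # F_n(x) -> F(x) at x = -0.5 and x = 0.5, but F_n(0) = 0.0 for every n,
    # although F(0) = 1.0.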

11 Dynkin systems and uniqueness of probability measures

Let $\Omega\ne\emptyset$.

Definition 11.1. A collection of subsets $\mathcal D\subset\mathcal P(\Omega)$ is called a Dynkin system if:

(i) $\Omega\in\mathcal D$.

(ii) $A\in\mathcal D\ \Rightarrow\ A^c\in\mathcal D$.

(iii) $A_i\in\mathcal D$, $i\in\mathbb N$, pairwise disjoint $\Rightarrow$ $\bigcup_{i\in\mathbb N}A_i\in\mathcal D$.

Example 11.2. (i) Every $\sigma$-algebra $\mathcal A\subset\mathcal P(\Omega)$ is a Dynkin system.

(ii) Let $P_1,P_2$ be probability measures on $(\Omega,\mathcal A)$. Then
$$\mathcal D:=\big\{A\in\mathcal A\;\big|\;P_1(A)=P_2(A)\big\}$$
is a Dynkin system.

Remark 11.3. (i) Let $\mathcal D$ be a Dynkin system. Then
$$A,B\in\mathcal D,\ A\subset B\quad\Rightarrow\quad B\setminus A=(B^c\cup A)^c\in\mathcal D.$$

(ii) Every Dynkin system which is closed under finite intersections (short notation: $\cap$-stable) is a $\sigma$-algebra, because:

(a) $A,B\in\mathcal D\ \Rightarrow\ A\cup B=A\cup\big(B\setminus(A\cap B)\big)\in\mathcal D$, since $A\cap B\in\mathcal D$ by $\cap$-stability, $B\setminus(A\cap B)\in\mathcal D$ by (i), and the union on the right is disjoint.

(b) $A_i\in\mathcal D$, $i\in\mathbb N$ $\Rightarrow$ $\bigcup_{i\in\mathbb N}A_i=\bigcup_{i\in\mathbb N}\Big(A_i\cap\Big(\underbrace{\bigcup_{n=1}^{i-1}A_n}_{\in\,\mathcal D\text{ by (a)}}\Big)^c\Big)\in\mathcal D$, since the sets in the union on the right are pairwise disjoint and belong to $\mathcal D$ by assumption.

Proposition 11.4. Let $\mathcal B\subset\mathcal P(\Omega)$ be a $\cap$-stable collection of subsets. Then
$$\sigma(\mathcal B)=\mathcal D(\mathcal B),$$
where
$$\mathcal D(\mathcal B):=\bigcap_{\substack{\mathcal D\ \text{Dynkin system}\\ \mathcal B\subset\mathcal D}}\mathcal D$$
is called the Dynkin system generated by $\mathcal B$.

Proof. See textbooks on measure theory.

Proposition 11.5 (Uniqueness of probability measures). Let $P_1,P_2$ be probability measures on $(\Omega,\mathcal A)$, and let $\mathcal B\subset\mathcal A$ be a $\cap$-stable collection of subsets. Then:
$$P_1(A)=P_2(A)\ \text{for all}\ A\in\mathcal B\quad\Rightarrow\quad P_1=P_2\ \text{on}\ \sigma(\mathcal B).$$

Proof. The collection of subsets
$$\mathcal D:=\big\{A\in\mathcal A\;\big|\;P_1(A)=P_2(A)\big\}$$
is a Dynkin system containing $\mathcal B$. Consequently,
$$\sigma(\mathcal B)\overset{11.4}{=}\mathcal D(\mathcal B)\subset\mathcal D.$$

Example 11.6. (i) For $p\in\,]0,1[$ the probability measure $P_p$ on $(\Omega:=\{0,1\}^{\mathbb N},\mathcal A)$ is uniquely determined by
$$P_p[X_1=x_1,\dots,X_n=x_n]=p^k(1-p)^{n-k},\quad\text{with }k:=\sum_{i=1}^n x_i,$$
for all $x_1,\dots,x_n\in\{0,1\}$, $n\in\mathbb N$, because the collection of cylindrical sets
$$\{X_1=x_1,\dots,X_n=x_n\},\quad n\in\mathbb N_0,\ x_1,\dots,x_n\in\{0,1\},$$
is $\cap$-stable, generating $\mathcal A$ (cf. Example 1.7).

(Existence of $P_p$ for $p=\frac12$: see Example 3.6. Existence for $p\in\,]0,1[\,\setminus\{\frac12\}$: later.)

(ii) A probability measure on $(\mathbb R,\mathcal B(\mathbb R))$ is uniquely determined through its distribution function $F:=\mu\big(\,]-\infty,\cdot\,]\,\big)$, because
$$\mu\big(\,]a,b]\,\big)=F(b)-F(a),$$
and the collection of intervals $]a,b]$, $a,b\in\mathbb R$, is $\cap$-stable, generating $\mathcal B(\mathbb R)$.

2 Independence

1 Independent events

Let $(\Omega,\mathcal A,P)$ be a probability space.

Definition 1.1. A collection of events $A_i\in\mathcal A$, $i\in I$, is said to be independent (w.r.t. $P$) if for any finite subset $J\subset I$
$$P\Big(\bigcap_{j\in J}A_j\Big)=\prod_{j\in J}P(A_j).$$
A family of collections of subsets $\mathcal B_i\subset\mathcal A$, $i\in I$, is said to be independent if for all finite subsets $J\subset I$ and for all subsets $A_j\in\mathcal B_j$, $j\in J$,
$$P\Big(\bigcap_{j\in J}A_j\Big)=\prod_{j\in J}P(A_j).$$

Proposition 1.2. Let $\mathcal B_i$, $i\in I$, be independent and closed under intersections. Then:

(i) $\sigma(\mathcal B_i)$, $i\in I$, are independent.

(ii) Let $J_k$, $k\in K$, be a partition of the index set $I$. Then the $\sigma$-algebras
$$\sigma\Big(\bigcup_{i\in J_k}\mathcal B_i\Big),\quad k\in K,$$
are independent.

Proof. (i) Let $J\subset I$, $J$ finite, be of the form $J=\{j_1,\dots,j_n\}$. Let $A_{j_1}\in\sigma(\mathcal B_{j_1}),\dots,A_{j_n}\in\sigma(\mathcal B_{j_n})$.

We have to show that
$$P(A_{j_1}\cap\dots\cap A_{j_n})=P(A_{j_1})\cdots P(A_{j_n}).\tag{2.1}$$
To this end suppose first that $A_{j_2}\in\mathcal B_{j_2},\dots,A_{j_n}\in\mathcal B_{j_n}$, and define
$$\mathcal D_{j_1}:=\big\{A\in\sigma(\mathcal B_{j_1})\;\big|\;P(A\cap A_{j_2}\cap\dots\cap A_{j_n})=P(A)\cdot P(A_{j_2})\cdots P(A_{j_n})\big\}.$$
Then $\mathcal D_{j_1}$ is a Dynkin system (!) containing $\mathcal B_{j_1}$. Proposition 1.11.4 now implies
$$\sigma(\mathcal B_{j_1})=\mathcal D(\mathcal B_{j_1})\subset\mathcal D_{j_1},$$
hence $\sigma(\mathcal B_{j_1})=\mathcal D_{j_1}$. Iterating the above argument for $\mathcal D_{j_2},\mathcal D_{j_3},\dots$ implies (2.1).

(ii) For $k\in K$ define
$$\mathcal C_k:=\Big\{\bigcap_{j\in J}A_j\;\Big|\;J\subset J_k,\ J\ \text{finite},\ A_j\in\mathcal B_j\Big\}.$$
Then $\mathcal C_k$ is closed under intersections, and the collections of subsets $\mathcal C_k$, $k\in K$, are still independent, because: given $k_1,\dots,k_n\in K$ and finite subsets $J_1\subset J_{k_1},\dots,J_n\subset J_{k_n}$, then
$$P\Big(\underbrace{\bigcap_{i\in J_1}A_i}_{\in\,\mathcal C_{k_1}}\cap\dots\cap\underbrace{\bigcap_{i\in J_n}A_i}_{\in\,\mathcal C_{k_n}}\Big)\overset{\mathcal B_i\ \text{ind.},\ i\in I}{=}\prod_{j=1}^n P\Big(\bigcap_{i\in J_j}A_i\Big).$$
(i) now implies that
$$\sigma(\mathcal C_k)=\sigma\Big(\bigcup_{i\in J_k}\mathcal B_i\Big),\quad k\in K,$$
are independent too.

Example 1.3. Let $A_i\in\mathcal A$, $i\in I$, be independent. Then $A_i,A_i^c$, $i\in I$, are independent too.

Remark 1.4. Pairwise independence does not imply independence in general.

Example: Consider two tosses of a fair coin, i.e.
$$\Omega:=\big\{(i,k)\;\big|\;i,k\in\{0,1\}\big\},\quad P:=\text{uniform distribution}.$$
Consider the events
$$A:=\text{"first toss is }1\text{"}=\{(1,0),(1,1)\},$$
$$B:=\text{"second toss is }1\text{"}=\{(0,1),(1,1)\},$$
$$C:=\text{"first and second toss equal"}=\{(0,0),(1,1)\}.$$
Then $P(A)=P(B)=P(C)=\frac12$, and $A,B,C$ are pairwise independent:
$$P(A\cap B)=P(B\cap C)=P(C\cap A)=\frac14.$$
But on the other hand
$$P(A\cap B\cap C)=\frac14\ne\frac18=P(A)\cdot P(B)\cdot P(C).$$
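Since $\Omega$ has only four points, this counterexample can be verified exhaustively. A minimal sketch (the set representation and names are my own):

    from itertools import product

    Omega = list(product((0, 1), repeat=2))       # two fair coin tosses
    P = lambda E: len(E) / len(Omega)             # uniform distribution

    A = {w for w in Omega if w[0] == 1}           # first toss is 1
    B = {w for w in Omega if w[1] == 1}           # second toss is 1
    C = {w for w in Omega if w[0] == w[1]}        # both tosses equal

    print(P(A & B), P(A) * P(B))                  # 0.25 0.25   pairwise independent
    print(P(B & C), P(B) * P(C))                  # 0.25 0.25
    print(P(C & A), P(C) * P(A))                  # 0.25 0.25
    print(P(A & B & C), P(A) * P(B) * P(C))       # 0.25 0.125  not independent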

Example 1.5. Independent 0-1-experiments with success probability $p\in[0,1]$. Let $\Omega:=\{0,1\}^{\mathbb N}$ and $X_i(\omega):=x_i$ for $\omega=(x_i)_{i\in\mathbb N}$. Let $P_p$ be a probability measure on $\mathcal A:=\sigma\big(\{X_i=1\},\ i=1,2,\dots\big)$ with

(i) $P_p[X_i=1]=p$ (hence $P_p[X_i=0]=P_p\big(\{X_i=1\}^c\big)=1-p$),

(ii) $\{X_i=1\}$, $i\in\mathbb N$, are independent w.r.t. $P_p$.

(Existence of such a probability measure: later!) Then for any $x_1,\dots,x_n\in\{0,1\}$:
$$P_p[X_{i_1}=x_1,\dots,X_{i_n}=x_n]\overset{\text{(ii) and 1.3}}{=}\prod_{j=1}^n P_p[X_{i_j}=x_j]\overset{\text{(i)}}{=}p^k(1-p)^{n-k},$$
where $k:=\sum_{i=1}^n x_i$. Hence $P_p$ is uniquely determined by (i) and (ii).

Proposition 1.6 (Kolmogorov's zero-one law). Let $\mathcal B_n$, $n\in\mathbb N$, be independent $\sigma$-algebras, and let
$$\mathcal B_\infty:=\bigcap_{n=1}^\infty\sigma\Big(\bigcup_{m=n}^\infty\mathcal B_m\Big)$$
be the tail field (resp. $\sigma$-algebra of terminal events). Then
$$P(A)\in\{0,1\}\quad\forall A\in\mathcal B_\infty,$$
i.e., $P$ is deterministic on $\mathcal B_\infty$.

Illustration: Independent 0-1-experiments. Let $\mathcal B_i=\sigma\big(\{X_i=1\}\big)$. Then
$$\mathcal B_\infty=\bigcap_{n\in\mathbb N}\sigma\Big(\bigcup_{m\ge n}\mathcal B_m\Big)$$
is the $\sigma$-algebra containing the events of the remote future, e.g.
$$\limsup_{i\to\infty}\{X_i=1\}=\{\text{"infinitely many 1's"}\}$$
or
$$\Big\{\omega\in\{0,1\}^{\mathbb N}\;\Big|\;\lim_{n\to\infty}\underbrace{\frac1n\sum_{i=1}^n X_i(\omega)}_{=:S_n(\omega)/n}\ \text{exists}\Big\}.$$

Proof of the zero-one law. Proposition 1.2 implies that for all $n$
$$\mathcal B_1,\mathcal B_2,\dots,\mathcal B_{n-1},\ \sigma\Big(\bigcup_{m=n}^\infty\mathcal B_m\Big)$$
are independent. Since $\mathcal B_\infty\subset\sigma\big(\bigcup_{m\ge n}\mathcal B_m\big)$, this implies that for all $n$
$$\mathcal B_1,\mathcal B_2,\dots,\mathcal B_{n-1},\ \mathcal B_\infty$$
are independent. By definition this implies that $\mathcal B_\infty,\mathcal B_n$, $n\in\mathbb N$, are independent, and now Proposition 1.2(ii) implies that
$$\sigma\Big(\bigcup_{n\in\mathbb N}\mathcal B_n\Big)\quad\text{and}\quad\mathcal B_\infty$$
are independent. Since $\mathcal B_\infty\subset\sigma\big(\bigcup_{n\ge1}\mathcal B_n\big)$ we finally obtain that $\mathcal B_\infty$ and $\mathcal B_\infty$ are independent. The conclusion now follows from the next lemma.

Lemma 1.7. Let $\mathcal B\subset\mathcal A$ be a $\sigma$-algebra such that $\mathcal B$ is independent of itself. Then
$$P(A)\in\{0,1\}\quad\forall A\in\mathcal B.$$

Proof. For all $A\in\mathcal B$:
$$P(A)=P(A\cap A)=P(A)\cdot P(A)=P(A)^2.$$
Hence $P(A)=0$ or $P(A)=1$.

For any sequence $A_n$, $n\in\mathbb N$, of independent events in $\mathcal A$, Kolmogorov's zero-one law implies in particular for
$$A:=\bigcap_{n\in\mathbb N}\bigcup_{m\ge n}A_m=:\limsup_{n\to\infty}A_n$$
that $P(A)\in\{0,1\}$.

Proof: The $\sigma$-algebras $\mathcal B_n:=\sigma(\{A_n\})=\{\emptyset,\Omega,A_n,A_n^c\}$, $n\in\mathbb N$, are independent by Proposition 1.2, and $A\in\mathcal B_\infty$.

Lemma 1.8 (Borel-Cantelli). (i) Let $A_i\in\mathcal A$, $i\in\mathbb N$. Then
$$\sum_{i=1}^\infty P(A_i)<\infty\quad\Rightarrow\quad P\Big(\limsup_{i\to\infty}A_i\Big)=0.$$

(ii) Assume that $A_i\in\mathcal A$, $i\in\mathbb N$, are independent. Then
$$\sum_{i=1}^\infty P(A_i)=\infty\quad\Rightarrow\quad P\Big(\limsup_{i\to\infty}A_i\Big)=1.$$

Proof. (i) See Lemma 1.1.11.

(ii) It suffices to show that
$$P\Big(\bigcup_{m=n}^\infty A_m\Big)=1\quad\text{resp.}\quad P\Big(\bigcap_{m=n}^\infty A_m^c\Big)=0\quad\forall n.$$
The last equality follows from the fact that
$$P\Big(\bigcap_{m=n}^\infty A_m^c\Big)=\lim_{k\to\infty}\underbrace{P\Big(\bigcap_{m=n}^{n+k}A_m^c\Big)}_{\overset{\text{ind.}}{=}\prod_{m=n}^{n+k}P(A_m^c)}=\lim_{k\to\infty}\prod_{m=n}^{n+k}\big(1-P(A_m)\big)\le\lim_{k\to\infty}\exp\Big(-\sum_{m=n}^{n+k}P(A_m)\Big)=0,$$
where we used the inequality $1-\alpha\le e^{-\alpha}$ for all $\alpha\in\mathbb R$.
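A finite-horizon simulation cannot prove an "infinitely often" statement, but it displays the dichotomy of the two Borel-Cantelli regimes quite clearly. A sketch, assuming independent events with $P(A_n)=\frac1n$ (divergent sum) versus $P(A_n)=\frac1{n^2}$ (convergent sum); horizon and seed are illustrative.

    import random

    random.seed(2)
    N = 100_000
    # Independent events A_n with P(A_n) = 1/n resp. P(A_n) = 1/n^2
    hits_div  = [n for n in range(1, N + 1) if random.random() < 1.0 / n]
    hits_conv = [n for n in range(1, N + 1) if random.random() < 1.0 / n**2]

    print(len(hits_div), hits_div[-3:])    # occurrences keep coming (sum = inf)
    print(len(hits_conv), hits_conv[-3:])  # only a few early ones (sum < inf)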

Example 1.9. Independent 0-1-experiments with success probability $p\in\,]0,1[$.

Let $(x_1,\dots,x_N)\in\{0,1\}^N$ ("binary text of length $N$"). What is $P_p[\text{"text occurs"}]$?

To calculate this probability we partition the infinite sequence $\omega=(y_n)\in\{0,1\}^{\mathbb N}$ into blocks of length $N$:
$$(\underbrace{y_1,\dots,y_N}_{\text{1st block, length }N},\underbrace{y_{N+1},\dots,y_{2N}}_{\text{2nd block, length }N},\dots)\in\Omega:=\{0,1\}^{\mathbb N},$$
and consider the events $A_i:=$ "text occurs in the $i$-th block". Clearly, $A_i$, $i\in\mathbb N$, are independent events (!) by Proposition 1.2(ii), with equal probability
$$P_p(A_i)=p^K(1-p)^{N-K}=:\alpha>0,$$
where $K:=\sum_{i=1}^N x_i$ is the total number of ones. In particular $\sum_{i=1}^\infty P_p(A_i)=\sum_{i=1}^\infty\alpha=\infty$, and now Borel-Cantelli implies $P_p(A)=1$, where
$$A=\limsup_{i\to\infty}A_i=\text{"text occurs infinitely many times"}.$$

Moreover: since the indicator functions $\mathbf 1_{A_1},\mathbf 1_{A_2},\dots$ are uncorrelated (since they are independent r.v., see below), the strong law of large numbers implies that
$$\frac1n\sum_{i=1}^n\mathbf 1_{A_i}\xrightarrow{P_p\text{-a.s.}}E[\mathbf 1_{A_i}]=\alpha,$$
i.e. the relative frequency of the given text in the infinite sequence is strictly positive.
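The block construction of Example 1.9 is straightforward to simulate; the observed relative frequency of blocks reproducing the text approaches $\alpha=p^K(1-p)^{N-K}$. A sketch (text, $p$ and the number of blocks are arbitrary choices):

    import random

    random.seed(3)
    p, text = 0.5, (1, 0, 1)            # binary text of length N = 3
    N, n_blocks = len(text), 200_000
    K = sum(text)
    alpha = p**K * (1 - p)**(N - K)     # P_p(A_i) = p^K (1-p)^(N-K) = 1/8 here

    # Count independent blocks of length N that reproduce the text exactly.
    hits = sum(
        1 for _ in range(n_blocks)
        if tuple(1 if random.random() < p else 0 for _ in range(N)) == text
    )
    print(hits / n_blocks, alpha)       # relative frequency is close to alpha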

2 Independent random variables

Let $(\Omega,\mathcal A,P)$ be a probability space.

Definition 2.1. A family $X_i$, $i\in I$, of r.v. on $(\Omega,\mathcal A,P)$ is said to be independent if the $\sigma$-algebras
$$\sigma(X_i):=X_i^{-1}\big(\mathcal B(\bar{\mathbb R})\big)=\big\{\{X_i\in A\}\;\big|\;A\in\mathcal B(\bar{\mathbb R})\big\},\quad i\in I,$$
are independent, i.e. for all finite subsets $J\subset I$ and any Borel subsets $A_j\in\mathcal B(\bar{\mathbb R})$
$$P\Big(\bigcap_{j\in J}\{X_j\in A_j\}\Big)=\prod_{j\in J}P[X_j\in A_j].$$

Remark 2.2. Let $X_i$, $i\in I$, be independent and $h_i\colon\bar{\mathbb R}\to\bar{\mathbb R}$, $i\in I$, be $\mathcal B(\bar{\mathbb R})/\mathcal B(\bar{\mathbb R})$-measurable. Then $Y_i:=h_i(X_i)$, $i\in I$, are again independent, because $\sigma(Y_i)\subset\sigma(X_i)$ for all $i\in I$.

Proposition 2.3. Let $X_1,\dots,X_n$ be independent r.v. with $X_i\ge0$. Then
$$E[X_1\cdots X_n]=E[X_1]\cdots E[X_n].$$

Proof. W.l.o.g. $n=2$. (Proof of the general case by induction, using the fact that $X_1\cdots X_{n-1}$ and $X_n$ are independent, since $X_1\cdots X_{n-1}$ is measurable w.r.t. $\sigma\big(\sigma(X_1)\cup\dots\cup\sigma(X_{n-1})\big)$, and $\sigma\big(\sigma(X_1)\cup\dots\cup\sigma(X_{n-1})\big)$ and $\sigma(X_n)$ are independent by Proposition 1.2.)

It therefore suffices to consider two independent r.v. $X,Y\ge0$, and we have to show that
$$E[XY]=E[X]\cdot E[Y].\tag{2.2}$$
W.l.o.g. $X,Y$ simple (for general $X$ and $Y$ there exist increasing sequences of simple r.v. $X_n$ (resp. $Y_n$), which are $\sigma(X)$-measurable (resp. $\sigma(Y)$-measurable), converging pointwise to $X$ (resp. $Y$); then $E[X_nY_n]=E[X_n]\cdot E[Y_n]$ for all $n$ implies (2.2) using monotone integration). For $X$, $Y$ simple, say
$$X=\sum_{i=1}^m\alpha_i\mathbf 1_{A_i}\quad\text{and}\quad Y=\sum_{j=1}^n\beta_j\mathbf 1_{B_j},$$
with $\alpha_i,\beta_j\ge0$ and $A_i\in\sigma(X)$ resp. $B_j\in\sigma(Y)$, it follows that
$$E[XY]=\sum_{i,j}\alpha_i\beta_j\cdot P(A_i\cap B_j)=\sum_{i,j}\alpha_i\beta_j\cdot P(A_i)\cdot P(B_j)=E[X]\cdot E[Y].$$

Corollary 2.4. $X,Y$ independent, $X,Y\in L^1$ $\Rightarrow$ $XY\in L^1$ and
$$E[XY]=E[X]\cdot E[Y].$$

Proof. Let $\varepsilon_1,\varepsilon_2\in\{+,-\}$. Then $X^{\varepsilon_1}$ and $Y^{\varepsilon_2}$ are independent by Remark 2.2 and nonnegative. Proposition 2.3 implies
$$E[X^{\varepsilon_1}\cdot Y^{\varepsilon_2}]=E[X^{\varepsilon_1}]\cdot E[Y^{\varepsilon_2}].$$
In particular $X^{\varepsilon_1}\cdot Y^{\varepsilon_2}\in L^1$, because $E[X^{\varepsilon_1}]\cdot E[Y^{\varepsilon_2}]<\infty$. Hence
$$X\cdot Y=X^+\cdot Y^++X^-\cdot Y^--\big(X^+\cdot Y^-+X^-\cdot Y^+\big)\in L^1,$$
and $E[XY]=E[X]\cdot E[Y]$.

Remark 2.5. (i) In general the converse to the above corollary does not hold: for example, let $X$ be $N(0,1)$-distributed and $Y=X^2$. Then $X$ and $Y$ are not independent, but
$$E[XY]=E[X^3]=E[X]\cdot E[Y]=0.$$

(ii) $X,Y\in L^2$ independent $\Rightarrow$ $X,Y$ uncorrelated, because
$$\operatorname{cov}(X,Y)=E[XY]-E[X]\cdot E[Y]=0.$$
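A quick numerical look at Remark 2.5(i): the sample covariance of $X$ and $Y=X^2$ is close to $0$, yet the events $\{|X|>2\}$ and $\{Y\le4\}=\{|X|\le2\}$ are disjoint, so the product rule for probabilities fails badly. A sketch (the choice of events and the sample size are mine):

    import random

    random.seed(4)
    n = 200_000
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x * x for x in xs]            # Y = X^2 is a function of X: dependent

    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    print(cov)                          # approximately 0: uncorrelated

    # ...but not independent: P[|X| > 2, Y <= 4] = 0, while the product is > 0.
    p_joint = sum(1 for x in xs if abs(x) > 2 and x * x <= 4) / n
    p_prod = (sum(1 for x in xs if abs(x) > 2) / n) \
             * (sum(1 for y in ys if y <= 4) / n)
    print(p_joint, p_prod)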

Corollary 2.6 (to the strong law of large numbers). Let $X_1,X_2,\dots\in L^2$ be independent with $\sup_{i\in\mathbb N}\operatorname{var}(X_i)<\infty$. Then
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n\big(X_i(\omega)-E[X_i]\big)=0\quad P\text{-a.s.}$$
If $E[X_i]\equiv m$, then
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n X_i(\omega)=m\quad P\text{-a.s.}$$

3 Kolmogorov’s law of large numbers

Proposition 3.1 (Kolmogorov, 1930). Let $X_1,X_2,\dots\in L^1$ be independent, identically distributed, $m=E[X_i]$. Then
$$\underbrace{\frac1n\sum_{i=1}^n X_i(\omega)}_{\text{empirical mean}}\xrightarrow{n\to\infty}m\quad P\text{-a.s.}$$

Proposition 3.1 follows from the following more general result:

Proposition 3.2 (Etemadi, 1981). Let $X_1,X_2,\dots\in L^1$ be pairwise independent, identically distributed, $m=E[X_i]$. Then
$$\frac1n\sum_{i=1}^n X_i(\omega)\xrightarrow{n\to\infty}m\quad P\text{-a.s.}$$

Proof. W.l.o.g. $X_i\ge0$ (otherwise consider $X_1^+,X_2^+,\dots$ (pairwise independent, identically distributed) and $X_1^-,X_2^-,\dots$ (pairwise independent, identically distributed)).

1. Replace $X_i$ by $\tilde X_i:=\mathbf 1_{\{X_i<i\}}X_i$. Clearly,
$$\tilde X_i=h_i(X_i)\quad\text{with}\quad h_i(x):=\begin{cases}x & \text{if }x<i\\ 0 & \text{if }x\ge i.\end{cases}$$
Then $\tilde X_1,\tilde X_2,\dots$ are pairwise independent by Remark 2.2. For the proof it is now sufficient to show that for $\tilde S_n:=\sum_{i=1}^n\tilde X_i$ we have that
$$\frac{\tilde S_n}n\xrightarrow{n\to\infty}m\quad P\text{-a.s.}$$
Indeed,
$$\sum_{n=1}^\infty P[X_n\ne\tilde X_n]=\sum_{n=1}^\infty P[X_n\ge n]=\sum_{n=1}^\infty P[X_1\ge n]=\sum_{n=1}^\infty\sum_{k=n}^\infty P\big(X_1\in[k,k+1[\,\big)=\sum_{k=1}^\infty k\cdot P\big(X_1\in[k,k+1[\,\big)=\sum_{k=1}^\infty E\big[\underbrace{k\cdot\mathbf 1_{\{X_1\in[k,k+1[\}}}_{\le\,X_1\cdot\mathbf 1_{\{X_1\in[k,k+1[\}}}\big]\le E[X_1]<\infty$$
implies by the Borel-Cantelli lemma
$$P[X_n\ne\tilde X_n\ \text{infinitely often}]=0.$$

2. Reduce the proof to convergence along the subsequence $k_n=\lfloor\alpha^n\rfloor$ (= largest natural number $\le\alpha^n$), $\alpha>1$.

We will show in Step 3 that
$$\frac{\tilde S_{k_n}-E[\tilde S_{k_n}]}{k_n}\xrightarrow{n\to\infty}0\quad P\text{-a.s.}\tag{2.3}$$
This will imply the assertion of the proposition, because
$$E[\tilde X_i]=E\big[\mathbf 1_{\{X_i<i\}}\cdot X_i\big]=E\big[\mathbf 1_{\{X_1<i\}}\cdot X_1\big]\overset{i\to\infty}{\nearrow}E[X_1]\ (=m),$$
hence
$$\frac1{k_n}\cdot E[\tilde S_{k_n}]=\frac1{k_n}\sum_{i=1}^{k_n}E[\tilde X_i]\xrightarrow{n\to\infty}m,$$

and thus
$$\frac1{k_n}\cdot\tilde S_{k_n}\xrightarrow{n\to\infty}m\quad P\text{-a.s.}$$
If $l\in\mathbb N\cap[k_n,k_{n+1}[$, then (since $\tilde X_i\ge0$)
$$\underbrace{\frac{k_n}{k_{n+1}}}_{\xrightarrow{n\to\infty}\alpha^{-1}}\cdot\underbrace{\frac{\tilde S_{k_n}}{k_n}}_{\xrightarrow{n\to\infty}m\ P\text{-a.s.}}\le\frac{\tilde S_l}l\le\underbrace{\frac{\tilde S_{k_{n+1}}}{k_{n+1}}}_{\xrightarrow{n\to\infty}m\ P\text{-a.s.}}\cdot\underbrace{\frac{k_{n+1}}{k_n}}_{\xrightarrow{n\to\infty}\alpha}.$$
Hence there exists a $P$-null set $N_\alpha\in\mathcal A$, such that for all $\omega\notin N_\alpha$
$$\frac1\alpha\cdot m\le\liminf_{l\to\infty}\frac{\tilde S_l(\omega)}l\le\limsup_{l\to\infty}\frac{\tilde S_l(\omega)}l\le\alpha\cdot m.$$
Finally choose a subsequence $\alpha_n\searrow1$. Then for all $\omega\notin N:=\bigcup_{n\ge1}N_{\alpha_n}$
$$\lim_{l\to\infty}\frac{\tilde S_l(\omega)}l=m.$$

3. Due to Lemma 1.7.7 it suffices for the proof of (2.3) to show that
$$\forall\varepsilon>0:\quad\sum_{n=1}^\infty P\bigg[\Big|\frac{\tilde S_{k_n}-E[\tilde S_{k_n}]}{k_n}\Big|>\varepsilon\bigg]<\infty$$
(fast convergence in probability towards $0$).

Pairwise independence of the $\tilde X_i$ implies that the $\tilde X_i$ are pairwise uncorrelated, hence by Chebyshev's inequality
$$P\bigg[\Big|\frac{\tilde S_{k_n}-E[\tilde S_{k_n}]}{k_n}\Big|>\varepsilon\bigg]\le\frac1{k_n^2\varepsilon^2}\cdot\operatorname{var}(\tilde S_{k_n})=\frac1{k_n^2\varepsilon^2}\sum_{i=1}^{k_n}\operatorname{var}(\tilde X_i)\le\frac1{k_n^2\varepsilon^2}\sum_{i=1}^{k_n}E\big[(\tilde X_i)^2\big].$$
It therefore suffices to show that
$$s:=\sum_{n=1}^\infty\frac1{k_n^2}\sum_{i=1}^{k_n}E\big[(\tilde X_i)^2\big]=\sum_{(i,n)\in\mathbb N^2,\ i\le k_n}\frac1{k_n^2}\cdot E\big[(\tilde X_i)^2\big]<\infty.$$
To this end note that
$$s=\sum_{i=1}^\infty\bigg(\sum_{n:\,k_n\ge i}\frac1{k_n^2}\bigg)\cdot E\big[(\tilde X_i)^2\big].$$

We will show in the following that there exists a constant $c$ such that
$$\sum_{n:\,k_n\ge i}\frac1{k_n^2}\le\frac c{i^2}.\tag{2.4}$$
This will then imply that
$$s\overset{(2.4)}{\le}c\sum_{i=1}^\infty\frac1{i^2}\cdot E\big[(\tilde X_i)^2\big]=c\sum_{i=1}^\infty\frac1{i^2}\cdot E\big[\mathbf 1_{\{X_1<i\}}\cdot X_1^2\big]\le c\sum_{i=1}^\infty\frac1{i^2}\sum_{l=1}^i l^2\cdot P\big(X_1\in[l-1,l[\,\big)=c\sum_{l=1}^\infty l^2\cdot\underbrace{\sum_{i=l}^\infty\frac1{i^2}}_{\le\frac2l}\cdot P\big(X_1\in[l-1,l[\,\big)\le2c\sum_{l=1}^\infty l\cdot P\big(X_1\in[l-1,l[\,\big)=2c\sum_{l=1}^\infty E\big[\underbrace{l\cdot\mathbf 1_{\{X_1\in[l-1,l[\}}}_{\le\,(X_1+1)\cdot\mathbf 1_{\{X_1\in[l-1,l[\}}}\big]\le2c\cdot\big(E[X_1]+1\big)<\infty,$$
where we used the fact that
$$\sum_{i=l}^\infty\frac1{i^2}\le\frac1{l^2}+\sum_{i=l+1}^\infty\frac1{(i-1)i}=\frac1{l^2}+\sum_{i=l+1}^\infty\Big(\frac1{i-1}-\frac1i\Big)=\frac1{l^2}+\frac1l\le\frac2l.$$
It remains to show (2.4). To this end note that
$$\lfloor\alpha^n\rfloor=k_n\le\alpha^n<k_n+1\quad\Rightarrow\quad k_n>\alpha^n-1\overset{\alpha>1}{\ge}\alpha^n-\alpha^{n-1}=\underbrace{\frac{\alpha-1}\alpha}_{=:c_\alpha}\cdot\alpha^n.$$
Let $n_i$ be the smallest natural number satisfying $k_{n_i}=\lfloor\alpha^{n_i}\rfloor\ge i$, hence $\alpha^{n_i}\ge i$. Then
$$\sum_{n:\,k_n\ge i}\frac1{k_n^2}\le c_\alpha^{-2}\sum_{n\ge n_i}\frac1{\alpha^{2n}}=c_\alpha^{-2}\cdot\frac1{1-\alpha^{-2}}\cdot\alpha^{-2n_i}\le\frac{c_\alpha^{-2}}{1-\alpha^{-2}}\cdot\frac1{i^2}.$$

Corollary 3.3. Let $X_1,X_2,\dots$ be pairwise independent, identically distributed with $X_i\ge0$. Then
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n X_i(\omega)=E[X_1]\in[0,\infty]\quad P\text{-a.s.}$$

Proof. W.l.o.g. $E[X_1]=\infty$. Then $\frac1n\sum_{i=1}^n\big(X_i(\omega)\wedge N\big)\xrightarrow{n\to\infty}E[X_1\wedge N]$ $P$-a.s. for all $N$, hence
$$\liminf_{n\to\infty}\frac1n\sum_{i=1}^n X_i(\omega)\ge\lim_{n\to\infty}\frac1n\sum_{i=1}^n\big(X_i(\omega)\wedge N\big)=E[X_1\wedge N]\xrightarrow{N\nearrow\infty}E[X_1]\quad P\text{-a.s.}$$
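The law of large numbers is easy to watch in simulation. A sketch with i.i.d. exponential(1) variables, so that $m=E[X_i]=1$; distribution, seed and checkpoints are my choices.

    import random

    random.seed(5)
    s, checkpoints = 0.0, {10, 1_000, 100_000, 1_000_000}
    for n in range(1, 1_000_001):
        s += random.expovariate(1.0)    # X_n ~ exponential(1), E[X_n] = 1
        if n in checkpoints:
            print(n, s / n)             # empirical mean approaches m = 1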

Example 3.4 (Growth in random media). Let $Y_1,Y_2,\dots$ be i.i.d., $Y_i\ge0$, with $m:=E[Y_i]$ (existence of such a sequence: later!).

Define $X_0=1$ and inductively $X_n:=X_{n-1}\cdot Y_n$. Clearly, $X_n=Y_1\cdots Y_n$ and $E[X_n]=E[Y_1]\cdots E[Y_n]=m^n$, hence
$$E[X_n]\to\begin{cases}+\infty & \text{if }m>1\quad\text{exponential growth (supercritical)}\\ 1 & \text{if }m=1\quad\text{critical}\\ 0 & \text{if }m<1\quad\text{exponential decay (subcritical).}\end{cases}$$
What will be the long-time behaviour of $X_n(\omega)$? Surprisingly, in the supercritical case $m>1$, one may observe that $\lim_{n\to\infty}X_n=0$ with positive probability.

Explanation: Suppose that $\log Y_i\in L^1$. Then
$$\frac1n\log X_n=\frac1n\sum_{i=1}^n\log Y_i\xrightarrow{n\to\infty}E[\log Y_1]=:\alpha\quad P\text{-a.s.},$$
and

$\alpha<0$: $\exists\,\varepsilon>0$ with $\alpha+\varepsilon<0$, so that $X_n(\omega)\le e^{n(\alpha+\varepsilon)}$ $\forall n\ge n_0(\omega)$, hence $P$-a.s. exponential decay;

$\alpha>0$: $\exists\,\varepsilon>0$ with $\alpha-\varepsilon>0$, so that $X_n(\omega)\ge e^{n(\alpha-\varepsilon)}$ $\forall n\ge n_0(\omega)$, hence $P$-a.s. exponential growth.

Note that by Jensen's inequality
$$\alpha=E[\log Y_1]\le\log\underbrace{E[Y_1]}_{=m},$$
and in general the inequality is strict, i.e. $\alpha<\log m$, so that it might happen that $\alpha<0$ although $m>1$ (!).

Illustration: As a particular example let
$$Y_i:=\begin{cases}\frac12(1+c) & \text{with prob. }\frac12\\[2pt] \frac12 & \text{with prob. }\frac12,\end{cases}$$
so that $E[Y_i]=\frac14(1+c)+\frac14=\frac12+\frac14c$ (supercritical if $c>2$). On the other hand
$$E[\log Y_1]=\frac12\cdot\Big[\log\Big(\frac12(1+c)\Big)+\log\frac12\Big]=\frac12\cdot\log\frac{1+c}4\overset{c<3}{<}0.$$
Hence $X_n\xrightarrow{n\to\infty}0$ $P$-a.s. with exponential rate for $c<3$, whereas at the same time, for $c>2$, $E[X_n]=m^n\nearrow\infty$ with exponential rate.
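The gap between $E[X_n]=m^n\to\infty$ and $X_n\to0$ a.s. is striking in simulation. A sketch with the two-point $Y_i$ above and $c=2.5$ (so $m=1.125>1$ but $\alpha=\frac12\log\frac{3.5}4<0$); path length and seed are arbitrary.

    import math
    import random

    random.seed(6)
    c = 2.5                              # supercritical (c > 2), yet alpha < 0 (c < 3)
    m = 0.5 + c / 4                      # E[Y_i] = 1.125
    alpha = 0.5 * math.log((1 + c) / 4)  # E[log Y_1], about -0.067

    def path(n):
        # one trajectory X_n = Y_1 * ... * Y_n
        x = 1.0
        for _ in range(n):
            x *= 0.5 * (1 + c) if random.random() < 0.5 else 0.5
        return x

    n = 200
    print(m**n)                          # E[X_n]: about 1.7e10
    print(math.exp(n * alpha))           # typical a.s. order e^(n*alpha): about 1.6e-6
    print([path(n) for _ in range(5)])   # sampled paths: tiny compared to E[X_n]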

Back to Kolmogorov's law of large numbers:

Let $X_1,X_2,\dots\in L^1$ be i.i.d. with $m:=E[X_i]$. Then
$$\frac1n\sum_{i=1}^n X_i(\omega)\xrightarrow{n\to\infty}E[X_1]\quad P\text{-a.s.}$$
Define the "random measure"
$$\varrho_n(\omega,A):=\frac1n\sum_{i=1}^n\mathbf 1_A\big(X_i(\omega)\big)=\text{"relative frequency of the event }X_i\in A\text{"}.$$
Then
$$\varrho_n(\omega,\cdot)=\frac1n\sum_{i=1}^n\delta_{X_i(\omega)}$$
is a probability measure on $\big(\mathbb R,\mathcal B(\mathbb R)\big)$ for fixed $\omega$, and it is called the empirical distribution of the first $n$ observations.

Proposition 3.5. For $P$-almost every $\omega\in\Omega$:
$$\varrho_n(\omega,\cdot)\xrightarrow{n\to\infty}\mu:=P\circ X_1^{-1}\quad\text{weakly}.$$

Proof. Clearly, Kolmogorov's law of large numbers implies that for any $x\in\mathbb R$
$$F_n(\omega,x):=\varrho_n\big(\omega,\,]-\infty,x]\,\big)=\frac1n\sum_{i=1}^n\mathbf 1_{]-\infty,x]}\big(X_i(\omega)\big)\to E\big[\mathbf 1_{]-\infty,x]}(X_1)\big]=P[X_1\le x]=\mu\big(\,]-\infty,x]\,\big)=:F(x)$$
$P$-a.s., hence for every $\omega\notin N(x)$ for some $P$-null set $N(x)$. Then
$$N:=\bigcup_{r\in\mathbb Q}N(r)$$
is a $P$-null set too, and for all $x\in\mathbb R$ and all $s,r\in\mathbb Q$ with $s<x<r$ and $\omega\notin N$:
$$F(s)=\lim_{n\to\infty}F_n(\omega,s)\le\liminf_{n\to\infty}F_n(\omega,x)\le\limsup_{n\to\infty}F_n(\omega,x)\le\lim_{n\to\infty}F_n(\omega,r)=F(r).$$
Hence, if $F$ is continuous at $x$, then for $\omega\notin N$
$$\lim_{n\to\infty}F_n(\omega,x)=F(x).$$
Now the assertion follows from the Portmanteau theorem.
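Proposition 3.5 is the principle behind plotting empirical distribution functions. A sketch, assuming i.i.d. uniform-$[0,1]$ observations, for which the true distribution function is $F(x)=x$ on $[0,1]$ (grid and sample sizes are arbitrary):

    import random

    random.seed(7)
    xs = [random.random() for _ in range(100_000)]   # observations X_1, ..., X_n
    true_F = lambda x: max(0.0, min(1.0, x))         # F of the uniform distribution

    def F_emp(x, sample):
        # empirical distribution function rho_n(omega, ]-inf, x])
        return sum(1 for xi in sample if xi <= x) / len(sample)

    grid = [i / 20 for i in range(21)]
    for n in (100, 10_000, 100_000):
        worst = max(abs(F_emp(x, xs[:n]) - true_F(x)) for x in grid)
        print(n, worst)                              # sup-distance on the grid shrinks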
