Probability Theory

Wilhelm Stannat

Technische Universität Darmstadt, Winter Term 2007/08

Second part - corrected version

This text is a summary of the lecture on Probability Theory held at the TU Darmstadt in Winter Term 2007/08.

Please email all misprints and mistakes to

stannat@mathematik.tu-darmstadt.de


Bibliography

1. Bauer, H., Probability Theory, de Gruyter, 1996.

2. Bauer, H., Maß- und Integrationstheorie, de Gruyter, 1996.

3. Billingsley, P., Probability and Measure, Wiley, 1995.

4. Billingsley, P., Convergence of Probability Measures, Wiley, 1999.

5. Dudley, R. M., Real Analysis and Probability, Cambridge University Press, 2002.

6. Elstrodt, J., Maß- und Integrationstheorie, Springer, 2005.

7. Feller, W., An Introduction to Probability Theory and Its Applications, Vol. 1 & 2, Wiley, 1950.

8. Halmos, P. R., Measure Theory, Springer, 1974.

9. Klenke, A., Wahrscheinlichkeitstheorie, Springer, 2006.

10. Shiryaev, A. N., Probability, Springer, 1996.


1 Basic Notions

9 Distribution of random variables

Let $(\Omega,\mathcal A,P)$ be a probability space, and let $X\colon\Omega\to\bar{\mathbb R}$ be a r.v.

Let $\mu$ be the distribution of $X$ (under $P$), i.e., $\mu(A)=P[X\in A]$ for all $A\in\mathcal B(\bar{\mathbb R})$.

Assume that $P[X\in\mathbb R]=1$ (in particular, $X$ is $P$-a.s. finite, and $\mu$ is a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$).

Definition 9.1. The function $F\colon\mathbb R\to[0,1]$, defined by
$$F(b):=P[X\le b]=\mu\big(\,]-\infty,b]\,\big),\quad b\in\mathbb R,\tag{1.1}$$
is called the distribution function of $X$ resp. $\mu$.

Proposition 9.2. (i) $F$ is
monotone increasing: $a\le b\ \Rightarrow\ F(a)\le F(b)$,
right continuous: $F(a)=\lim_{b\searrow a}F(b)$,
normalized: $\lim_{a\searrow-\infty}F(a)=0$, $\lim_{b\nearrow+\infty}F(b)=1$.

(ii) To any such function $F$ there exists a unique probability measure $\mu$ on $(\mathbb R,\mathcal B(\mathbb R))$ with (1.1).

Proof. (i) Monotonicity is obvious.

Right continuity: if $b\searrow a$ then $]-\infty,b]\searrow\,]-\infty,a]$, hence by continuity of $\mu$ from above (cf. Proposition 1.9):
$$F(a)=\mu\big(\,]-\infty,a]\,\big)\overset{1.9}{=}\lim_{b\searrow a}\mu\big(\,]-\infty,b]\,\big)=\lim_{b\searrow a}F(b).$$
Similarly, $]-\infty,a]\searrow\emptyset$ if $a\searrow-\infty$ (resp. $]-\infty,b]\nearrow\mathbb R$ if $b\nearrow\infty$), and thus
$$\lim_{a\searrow-\infty}F(a)=\lim_{a\searrow-\infty}\mu\big(\,]-\infty,a]\,\big)=0\qquad\Big(\text{resp. }\lim_{b\nearrow\infty}F(b)=\lim_{b\nearrow\infty}\mu\big(\,]-\infty,b]\,\big)=1\Big).$$

(ii) Existence: Let $\lambda$ be the Lebesgue measure on $]0,1[$. Define the "inverse function" $G$ of $F\colon\mathbb R\to[0,1]$ by
$$G\colon\,]0,1[\,\to\mathbb R,\qquad G(y):=\inf\big\{x\in\mathbb R\;\big|\;F(x)\ge y\big\}.$$
Note that $y<F(x)\ \Rightarrow\ G(y)\le x$ implies
$$\big]0,F(x)\big[\;\subset\;\{G\le x\},$$
and $G(y)\le x\ \Rightarrow\ \exists\,x_n\searrow x$ with $F(x_n)\ge y$, hence $F(x)\ge y$, so that
$$\{G\le x\}\;\subset\;\big]0,F(x)\big].$$
Combining both inclusions we obtain that
$$\big]0,F(x)\big[\;\subset\;\{G\le x\}\;\subset\;\big]0,F(x)\big],$$
so that $G$ is measurable.

Let $\mu:=G(\lambda)=\lambda\circ G^{-1}$ (a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$). Then
$$\mu\big(\,]-\infty,x]\,\big)=\lambda(\{G\le x\})=\lambda\big(\,\big]0,F(x)\big]\,\big)=F(x)\quad\forall x\in\mathbb R.$$
Uniqueness: later.

Remark 9.3. (i) Let $Y$ be a r.v. with uniform distribution on $[0,1]$; then $X=G(Y)$ has distribution $\mu$. In particular: simulating the uniform distribution on $[0,1]$ gives, by transformation with $G$, a simulation of $\mu$ (see the sketch following this remark).

(ii) Some authors define the distribution function $F$ by $F(x):=\mu\big(\,]-\infty,x[\,\big)$. In this case $F$ is left continuous, not right continuous.
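The simulation recipe in (i) is known as inverse transform sampling. Below is a minimal sketch in Python, using the exponential distribution from Example 9.7(ii) below, for which the generalized inverse has the closed form $G(y)=-\log(1-y)/\alpha$; the function names and the choice $\alpha=1$ are illustrative and not part of the lecture.

    import math
    import random

    def G(y, alpha=1.0):
        # Generalized inverse of F(x) = 1 - exp(-alpha*x), x >= 0:
        # G(y) = inf{x : F(x) >= y} = -log(1 - y)/alpha for 0 <= y < 1.
        return -math.log(1.0 - y) / alpha

    # Simulate the uniform distribution on [0,1[ and transform with G.
    random.seed(0)
    sample = [G(random.random()) for _ in range(100_000)]

    # Sanity check: the exponential distribution with alpha = 1 has mean 1.
    print(sum(sample) / len(sample))  # approximately 1.0

The same pattern works for any distribution function $F$; if no closed form for $G$ is available, a numerical search for $\inf\{x\mid F(x)\ge y\}$ can stand in.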

Remark 9.4. (i) Let $F$ be a distribution function and let $x\in\mathbb R$. Then
$$F(x)-F(x-)=\lim_{n\nearrow\infty}\mu\big(\,\big]x-\tfrac1n,x\big]\,\big)=\mu(\{x\})$$
is called the step height of $F$ in $x$. In particular:
$$F\ \text{continuous}\iff\forall x\in\mathbb R\colon\ \mu(\{x\})=0\quad(\text{``}\mu\text{ is continuous''}).$$

(ii) Let $F$ be monotone increasing and bounded. Then $F$ has at most countably many points of discontinuity.

Definition 9.5. (i) $F$ (resp. $\mu$) is called discrete if there exists a countable set $S\subset\mathbb R$ with $\mu(S)=1$. In this case, $\mu$ is uniquely determined by the weights $\mu(\{x\})$, $x\in S$, and $F$ is a step function of the following type:
$$F(x)=\sum_{y\in S,\;y\le x}\mu(\{y\}).$$

(ii) $F$ (resp. $\mu$) is called absolutely continuous if there exists a measurable function $f\ge0$ (called the "density"), such that
$$F(x)=\int_{-\infty}^x f(t)\,dt,\tag{1.2}$$
resp., for all $A\in\mathcal B(\mathbb R)$:
$$\mu(A)=\int_A f(t)\,dt=\int_{-\infty}^{+\infty}\mathbf 1_A\cdot f\,dt.\tag{1.3}$$
In particular
$$\int_{-\infty}^{+\infty}f(t)\,dt=1.$$

Remark 9.6. (i) Every measurable function $f\ge0$ with $\int_{-\infty}^{+\infty}f(t)\,dt=1$ defines a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$ by $A\mapsto\int_A f(t)\,dt$.

(ii) In the previous definition "(1.2) $\Rightarrow$ (1.3)", because $A\mapsto\int_A f(t)\,dt$ defines a probability measure on $(\mathbb R,\mathcal B(\mathbb R))$ with distribution function $F$. Uniqueness in 9.2 implies the assertion.

Example 9.7. (i) Uniform distribution on $[a,b]$. Let $f:=\frac1{b-a}\cdot\mathbf 1_{[a,b]}$. The associated distribution function is given by
$$F(x):=\begin{cases}0 & \text{if }x\le a\\[2pt] \frac{x-a}{b-a} & \text{if }x\in[a,b]\\[2pt] 1 & \text{if }x\ge b\end{cases}$$
(continuous analogue of the discrete uniform distribution on a finite set).

(ii) Exponential distribution with parameter $\alpha>0$:
$$f(x):=\begin{cases}\alpha e^{-\alpha x} & \text{if }x\ge0\\ 0 & \text{if }x<0,\end{cases}\qquad F(x):=\begin{cases}1-e^{-\alpha x} & \text{if }x\ge0\\ 0 & \text{if }x<0\end{cases}$$
(continuous analogue of the geometric distribution:
$$\int_k^{k+1}f(x)\,dx=F(k+1)-F(k)=e^{-\alpha k}(1-e^{-\alpha})=(1-p)^k\,p\quad\text{with }p=1-e^{-\alpha}).$$

(iii) Normal distribution $N(m,\sigma^2)$, $m\in\mathbb R$, $\sigma^2>0$:
$$f_{m,\sigma^2}(x)=\frac1{\sqrt{2\pi\sigma^2}}\cdot e^{-\frac{(x-m)^2}{2\sigma^2}}.$$

The associated distribution function is given by
$$F_{m,\sigma^2}(x)=\frac1{\sqrt{2\pi\sigma^2}}\int_{-\infty}^x e^{-\frac{(y-m)^2}{2\sigma^2}}\,dy\overset{z=\frac{y-m}\sigma}{=}\frac1{\sqrt{2\pi}}\int_{-\infty}^{\frac{x-m}\sigma}e^{-\frac{z^2}2}\,dz=F_{0,1}\Big(\frac{x-m}\sigma\Big).$$
$\Phi:=F_{0,1}$ is called the distribution function of the standard normal distribution $N(0,1)$.

The expectation $E[X]$ (or more generally $E[h(X)]$) can be calculated with the help of the distribution $\mu$ of $X$:

Proposition 9.8. Let $h\ge0$ be measurable. Then
$$E\big[h(X)\big]=\int_{-\infty}^{+\infty}h(x)\,\mu(dx)=\begin{cases}\displaystyle\int_{-\infty}^{+\infty}h(x)\cdot f(x)\,dx & \text{if $\mu$ absolutely continuous with density $f$,}\\[8pt] \displaystyle\sum_{x\in S}h(x)\cdot\mu(\{x\}) & \text{if $\mu$ discrete, $\mu(S)=1$ and $S$ countable.}\end{cases}$$

Proof. See exercises.
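Both cases of Proposition 9.8 translate directly into code. The following sketch approximates $E[h(X)]$ by a weighted sum in the discrete case and by a plain midpoint-rule quadrature in the absolutely continuous case; the truncations (interval $[-50,50]$, Poisson weights cut at $k=40$) are my simplifications, not part of the notes.

    import math

    def expect_discrete(h, weights):
        # E[h(X)] = sum over x in S of h(x) * mu({x})
        return sum(h(x) * w for x, w in weights.items())

    def expect_density(h, f, a=-50.0, b=50.0, n=200_000):
        # E[h(X)] = integral of h(x) f(x) dx, midpoint rule on [a, b]
        dx = (b - a) / n
        return sum(h(a + (i + 0.5) * dx) * f(a + (i + 0.5) * dx)
                   for i in range(n)) * dx

    # Discrete case: Poisson(2) weights mu({k}) = e^(-2) 2^k / k!, E[X] = 2.
    pois = {k: math.exp(-2) * 2**k / math.factorial(k) for k in range(40)}
    print(expect_discrete(lambda x: x, pois))       # approximately 2

    # Absolutely continuous case: standard normal density, h(x) = x^2.
    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    print(expect_density(lambda x: x * x, phi))     # approximately 1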

Example 9.9. Let $X$ be $N(m,\sigma^2)$-distributed. Then
$$E[X]=\int x\cdot f_{m,\sigma^2}(x)\,dx=m+\underbrace{\int(x-m)\cdot f_{m,\sigma^2}(x)\,dx}_{=0}=m.$$

The $p$-th central moment of $X$ is given by
$$E\big[|X-m|^p\big]=\int|x-m|^p\cdot f_{m,\sigma^2}(x)\,dx=\int|x|^p\cdot f_{0,\sigma^2}(x)\,dx=2\int_0^\infty x^p\cdot\frac1{\sqrt{2\pi\sigma^2}}\cdot e^{-\frac{x^2}{2\sigma^2}}\,dx\overset{y=\frac{x^2}{2\sigma^2}}{=}\frac1{\sqrt\pi}\cdot2^{\frac p2}\cdot\sigma^p\underbrace{\int_0^\infty y^{\frac{p+1}2-1}e^{-y}\,dy}_{=\Gamma\left(\frac{p+1}2\right)}.$$

In particular:
$$p=1:\quad E\big[|X-m|\big]=\sigma\cdot\sqrt{\tfrac2\pi},$$
$$p=2:\quad E\big[|X-m|^2\big]=\sigma^2,$$
$$p=3:\quad E\big[|X-m|^3\big]=2^{\frac32}\cdot\frac{\sigma^3}{\sqrt\pi},$$
$$p=4:\quad E\big[|X-m|^4\big]=3\sigma^4.$$
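The closed form $E[|X-m|^p]=\frac1{\sqrt\pi}\cdot2^{p/2}\cdot\sigma^p\cdot\Gamma(\frac{p+1}2)$ is easy to check against a Monte Carlo estimate; the following sketch does so for $p=1,\dots,4$ (sample size, seed and the parameters $m=1$, $\sigma=2$ are arbitrary choices).

    import math
    import random

    def central_moment(p, sigma):
        # E|X - m|^p = 2^(p/2) * sigma^p * Gamma((p+1)/2) / sqrt(pi)
        return 2**(p / 2) * sigma**p * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

    random.seed(0)
    m, sigma, n = 1.0, 2.0, 200_000
    xs = [random.gauss(m, sigma) for _ in range(n)]
    for p in (1, 2, 3, 4):
        mc = sum(abs(x - m)**p for x in xs) / n
        print(p, central_moment(p, sigma), mc)  # the two columns should agree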

10 Weak convergence of probability measures

Let $S$ be a topological space and $\mathcal S$ be the Borel $\sigma$-algebra on $S$.

Let $\mu$, $\mu_n$, $n\in\mathbb N$, be probability measures on $(S,\mathcal S)$.

What is a reasonable notion of convergence of the sequence $\mu_n$ towards $\mu$? The notion of "pointwise convergence" in the sense that $\mu_n(A)\xrightarrow{n\to\infty}\mu(A)$ for all $A\in\mathcal S$ is too strong for many applications.

Definition 10.1. Let $\mu$ and $\mu_n$, $n\in\mathbb N$, be probability measures on $(S,\mathcal S)$. The sequence $(\mu_n)$ converges to $\mu$ weakly if for all $f\in C_b(S)$ (= the space of bounded continuous functions on $S$) it follows that
$$\int f\,d\mu_n\xrightarrow{n\to\infty}\int f\,d\mu.$$

Example 10.2. (i) $x_n\xrightarrow{n\to\infty}x$ in $S$ implies $\delta_{x_n}\xrightarrow{n\to\infty}\delta_x$ weakly.

(ii) Let $S:=\mathbb R^1$ and $\mu_n:=N\big(0,\frac1n\big)$. Then $\mu_n\to\delta_0$ weakly, since for all $f\in C_b(\mathbb R)$
$$\int f\,d\mu_n=\int f(x)\cdot\frac1{\sqrt{2\pi\cdot\frac1n}}\cdot e^{-\frac{x^2}{2\cdot\frac1n}}\,dx\overset{x=\frac y{\sqrt n}}{=}\int f\Big(\frac y{\sqrt n}\Big)\cdot\frac1{\sqrt{2\pi}}\cdot e^{-\frac{y^2}2}\,dy\xrightarrow[\text{Lebesgue}]{n\to\infty}f(0)=\int f\,d\delta_0.$$
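Example 10.2(ii) can also be watched numerically: for a fixed bounded continuous test function the integrals $\int f\,d\mu_n$ settle at $f(0)$. A sketch with $f=\cos$ (my choice of test function) and Monte Carlo integration, since a sample of $N(0,\frac1n)$ is just $\frac1{\sqrt n}$ times a standard normal sample:

    import math
    import random

    random.seed(1)
    f = math.cos                        # a bounded continuous test function
    for n in (1, 10, 100, 1000):
        # Monte Carlo approximation of the integral of f w.r.t. N(0, 1/n)
        est = sum(f(random.gauss(0.0, 1.0 / math.sqrt(n)))
                  for _ in range(50_000)) / 50_000
        print(n, est)                   # tends to f(0) = 1 as n grows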

Proposition 10.3 (Portmanteau theorem). Let $S$ be a metric space with metric $d$. Then the following statements are equivalent:

(i) $\mu_n\to\mu$ weakly

(ii) $\int f\,d\mu_n\xrightarrow{n\to\infty}\int f\,d\mu$ for all $f$ bounded and uniformly continuous (w.r.t. $d$)

(iii) $\limsup_{n\to\infty}\mu_n(F)\le\mu(F)$ for all $F\subset S$ closed

(iv) $\liminf_{n\to\infty}\mu_n(G)\ge\mu(G)$ for all $G\subset S$ open

(v) $\lim_{n\to\infty}\mu_n(A)=\mu(A)$ for all $A\in\mathcal S$ with $\mu(\bar A\setminus\mathring A)=0$.

Proof. (iii)$\Leftrightarrow$(iv): Obvious by considering the complement.

(i)$\Rightarrow$(ii): Trivial.

(ii)$\Rightarrow$(iii): Let $F\subset S$ be closed, and let
$$G_m:=\Big\{x\in S\;\Big|\;d(x,F)<\frac1m\Big\},\quad m\in\mathbb N\quad(\text{open!}).$$
Then $G_m\searrow F$, hence $\mu(G_m)\searrow\mu(F)$.

If $\varepsilon>0$ there exists some $m\in\mathbb N$ with $\mu(G_m)<\mu(F)+\varepsilon$. Define
$$\varphi(x):=\begin{cases}1 & \text{if }x\le0\\ 1-x & \text{if }x\in[0,1]\\ 0 & \text{if }x\ge1\end{cases}$$
and let $f:=\varphi\big(m\cdot d(\cdot,F)\big)$.

$f$ is Lipschitz, in particular uniformly continuous, $f=0$ on $G_m^c$ and $f=1$ on $F$, and thus
$$\limsup_{n\to\infty}\mu_n(F)\le\limsup_{n\to\infty}\int f\,d\mu_n\overset{(ii)}{=}\int f\,d\mu\le\mu(G_m)<\mu(F)+\varepsilon.$$

(iii)$\Rightarrow$(v): Let $A$ be such that $\mu(\bar A\setminus\mathring A)=0$. Then
$$\mu(A)=\mu(\mathring A)\overset{(iv)}{\le}\liminf_{n\to\infty}\mu_n(\mathring A)\le\liminf_{n\to\infty}\mu_n(A)\le\limsup_{n\to\infty}\mu_n(A)\le\limsup_{n\to\infty}\mu_n(\bar A)\overset{(iii)}{\le}\mu(\bar A)=\mu(A).$$

(v)$\Rightarrow$(iii): Let $F\subset S$ be closed. For all $\delta>0$ we have that
$$\partial\{d(\cdot,F)\le\delta\}\subset\{d(\cdot,F)=\delta\}.$$
Note that the set
$$D:=\Big\{\delta>0\;\Big|\;\mu\big(\{d(\cdot,F)=\delta\}\big)>0\Big\}$$
is countable, since for every $n\in\mathbb N$ the set
$$D_n:=\Big\{\delta>0\;\Big|\;\mu\big(\underbrace{\{d(\cdot,F)=\delta\}}_{\text{disjoint!}}\big)>\frac1n\Big\}$$
is finite. In particular, there exists a sequence $\delta_k\in\,]0,\infty[\,\setminus D$, $\delta_k\downarrow0$, such that the sets
$$F_k:=\{d(\cdot,F)\le\delta_k\}$$
satisfy $\mu(\bar F_k\setminus\mathring F_k)=0$. $F_k\searrow F$ now implies that
$$\limsup_{n\to\infty}\mu_n(F)\le\limsup_{n\to\infty}\mu_n(F_k)\overset{(v)}{=}\mu(F_k)\xrightarrow{k\to\infty}\mu(F).$$

(iii)$\Rightarrow$(i): Let $f\in C_b(S)$. It suffices to prove that
$$\limsup_{n\to\infty}\int f\,d\mu_n\le\int f\,d\mu$$
(since then $-\liminf\int f\,d\mu_n\le\int(-f)\,d\mu$, hence $\liminf\int f\,d\mu_n\ge\int f\,d\mu$). W.l.o.g. $0\le f\le1$.

Fix $k\in\mathbb N$ and let $F_j:=\{f\ge\frac jk\}$, $j\in\mathbb N$ ($F_j$ closed!). Then
$$\frac1k\sum_{i=1}^k\mathbf 1_{F_i}\le f\le\frac1k+\frac1k\sum_{i=1}^k\mathbf 1_{F_i}.$$
Hence for all probability measures $\nu$ on $(S,\mathcal S)$:
$$\frac1k\sum_{i=1}^k\nu(F_i)\le\int f\,d\nu\le\frac1k+\frac1k\sum_{i=1}^k\nu(F_i),$$
and
$$\limsup_{n\to\infty}\int f\,d\mu_n-\frac1k\le\frac1k\cdot\limsup_{n\to\infty}\sum_{i=1}^k\mu_n(F_i)\le\frac1k\sum_{i=1}^k\limsup_{n\to\infty}\mu_n(F_i)\overset{(iii)}{\le}\frac1k\sum_{i=1}^k\mu(F_i)\le\int f\,d\mu.$$
Since $k\in\mathbb N$ was arbitrary, the assertion follows.

Corollary 10.4. Let $X$, $X_n$, $n\in\mathbb N$, be measurable mappings from $(\Omega,\mathcal A,P)$ to $(S,\mathcal S)$ with distributions $\mu$, $\mu_n$, $n\in\mathbb N$. Then:
$$X_n\xrightarrow{n\to\infty}X\ \text{in probability}\quad\Rightarrow\quad\mu_n\xrightarrow{n\to\infty}\mu\ \text{weakly}.$$
Here, $\lim_{n\to\infty}X_n=X$ in probability if $\lim_{n\to\infty}P\big(d(X,X_n)>\delta\big)=0$ for all $\delta>0$.

Proof. Let $f\in C_b(S)$ be uniformly continuous and $\varepsilon>0$. Then there exists a $\delta=\delta(\varepsilon)>0$ such that $x,y\in S$ with $d(x,y)\le\delta$ implies $|f(x)-f(y)|<\varepsilon$. Hence
$$\Big|\int f\,d\mu-\int f\,d\mu_n\Big|=\Big|E\big[f(X)\big]-E\big[f(X_n)\big]\Big|\le\int_{\{d(X,X_n)\le\delta\}}\big|f(X)-f(X_n)\big|\,dP+\int_{\{d(X,X_n)>\delta\}}\big|f(X)-f(X_n)\big|\,dP\le\varepsilon+2\|f\|_\infty\cdot\underbrace{P\big(d(X_n,X)>\delta\big)}_{\xrightarrow{n\to\infty}0}.$$

Corollary 10.5. Let $S=\mathbb R^1$ and let $\mu$, $\mu_n$, $n\in\mathbb N$, be probability measures on $(\mathbb R,\mathcal B(\mathbb R))$ with distribution functions $F$, $F_n$. Then the following statements are equivalent:

(i) $\mu_n\xrightarrow{n\to\infty}\mu$ vaguely, i.e. $\lim_{n\to\infty}\int f\,d\mu_n=\int f\,d\mu$ for all $f\in C_0(\mathbb R^1)$ (= the space of continuous functions with compact support)

(ii) $\mu_n\xrightarrow{n\to\infty}\mu$ weakly

(iii) $F_n(x)\xrightarrow{n\to\infty}F(x)$ for all $x$ where $F$ is continuous.

(iv) $\mu_n\big(\,]a,b]\,\big)\xrightarrow{n\to\infty}\mu\big(\,]a,b]\,\big)$ for all $]a,b]$ with $\mu(\{a\})=\mu(\{b\})=0$.

Proof. (i)$\Rightarrow$(ii): Exercise.

(ii)$\Rightarrow$(iii): Let $x$ be such that $F$ is continuous in $x$. Then $\mu(\{x\})=0$, which implies by the Portmanteau theorem:
$$F_n(x)=\mu_n\big(\,]-\infty,x]\,\big)\xrightarrow{n\to\infty}\mu\big(\,]-\infty,x]\,\big)=F(x).$$

(iii)$\Rightarrow$(iv): Let $]a,b]$ be such that $\mu(\{a\})=\mu(\{b\})=0$. Then $F$ is continuous in $a$ and $b$, and thus
$$\mu\big(\,]a,b]\,\big)=F(b)-F(a)\overset{(iii)}{=}\lim_{n\to\infty}F_n(b)-\lim_{n\to\infty}F_n(a)=\lim_{n\to\infty}\mu_n\big(\,]a,b]\,\big).$$

(iv)$\Rightarrow$(i): Let $D:=\{x\in\mathbb R\mid\mu(\{x\})=0\}$. Then $\mathbb R\setminus D$ is countable, hence $D\subset\mathbb R$ is dense. Let $f\in C_0(\mathbb R)$; then $f$ is uniformly continuous, hence for $\varepsilon>0$ we find $c_0<\dots<c_m\in D$ such that
$$\Big\|f-\underbrace{\sum_{k=1}^m f(c_{k-1})\cdot\mathbf 1_{]c_{k-1},c_k]}}_{=:g}\Big\|\le\sup_k\sup_{x\in[c_{k-1},c_k]}\big|f(x)-f(c_{k-1})\big|<\varepsilon.$$
Then
$$\Big|\int f\,d\mu-\int f\,d\mu_n\Big|\le\underbrace{\int|f-g|\,d\mu}_{\le\varepsilon}+\Big|\int g\,d\mu-\int g\,d\mu_n\Big|+\underbrace{\int|f-g|\,d\mu_n}_{\le\varepsilon}\le2\varepsilon+\sum_{k=1}^m\big|f(c_{k-1})\big|\cdot\Big|\mu\big(\,]c_{k-1},c_k]\,\big)-\mu_n\big(\,]c_{k-1},c_k]\,\big)\Big|\xrightarrow[n\to\infty]{(iv)}2\varepsilon.$$
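The restriction to continuity points of $F$ in (iii) is not cosmetic. A tiny illustration, with $\mu_n:=\delta_{1/n}$ and $\mu:=\delta_0$ (my example, in the spirit of Example 10.2(i)): $\mu_n\to\mu$ weakly, yet $F_n(0)=0$ for every $n$ while $F(0)=1$, so pointwise convergence of the distribution functions fails exactly at the single discontinuity point of $F$.

    def F_n(x, n):
        # distribution function of the Dirac measure at 1/n
        return 1.0 if x >= 1.0 / n else 0.0

    def F(x):
        # distribution function of the Dirac measure at 0
        return 1.0 if x >= 0.0 else 0.0

    for x in (-0.5, 0.0, 0.5):
        print(x, [F_n(x, n) for n in (1, 10, 100, 1000)], F(x))
    # F_n(x) -> F(x) at x = -0.5 and x = 0.5, but F_n(0) = 0.0 for every n,
    # although F(0) = 1.0.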

11 Dynkin systems and uniqueness of probability measures

Let $\Omega\ne\emptyset$.

Definition 11.1. A collection of subsets $\mathcal D\subset\mathcal P(\Omega)$ is called a Dynkin system if:

(i) $\Omega\in\mathcal D$.

(ii) $A\in\mathcal D\ \Rightarrow\ A^c\in\mathcal D$.

(iii) $A_i\in\mathcal D$, $i\in\mathbb N$, pairwise disjoint $\Rightarrow$ $\bigcup_{i\in\mathbb N}A_i\in\mathcal D$.

Example 11.2. (i) Every $\sigma$-algebra $\mathcal A\subset\mathcal P(\Omega)$ is a Dynkin system.

(ii) Let $P_1,P_2$ be probability measures on $(\Omega,\mathcal A)$. Then
$$\mathcal D:=\big\{A\in\mathcal A\;\big|\;P_1(A)=P_2(A)\big\}$$
is a Dynkin system.

Remark 11.3. (i) Let $\mathcal D$ be a Dynkin system. Then
$$A,B\in\mathcal D,\ A\subset B\quad\Rightarrow\quad B\setminus A=(B^c\cup A)^c\in\mathcal D.$$

(ii) Every Dynkin system which is closed under finite intersections (short notation: $\cap$-stable) is a $\sigma$-algebra, because:

(a) $A,B\in\mathcal D\ \Rightarrow\ A\cup B=A\cup\big(B\setminus(A\cap B)\big)\in\mathcal D$, since $A\cap B\in\mathcal D$ by $\cap$-stability, $B\setminus(A\cap B)\in\mathcal D$ by (i), and the union on the right is disjoint.

(b) $A_i\in\mathcal D$, $i\in\mathbb N$ $\Rightarrow$ $\bigcup_{i\in\mathbb N}A_i=\bigcup_{i\in\mathbb N}\Big(A_i\cap\Big(\underbrace{\bigcup_{n=1}^{i-1}A_n}_{\in\,\mathcal D\text{ by (a)}}\Big)^c\Big)\in\mathcal D$, since the sets in the union on the right are pairwise disjoint and belong to $\mathcal D$ by assumption.

Proposition 11.4. Let $\mathcal B\subset\mathcal P(\Omega)$ be a $\cap$-stable collection of subsets. Then
$$\sigma(\mathcal B)=\mathcal D(\mathcal B),$$
where
$$\mathcal D(\mathcal B):=\bigcap_{\substack{\mathcal D\ \text{Dynkin system}\\ \mathcal B\subset\mathcal D}}\mathcal D$$
is called the Dynkin system generated by $\mathcal B$.

Proof. See textbooks on measure theory.

Proposition 11.5 (Uniqueness of probability measures). Let $P_1,P_2$ be probability measures on $(\Omega,\mathcal A)$, and let $\mathcal B\subset\mathcal A$ be a $\cap$-stable collection of subsets. Then:
$$P_1(A)=P_2(A)\ \text{for all}\ A\in\mathcal B\quad\Rightarrow\quad P_1=P_2\ \text{on}\ \sigma(\mathcal B).$$

Proof. The collection of subsets
$$\mathcal D:=\big\{A\in\mathcal A\;\big|\;P_1(A)=P_2(A)\big\}$$
is a Dynkin system containing $\mathcal B$. Consequently,
$$\sigma(\mathcal B)\overset{11.4}{=}\mathcal D(\mathcal B)\subset\mathcal D.$$

Example 11.6. (i) For $p\in\,]0,1[$ the probability measure $P_p$ on $(\Omega:=\{0,1\}^{\mathbb N},\mathcal A)$ is uniquely determined by
$$P_p[X_1=x_1,\dots,X_n=x_n]=p^k(1-p)^{n-k},\quad\text{with }k:=\sum_{i=1}^n x_i,$$
for all $x_1,\dots,x_n\in\{0,1\}$, $n\in\mathbb N$, because the collection of cylindrical sets
$$\{X_1=x_1,\dots,X_n=x_n\},\quad n\in\mathbb N_0,\ x_1,\dots,x_n\in\{0,1\},$$
is $\cap$-stable, generating $\mathcal A$ (cf. Example 1.7).

(Existence of $P_p$ for $p=\frac12$: see Example 3.6. Existence for $p\in\,]0,1[\,\setminus\{\frac12\}$: later.)

(ii) A probability measure on $(\mathbb R,\mathcal B(\mathbb R))$ is uniquely determined through its distribution function $F:=\mu\big(\,]-\infty,\cdot\,]\,\big)$, because
$$\mu\big(\,]a,b]\,\big)=F(b)-F(a),$$
and the collection of intervals $]a,b]$, $a,b\in\mathbb R$, is $\cap$-stable, generating $\mathcal B(\mathbb R)$.

2 Independence

1 Independent events

Let $(\Omega,\mathcal A,P)$ be a probability space.

Definition 1.1. A collection of events $A_i\in\mathcal A$, $i\in I$, is said to be independent (w.r.t. $P$) if for any finite subset $J\subset I$
$$P\Big(\bigcap_{j\in J}A_j\Big)=\prod_{j\in J}P(A_j).$$
A family of collections of subsets $\mathcal B_i\subset\mathcal A$, $i\in I$, is said to be independent if for all finite subsets $J\subset I$ and for all subsets $A_j\in\mathcal B_j$, $j\in J$,
$$P\Big(\bigcap_{j\in J}A_j\Big)=\prod_{j\in J}P(A_j).$$

Proposition 1.2. Let $\mathcal B_i$, $i\in I$, be independent and closed under intersections. Then:

(i) $\sigma(\mathcal B_i)$, $i\in I$, are independent.

(ii) Let $J_k$, $k\in K$, be a partition of the index set $I$. Then the $\sigma$-algebras
$$\sigma\Big(\bigcup_{i\in J_k}\mathcal B_i\Big),\quad k\in K,$$
are independent.

Proof. (i) Let $J\subset I$, $J$ finite, be of the form $J=\{j_1,\dots,j_n\}$. Let $A_{j_1}\in\sigma(\mathcal B_{j_1}),\dots,A_{j_n}\in\sigma(\mathcal B_{j_n})$.

We have to show that
$$P(A_{j_1}\cap\dots\cap A_{j_n})=P(A_{j_1})\cdots P(A_{j_n}).\tag{2.1}$$
To this end suppose first that $A_{j_2}\in\mathcal B_{j_2},\dots,A_{j_n}\in\mathcal B_{j_n}$, and define
$$\mathcal D_{j_1}:=\big\{A\in\sigma(\mathcal B_{j_1})\;\big|\;P(A\cap A_{j_2}\cap\dots\cap A_{j_n})=P(A)\cdot P(A_{j_2})\cdots P(A_{j_n})\big\}.$$
Then $\mathcal D_{j_1}$ is a Dynkin system (!) containing $\mathcal B_{j_1}$. Proposition 1.11.4 now implies
$$\sigma(\mathcal B_{j_1})=\mathcal D(\mathcal B_{j_1})\subset\mathcal D_{j_1},$$
hence $\sigma(\mathcal B_{j_1})=\mathcal D_{j_1}$. Iterating the above argument for $\mathcal D_{j_2},\mathcal D_{j_3},\dots$ implies (2.1).

(ii) For $k\in K$ define
$$\mathcal C_k:=\Big\{\bigcap_{j\in J}A_j\;\Big|\;J\subset J_k,\ J\ \text{finite},\ A_j\in\mathcal B_j\Big\}.$$
Then $\mathcal C_k$ is closed under intersections, and the collections of subsets $\mathcal C_k$, $k\in K$, are still independent, because: given $k_1,\dots,k_n\in K$ and finite subsets $J_1\subset J_{k_1},\dots,J_n\subset J_{k_n}$, then
$$P\Big(\underbrace{\bigcap_{i\in J_1}A_i}_{\in\,\mathcal C_{k_1}}\cap\dots\cap\underbrace{\bigcap_{i\in J_n}A_i}_{\in\,\mathcal C_{k_n}}\Big)\overset{\mathcal B_i\ \text{ind.},\ i\in I}{=}\prod_{j=1}^n P\Big(\bigcap_{i\in J_j}A_i\Big).$$
(i) now implies that
$$\sigma(\mathcal C_k)=\sigma\Big(\bigcup_{i\in J_k}\mathcal B_i\Big),\quad k\in K,$$
are independent too.

Example 1.3. Let $A_i\in\mathcal A$, $i\in I$, be independent. Then $A_i,A_i^c$, $i\in I$, are independent too.

Remark 1.4. Pairwise independence does not imply independence in general.

Example: Consider two tosses of a fair coin, i.e.
$$\Omega:=\big\{(i,k)\;\big|\;i,k\in\{0,1\}\big\},\quad P:=\text{uniform distribution}.$$
Consider the events
$$A:=\text{"first toss is }1\text{"}=\{(1,0),(1,1)\},$$
$$B:=\text{"second toss is }1\text{"}=\{(0,1),(1,1)\},$$
$$C:=\text{"first and second toss equal"}=\{(0,0),(1,1)\}.$$
Then $P(A)=P(B)=P(C)=\frac12$, and $A,B,C$ are pairwise independent:
$$P(A\cap B)=P(B\cap C)=P(C\cap A)=\frac14.$$
But on the other hand
$$P(A\cap B\cap C)=\frac14\ne\frac18=P(A)\cdot P(B)\cdot P(C).$$
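Since $\Omega$ has only four points, this counterexample can be verified exhaustively. A minimal sketch (the set representation and names are my own):

    from itertools import product

    Omega = list(product((0, 1), repeat=2))       # two fair coin tosses
    P = lambda E: len(E) / len(Omega)             # uniform distribution

    A = {w for w in Omega if w[0] == 1}           # first toss is 1
    B = {w for w in Omega if w[1] == 1}           # second toss is 1
    C = {w for w in Omega if w[0] == w[1]}        # both tosses equal

    print(P(A & B), P(A) * P(B))                  # 0.25 0.25   pairwise independent
    print(P(B & C), P(B) * P(C))                  # 0.25 0.25
    print(P(C & A), P(C) * P(A))                  # 0.25 0.25
    print(P(A & B & C), P(A) * P(B) * P(C))       # 0.25 0.125  not independent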

Example 1.5. Independent 0-1-experiments with success probability $p\in[0,1]$. Let $\Omega:=\{0,1\}^{\mathbb N}$ and $X_i(\omega):=x_i$ for $\omega=(x_i)_{i\in\mathbb N}$. Let $P_p$ be a probability measure on $\mathcal A:=\sigma\big(\{X_i=1\},\ i=1,2,\dots\big)$ with

(i) $P_p[X_i=1]=p$ (hence $P_p[X_i=0]=P_p\big(\{X_i=1\}^c\big)=1-p$),

(ii) $\{X_i=1\}$, $i\in\mathbb N$, are independent w.r.t. $P_p$.

(Existence of such a probability measure: later!) Then for any $x_1,\dots,x_n\in\{0,1\}$:
$$P_p[X_{i_1}=x_1,\dots,X_{i_n}=x_n]\overset{\text{(ii) and 1.3}}{=}\prod_{j=1}^n P_p[X_{i_j}=x_j]\overset{\text{(i)}}{=}p^k(1-p)^{n-k},$$
where $k:=\sum_{i=1}^n x_i$. Hence $P_p$ is uniquely determined by (i) and (ii).

Proposition 1.6 (Kolmogorov's zero-one law). Let $\mathcal B_n$, $n\in\mathbb N$, be independent $\sigma$-algebras, and let
$$\mathcal B_\infty:=\bigcap_{n=1}^\infty\sigma\Big(\bigcup_{m=n}^\infty\mathcal B_m\Big)$$
be the tail field (resp. $\sigma$-algebra of terminal events). Then
$$P(A)\in\{0,1\}\quad\forall A\in\mathcal B_\infty,$$
i.e., $P$ is deterministic on $\mathcal B_\infty$.

Illustration: Independent 0-1-experiments. Let $\mathcal B_i=\sigma\big(\{X_i=1\}\big)$. Then
$$\mathcal B_\infty=\bigcap_{n\in\mathbb N}\sigma\Big(\bigcup_{m\ge n}\mathcal B_m\Big)$$
is the $\sigma$-algebra containing the events of the remote future, e.g.
$$\limsup_{i\to\infty}\{X_i=1\}=\{\text{"infinitely many 1's"}\}$$
or
$$\Big\{\omega\in\{0,1\}^{\mathbb N}\;\Big|\;\lim_{n\to\infty}\underbrace{\frac1n\sum_{i=1}^n X_i(\omega)}_{=:S_n(\omega)/n}\ \text{exists}\Big\}.$$

Proof of the zero-one law. Proposition 1.2 implies that for all $n$
$$\mathcal B_1,\mathcal B_2,\dots,\mathcal B_{n-1},\ \sigma\Big(\bigcup_{m=n}^\infty\mathcal B_m\Big)$$
are independent. Since $\mathcal B_\infty\subset\sigma\big(\bigcup_{m\ge n}\mathcal B_m\big)$, this implies that for all $n$
$$\mathcal B_1,\mathcal B_2,\dots,\mathcal B_{n-1},\ \mathcal B_\infty$$
are independent. By definition this implies that $\mathcal B_\infty,\mathcal B_n$, $n\in\mathbb N$, are independent, and now Proposition 1.2(ii) implies that
$$\sigma\Big(\bigcup_{n\in\mathbb N}\mathcal B_n\Big)\quad\text{and}\quad\mathcal B_\infty$$
are independent. Since $\mathcal B_\infty\subset\sigma\big(\bigcup_{n\ge1}\mathcal B_n\big)$ we finally obtain that $\mathcal B_\infty$ and $\mathcal B_\infty$ are independent. The conclusion now follows from the next lemma.

Lemma 1.7. Let $\mathcal B\subset\mathcal A$ be a $\sigma$-algebra such that $\mathcal B$ is independent of itself. Then
$$P(A)\in\{0,1\}\quad\forall A\in\mathcal B.$$

Proof. For all $A\in\mathcal B$:
$$P(A)=P(A\cap A)=P(A)\cdot P(A)=P(A)^2.$$
Hence $P(A)=0$ or $P(A)=1$.

For any sequence $A_n$, $n\in\mathbb N$, of independent events in $\mathcal A$, Kolmogorov's zero-one law implies in particular for
$$A:=\bigcap_{n\in\mathbb N}\bigcup_{m\ge n}A_m=:\limsup_{n\to\infty}A_n$$
that $P(A)\in\{0,1\}$.

Proof: The $\sigma$-algebras $\mathcal B_n:=\sigma(\{A_n\})=\{\emptyset,\Omega,A_n,A_n^c\}$, $n\in\mathbb N$, are independent by Proposition 1.2, and $A\in\mathcal B_\infty$.

Lemma 1.8 (Borel-Cantelli). (i) Let $A_i\in\mathcal A$, $i\in\mathbb N$. Then
$$\sum_{i=1}^\infty P(A_i)<\infty\quad\Rightarrow\quad P\Big(\limsup_{i\to\infty}A_i\Big)=0.$$

(ii) Assume that $A_i\in\mathcal A$, $i\in\mathbb N$, are independent. Then
$$\sum_{i=1}^\infty P(A_i)=\infty\quad\Rightarrow\quad P\Big(\limsup_{i\to\infty}A_i\Big)=1.$$

Proof. (i) See Lemma 1.1.11.

(ii) It suffices to show that
$$P\Big(\bigcup_{m=n}^\infty A_m\Big)=1\quad\text{resp.}\quad P\Big(\bigcap_{m=n}^\infty A_m^c\Big)=0\quad\forall n.$$
The last equality follows from the fact that
$$P\Big(\bigcap_{m=n}^\infty A_m^c\Big)=\lim_{k\to\infty}\underbrace{P\Big(\bigcap_{m=n}^{n+k}A_m^c\Big)}_{\overset{\text{ind.}}{=}\prod_{m=n}^{n+k}P(A_m^c)}=\lim_{k\to\infty}\prod_{m=n}^{n+k}\big(1-P(A_m)\big)\le\lim_{k\to\infty}\exp\Big(-\sum_{m=n}^{n+k}P(A_m)\Big)=0,$$
where we used the inequality $1-\alpha\le e^{-\alpha}$ for all $\alpha\in\mathbb R$.
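A finite-horizon simulation cannot prove an "infinitely often" statement, but it displays the dichotomy of the two Borel-Cantelli regimes quite clearly. A sketch, assuming independent events with $P(A_n)=\frac1n$ (divergent sum) versus $P(A_n)=\frac1{n^2}$ (convergent sum); horizon and seed are illustrative.

    import random

    random.seed(2)
    N = 100_000
    # Independent events A_n with P(A_n) = 1/n resp. P(A_n) = 1/n^2
    hits_div  = [n for n in range(1, N + 1) if random.random() < 1.0 / n]
    hits_conv = [n for n in range(1, N + 1) if random.random() < 1.0 / n**2]

    print(len(hits_div), hits_div[-3:])    # occurrences keep coming (sum = inf)
    print(len(hits_conv), hits_conv[-3:])  # only a few early ones (sum < inf)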

Example 1.9. Independent 0-1-experiments with success probability $p\in\,]0,1[$.

Let $(x_1,\dots,x_N)\in\{0,1\}^N$ ("binary text of length $N$"). What is $P_p[\text{"text occurs"}]$?

To calculate this probability we partition the infinite sequence $\omega=(y_n)\in\{0,1\}^{\mathbb N}$ into blocks of length $N$:
$$(\underbrace{y_1,\dots,y_N}_{\text{1st block, length }N},\underbrace{y_{N+1},\dots,y_{2N}}_{\text{2nd block, length }N},\dots)\in\Omega:=\{0,1\}^{\mathbb N},$$
and consider the events $A_i:=$ "text occurs in the $i$-th block". Clearly, $A_i$, $i\in\mathbb N$, are independent events (!) by Proposition 1.2(ii), with equal probability
$$P_p(A_i)=p^K(1-p)^{N-K}=:\alpha>0,$$
where $K:=\sum_{i=1}^N x_i$ is the total number of ones. In particular $\sum_{i=1}^\infty P_p(A_i)=\sum_{i=1}^\infty\alpha=\infty$, and now Borel-Cantelli implies $P_p(A)=1$, where
$$A=\limsup_{i\to\infty}A_i=\text{"text occurs infinitely many times"}.$$

Moreover: since the indicator functions $\mathbf 1_{A_1},\mathbf 1_{A_2},\dots$ are uncorrelated (since they are independent r.v., see below), the strong law of large numbers implies that
$$\frac1n\sum_{i=1}^n\mathbf 1_{A_i}\xrightarrow{P_p\text{-a.s.}}E[\mathbf 1_{A_i}]=\alpha,$$
i.e. the relative frequency of the given text in the infinite sequence is strictly positive.
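The block construction of Example 1.9 is straightforward to simulate; the observed relative frequency of blocks reproducing the text approaches $\alpha=p^K(1-p)^{N-K}$. A sketch (text, $p$ and the number of blocks are arbitrary choices):

    import random

    random.seed(3)
    p, text = 0.5, (1, 0, 1)            # binary text of length N = 3
    N, n_blocks = len(text), 200_000
    K = sum(text)
    alpha = p**K * (1 - p)**(N - K)     # P_p(A_i) = p^K (1-p)^(N-K) = 1/8 here

    # Count independent blocks of length N that reproduce the text exactly.
    hits = sum(
        1 for _ in range(n_blocks)
        if tuple(1 if random.random() < p else 0 for _ in range(N)) == text
    )
    print(hits / n_blocks, alpha)       # relative frequency is close to alpha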

2 Independent random variables

Let $(\Omega,\mathcal A,P)$ be a probability space.

Definition 2.1. A family $X_i$, $i\in I$, of r.v. on $(\Omega,\mathcal A,P)$ is said to be independent if the $\sigma$-algebras
$$\sigma(X_i):=X_i^{-1}\big(\mathcal B(\bar{\mathbb R})\big)=\big\{\{X_i\in A\}\;\big|\;A\in\mathcal B(\bar{\mathbb R})\big\},\quad i\in I,$$
are independent, i.e. for all finite subsets $J\subset I$ and any Borel subsets $A_j\in\mathcal B(\bar{\mathbb R})$
$$P\Big(\bigcap_{j\in J}\{X_j\in A_j\}\Big)=\prod_{j\in J}P[X_j\in A_j].$$

Remark 2.2. Let $X_i$, $i\in I$, be independent and $h_i\colon\bar{\mathbb R}\to\bar{\mathbb R}$, $i\in I$, be $\mathcal B(\bar{\mathbb R})/\mathcal B(\bar{\mathbb R})$-measurable. Then $Y_i:=h_i(X_i)$, $i\in I$, are again independent, because $\sigma(Y_i)\subset\sigma(X_i)$ for all $i\in I$.

Proposition 2.3. Let $X_1,\dots,X_n$ be independent r.v. with $X_i\ge0$. Then
$$E[X_1\cdots X_n]=E[X_1]\cdots E[X_n].$$

Proof. W.l.o.g. $n=2$. (Proof of the general case by induction, using the fact that $X_1\cdots X_{n-1}$ and $X_n$ are independent, since $X_1\cdots X_{n-1}$ is measurable w.r.t. $\sigma\big(\sigma(X_1)\cup\dots\cup\sigma(X_{n-1})\big)$, and $\sigma\big(\sigma(X_1)\cup\dots\cup\sigma(X_{n-1})\big)$ and $\sigma(X_n)$ are independent by Proposition 1.2.)

It therefore suffices to consider two independent r.v. $X,Y\ge0$, and we have to show that
$$E[XY]=E[X]\cdot E[Y].\tag{2.2}$$
W.l.o.g. $X,Y$ simple (for general $X$ and $Y$ there exist increasing sequences of simple r.v. $X_n$ (resp. $Y_n$), which are $\sigma(X)$-measurable (resp. $\sigma(Y)$-measurable), converging pointwise to $X$ (resp. $Y$); then $E[X_nY_n]=E[X_n]\cdot E[Y_n]$ for all $n$ implies (2.2) using monotone integration). For $X$, $Y$ simple, say
$$X=\sum_{i=1}^m\alpha_i\mathbf 1_{A_i}\quad\text{and}\quad Y=\sum_{j=1}^n\beta_j\mathbf 1_{B_j},$$
with $\alpha_i,\beta_j\ge0$ and $A_i\in\sigma(X)$ resp. $B_j\in\sigma(Y)$, it follows that
$$E[XY]=\sum_{i,j}\alpha_i\beta_j\cdot P(A_i\cap B_j)=\sum_{i,j}\alpha_i\beta_j\cdot P(A_i)\cdot P(B_j)=E[X]\cdot E[Y].$$

Corollary 2.4. $X,Y$ independent, $X,Y\in L^1$ $\Rightarrow$ $XY\in L^1$ and
$$E[XY]=E[X]\cdot E[Y].$$

Proof. Let $\varepsilon_1,\varepsilon_2\in\{+,-\}$. Then $X^{\varepsilon_1}$ and $Y^{\varepsilon_2}$ are independent by Remark 2.2 and nonnegative. Proposition 2.3 implies
$$E[X^{\varepsilon_1}\cdot Y^{\varepsilon_2}]=E[X^{\varepsilon_1}]\cdot E[Y^{\varepsilon_2}].$$
In particular $X^{\varepsilon_1}\cdot Y^{\varepsilon_2}\in L^1$, because $E[X^{\varepsilon_1}]\cdot E[Y^{\varepsilon_2}]<\infty$. Hence
$$X\cdot Y=X^+\cdot Y^++X^-\cdot Y^--\big(X^+\cdot Y^-+X^-\cdot Y^+\big)\in L^1,$$
and $E[XY]=E[X]\cdot E[Y]$.

Remark 2.5. (i) In general the converse to the above corollary does not hold: for example, let $X$ be $N(0,1)$-distributed and $Y=X^2$. Then $X$ and $Y$ are not independent, but
$$E[XY]=E[X^3]=E[X]\cdot E[Y]=0.$$

(ii) $X,Y\in L^2$ independent $\Rightarrow$ $X,Y$ uncorrelated, because
$$\operatorname{cov}(X,Y)=E[XY]-E[X]\cdot E[Y]=0.$$
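A quick numerical look at Remark 2.5(i): the sample covariance of $X$ and $Y=X^2$ is close to $0$, yet the events $\{|X|>2\}$ and $\{Y\le4\}=\{|X|\le2\}$ are disjoint, so the product rule for probabilities fails badly. A sketch (the choice of events and the sample size are mine):

    import random

    random.seed(4)
    n = 200_000
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x * x for x in xs]            # Y = X^2 is a function of X: dependent

    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    print(cov)                          # approximately 0: uncorrelated

    # ...but not independent: P[|X| > 2, Y <= 4] = 0, while the product is > 0.
    p_joint = sum(1 for x in xs if abs(x) > 2 and x * x <= 4) / n
    p_prod = (sum(1 for x in xs if abs(x) > 2) / n) \
             * (sum(1 for y in ys if y <= 4) / n)
    print(p_joint, p_prod)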

Corollary 2.6 (to the strong law of large numbers). Let $X_1,X_2,\dots\in L^2$ be independent with $\sup_{i\in\mathbb N}\operatorname{var}(X_i)<\infty$. Then
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n\big(X_i(\omega)-E[X_i]\big)=0\quad P\text{-a.s.}$$
If $E[X_i]\equiv m$, then
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n X_i(\omega)=m\quad P\text{-a.s.}$$

3 Kolmogorov’s law of large numbers

Proposition 3.1 (Kolmogorov, 1930). Let $X_1,X_2,\dots\in L^1$ be independent, identically distributed, $m=E[X_i]$. Then
$$\underbrace{\frac1n\sum_{i=1}^n X_i(\omega)}_{\text{empirical mean}}\xrightarrow{n\to\infty}m\quad P\text{-a.s.}$$

Proposition 3.1 follows from the following more general result:

Proposition 3.2 (Etemadi, 1981). Let $X_1,X_2,\dots\in L^1$ be pairwise independent, identically distributed, $m=E[X_i]$. Then
$$\frac1n\sum_{i=1}^n X_i(\omega)\xrightarrow{n\to\infty}m\quad P\text{-a.s.}$$

Proof. W.l.o.g. $X_i\ge0$ (otherwise consider $X_1^+,X_2^+,\dots$ (pairwise independent, identically distributed) and $X_1^-,X_2^-,\dots$ (pairwise independent, identically distributed)).

1. Replace $X_i$ by $\tilde X_i:=\mathbf 1_{\{X_i<i\}}X_i$. Clearly,
$$\tilde X_i=h_i(X_i)\quad\text{with}\quad h_i(x):=\begin{cases}x & \text{if }x<i\\ 0 & \text{if }x\ge i.\end{cases}$$
Then $\tilde X_1,\tilde X_2,\dots$ are pairwise independent by Remark 2.2. For the proof it is now sufficient to show that for $\tilde S_n:=\sum_{i=1}^n\tilde X_i$ we have that
$$\frac{\tilde S_n}n\xrightarrow{n\to\infty}m\quad P\text{-a.s.}$$
Indeed,
$$\sum_{n=1}^\infty P[X_n\ne\tilde X_n]=\sum_{n=1}^\infty P[X_n\ge n]=\sum_{n=1}^\infty P[X_1\ge n]=\sum_{n=1}^\infty\sum_{k=n}^\infty P\big(X_1\in[k,k+1[\,\big)=\sum_{k=1}^\infty k\cdot P\big(X_1\in[k,k+1[\,\big)=\sum_{k=1}^\infty E\big[\underbrace{k\cdot\mathbf 1_{\{X_1\in[k,k+1[\}}}_{\le\,X_1\cdot\mathbf 1_{\{X_1\in[k,k+1[\}}}\big]\le E[X_1]<\infty$$
implies by the Borel-Cantelli lemma
$$P[X_n\ne\tilde X_n\ \text{infinitely often}]=0.$$

2. Reduce the proof to convergence along the subsequence $k_n=\lfloor\alpha^n\rfloor$ (= largest natural number $\le\alpha^n$), $\alpha>1$.

We will show in Step 3 that
$$\frac{\tilde S_{k_n}-E[\tilde S_{k_n}]}{k_n}\xrightarrow{n\to\infty}0\quad P\text{-a.s.}\tag{2.3}$$
This will imply the assertion of the proposition, because
$$E[\tilde X_i]=E\big[\mathbf 1_{\{X_i<i\}}\cdot X_i\big]=E\big[\mathbf 1_{\{X_1<i\}}\cdot X_1\big]\overset{i\to\infty}{\nearrow}E[X_1]\ (=m),$$
hence
$$\frac1{k_n}\cdot E[\tilde S_{k_n}]=\frac1{k_n}\sum_{i=1}^{k_n}E[\tilde X_i]\xrightarrow{n\to\infty}m,$$

and thus
$$\frac1{k_n}\cdot\tilde S_{k_n}\xrightarrow{n\to\infty}m\quad P\text{-a.s.}$$
If $l\in\mathbb N\cap[k_n,k_{n+1}[$, then (since $\tilde X_i\ge0$)
$$\underbrace{\frac{k_n}{k_{n+1}}}_{\xrightarrow{n\to\infty}\alpha^{-1}}\cdot\underbrace{\frac{\tilde S_{k_n}}{k_n}}_{\xrightarrow{n\to\infty}m\ P\text{-a.s.}}\le\frac{\tilde S_l}l\le\underbrace{\frac{\tilde S_{k_{n+1}}}{k_{n+1}}}_{\xrightarrow{n\to\infty}m\ P\text{-a.s.}}\cdot\underbrace{\frac{k_{n+1}}{k_n}}_{\xrightarrow{n\to\infty}\alpha}.$$
Hence there exists a $P$-null set $N_\alpha\in\mathcal A$, such that for all $\omega\notin N_\alpha$
$$\frac1\alpha\cdot m\le\liminf_{l\to\infty}\frac{\tilde S_l(\omega)}l\le\limsup_{l\to\infty}\frac{\tilde S_l(\omega)}l\le\alpha\cdot m.$$
Finally choose a subsequence $\alpha_n\searrow1$. Then for all $\omega\notin N:=\bigcup_{n\ge1}N_{\alpha_n}$
$$\lim_{l\to\infty}\frac{\tilde S_l(\omega)}l=m.$$

3. Due to Lemma 1.7.7 it suffices for the proof of (2.3) to show that
$$\forall\varepsilon>0:\quad\sum_{n=1}^\infty P\bigg[\Big|\frac{\tilde S_{k_n}-E[\tilde S_{k_n}]}{k_n}\Big|>\varepsilon\bigg]<\infty$$
(fast convergence in probability towards $0$).

Pairwise independence of the $\tilde X_i$ implies that the $\tilde X_i$ are pairwise uncorrelated, hence by Chebyshev's inequality
$$P\bigg[\Big|\frac{\tilde S_{k_n}-E[\tilde S_{k_n}]}{k_n}\Big|>\varepsilon\bigg]\le\frac1{k_n^2\varepsilon^2}\cdot\operatorname{var}(\tilde S_{k_n})=\frac1{k_n^2\varepsilon^2}\sum_{i=1}^{k_n}\operatorname{var}(\tilde X_i)\le\frac1{k_n^2\varepsilon^2}\sum_{i=1}^{k_n}E\big[(\tilde X_i)^2\big].$$
It therefore suffices to show that
$$s:=\sum_{n=1}^\infty\frac1{k_n^2}\sum_{i=1}^{k_n}E\big[(\tilde X_i)^2\big]=\sum_{(i,n)\in\mathbb N^2,\ i\le k_n}\frac1{k_n^2}\cdot E\big[(\tilde X_i)^2\big]<\infty.$$
To this end note that
$$s=\sum_{i=1}^\infty\bigg(\sum_{n:\,k_n\ge i}\frac1{k_n^2}\bigg)\cdot E\big[(\tilde X_i)^2\big].$$

We will show in the following that there exists a constant $c$ such that
$$\sum_{n:\,k_n\ge i}\frac1{k_n^2}\le\frac c{i^2}.\tag{2.4}$$
This will then imply that
$$s\overset{(2.4)}{\le}c\sum_{i=1}^\infty\frac1{i^2}\cdot E\big[(\tilde X_i)^2\big]=c\sum_{i=1}^\infty\frac1{i^2}\cdot E\big[\mathbf 1_{\{X_1<i\}}\cdot X_1^2\big]\le c\sum_{i=1}^\infty\frac1{i^2}\sum_{l=1}^i l^2\cdot P\big(X_1\in[l-1,l[\,\big)=c\sum_{l=1}^\infty l^2\cdot\underbrace{\sum_{i=l}^\infty\frac1{i^2}}_{\le\frac2l}\cdot P\big(X_1\in[l-1,l[\,\big)\le2c\sum_{l=1}^\infty l\cdot P\big(X_1\in[l-1,l[\,\big)=2c\sum_{l=1}^\infty E\big[\underbrace{l\cdot\mathbf 1_{\{X_1\in[l-1,l[\}}}_{\le\,(X_1+1)\cdot\mathbf 1_{\{X_1\in[l-1,l[\}}}\big]\le2c\cdot\big(E[X_1]+1\big)<\infty,$$
where we used the fact that
$$\sum_{i=l}^\infty\frac1{i^2}\le\frac1{l^2}+\sum_{i=l+1}^\infty\frac1{(i-1)i}=\frac1{l^2}+\sum_{i=l+1}^\infty\Big(\frac1{i-1}-\frac1i\Big)=\frac1{l^2}+\frac1l\le\frac2l.$$
It remains to show (2.4). To this end note that
$$\lfloor\alpha^n\rfloor=k_n\le\alpha^n<k_n+1\quad\Rightarrow\quad k_n>\alpha^n-1\overset{\alpha>1}{\ge}\alpha^n-\alpha^{n-1}=\underbrace{\frac{\alpha-1}\alpha}_{=:c_\alpha}\cdot\alpha^n.$$
Let $n_i$ be the smallest natural number satisfying $k_{n_i}=\lfloor\alpha^{n_i}\rfloor\ge i$, hence $\alpha^{n_i}\ge i$. Then
$$\sum_{n:\,k_n\ge i}\frac1{k_n^2}\le c_\alpha^{-2}\sum_{n\ge n_i}\frac1{\alpha^{2n}}=c_\alpha^{-2}\cdot\frac1{1-\alpha^{-2}}\cdot\alpha^{-2n_i}\le\frac{c_\alpha^{-2}}{1-\alpha^{-2}}\cdot\frac1{i^2}.$$

Corollary 3.3. Let $X_1,X_2,\dots$ be pairwise independent, identically distributed with $X_i\ge0$. Then
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n X_i(\omega)=E[X_1]\in[0,\infty]\quad P\text{-a.s.}$$

Proof. W.l.o.g. $E[X_1]=\infty$. Then $\frac1n\sum_{i=1}^n\big(X_i(\omega)\wedge N\big)\xrightarrow{n\to\infty}E[X_1\wedge N]$ $P$-a.s. for all $N$, hence
$$\liminf_{n\to\infty}\frac1n\sum_{i=1}^n X_i(\omega)\ge\lim_{n\to\infty}\frac1n\sum_{i=1}^n\big(X_i(\omega)\wedge N\big)=E[X_1\wedge N]\xrightarrow{N\nearrow\infty}E[X_1]\quad P\text{-a.s.}$$
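The law of large numbers is easy to watch in simulation. A sketch with i.i.d. exponential(1) variables, so that $m=E[X_i]=1$; distribution, seed and checkpoints are my choices.

    import random

    random.seed(5)
    s, checkpoints = 0.0, {10, 1_000, 100_000, 1_000_000}
    for n in range(1, 1_000_001):
        s += random.expovariate(1.0)    # X_n ~ exponential(1), E[X_n] = 1
        if n in checkpoints:
            print(n, s / n)             # empirical mean approaches m = 1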

Example 3.4 (Growth in random media). Let $Y_1,Y_2,\dots$ be i.i.d., $Y_i\ge0$, with $m:=E[Y_i]$ (existence of such a sequence: later!).

Define $X_0=1$ and inductively $X_n:=X_{n-1}\cdot Y_n$. Clearly, $X_n=Y_1\cdots Y_n$ and $E[X_n]=E[Y_1]\cdots E[Y_n]=m^n$, hence
$$E[X_n]\to\begin{cases}+\infty & \text{if }m>1\quad\text{exponential growth (supercritical)}\\ 1 & \text{if }m=1\quad\text{critical}\\ 0 & \text{if }m<1\quad\text{exponential decay (subcritical).}\end{cases}$$
What will be the long-time behaviour of $X_n(\omega)$? Surprisingly, in the supercritical case $m>1$, one may observe that $\lim_{n\to\infty}X_n=0$ with positive probability.

Explanation: Suppose that $\log Y_i\in L^1$. Then
$$\frac1n\log X_n=\frac1n\sum_{i=1}^n\log Y_i\xrightarrow{n\to\infty}E[\log Y_1]=:\alpha\quad P\text{-a.s.},$$
and

$\alpha<0$: $\exists\,\varepsilon>0$ with $\alpha+\varepsilon<0$, so that $X_n(\omega)\le e^{n(\alpha+\varepsilon)}$ $\forall n\ge n_0(\omega)$, hence $P$-a.s. exponential decay;

$\alpha>0$: $\exists\,\varepsilon>0$ with $\alpha-\varepsilon>0$, so that $X_n(\omega)\ge e^{n(\alpha-\varepsilon)}$ $\forall n\ge n_0(\omega)$, hence $P$-a.s. exponential growth.

Note that by Jensen's inequality
$$\alpha=E[\log Y_1]\le\log\underbrace{E[Y_1]}_{=m},$$
and in general the inequality is strict, i.e. $\alpha<\log m$, so that it might happen that $\alpha<0$ although $m>1$ (!).

Illustration: As a particular example let
$$Y_i:=\begin{cases}\frac12(1+c) & \text{with prob. }\frac12\\[2pt] \frac12 & \text{with prob. }\frac12,\end{cases}$$
so that $E[Y_i]=\frac14(1+c)+\frac14=\frac12+\frac14c$ (supercritical if $c>2$). On the other hand
$$E[\log Y_1]=\frac12\cdot\Big[\log\Big(\frac12(1+c)\Big)+\log\frac12\Big]=\frac12\cdot\log\frac{1+c}4\overset{c<3}{<}0.$$
Hence $X_n\xrightarrow{n\to\infty}0$ $P$-a.s. with exponential rate for $c<3$, whereas at the same time, for $c>2$, $E[X_n]=m^n\nearrow\infty$ with exponential rate.
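The gap between $E[X_n]=m^n\to\infty$ and $X_n\to0$ a.s. is striking in simulation. A sketch with the two-point $Y_i$ above and $c=2.5$ (so $m=1.125>1$ but $\alpha=\frac12\log\frac{3.5}4<0$); path length and seed are arbitrary.

    import math
    import random

    random.seed(6)
    c = 2.5                              # supercritical (c > 2), yet alpha < 0 (c < 3)
    m = 0.5 + c / 4                      # E[Y_i] = 1.125
    alpha = 0.5 * math.log((1 + c) / 4)  # E[log Y_1], about -0.067

    def path(n):
        # one trajectory X_n = Y_1 * ... * Y_n
        x = 1.0
        for _ in range(n):
            x *= 0.5 * (1 + c) if random.random() < 0.5 else 0.5
        return x

    n = 200
    print(m**n)                          # E[X_n]: about 1.7e10
    print(math.exp(n * alpha))           # typical a.s. order e^(n*alpha): about 1.6e-6
    print([path(n) for _ in range(5)])   # sampled paths: tiny compared to E[X_n]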

Back to Kolmogorov's law of large numbers:

Let $X_1,X_2,\dots\in L^1$ be i.i.d. with $m:=E[X_i]$. Then
$$\frac1n\sum_{i=1}^n X_i(\omega)\xrightarrow{n\to\infty}E[X_1]\quad P\text{-a.s.}$$
Define the "random measure"
$$\varrho_n(\omega,A):=\frac1n\sum_{i=1}^n\mathbf 1_A\big(X_i(\omega)\big)=\text{"relative frequency of the event }X_i\in A\text{"}.$$
Then
$$\varrho_n(\omega,\cdot)=\frac1n\sum_{i=1}^n\delta_{X_i(\omega)}$$
is a probability measure on $\big(\mathbb R,\mathcal B(\mathbb R)\big)$ for fixed $\omega$, and it is called the empirical distribution of the first $n$ observations.

Proposition 3.5. For $P$-almost every $\omega\in\Omega$:
$$\varrho_n(\omega,\cdot)\xrightarrow{n\to\infty}\mu:=P\circ X_1^{-1}\quad\text{weakly}.$$

Proof. Clearly, Kolmogorov's law of large numbers implies that for any $x\in\mathbb R$
$$F_n(\omega,x):=\varrho_n\big(\omega,\,]-\infty,x]\,\big)=\frac1n\sum_{i=1}^n\mathbf 1_{]-\infty,x]}\big(X_i(\omega)\big)\to E\big[\mathbf 1_{]-\infty,x]}(X_1)\big]=P[X_1\le x]=\mu\big(\,]-\infty,x]\,\big)=:F(x)$$
$P$-a.s., hence for every $\omega\notin N(x)$ for some $P$-null set $N(x)$. Then
$$N:=\bigcup_{r\in\mathbb Q}N(r)$$
is a $P$-null set too, and for all $x\in\mathbb R$ and all $s,r\in\mathbb Q$ with $s<x<r$ and $\omega\notin N$:
$$F(s)=\lim_{n\to\infty}F_n(\omega,s)\le\liminf_{n\to\infty}F_n(\omega,x)\le\limsup_{n\to\infty}F_n(\omega,x)\le\lim_{n\to\infty}F_n(\omega,r)=F(r).$$
Hence, if $F$ is continuous at $x$, then for $\omega\notin N$
$$\lim_{n\to\infty}F_n(\omega,x)=F(x).$$
Now the assertion follows from the Portmanteau theorem.
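Proposition 3.5 is the principle behind plotting empirical distribution functions. A sketch, assuming i.i.d. uniform-$[0,1]$ observations, for which the true distribution function is $F(x)=x$ on $[0,1]$ (grid and sample sizes are arbitrary):

    import random

    random.seed(7)
    xs = [random.random() for _ in range(100_000)]   # observations X_1, ..., X_n
    true_F = lambda x: max(0.0, min(1.0, x))         # F of the uniform distribution

    def F_emp(x, sample):
        # empirical distribution function rho_n(omega, ]-inf, x])
        return sum(1 for xi in sample if xi <= x) / len(sample)

    grid = [i / 20 for i in range(21)]
    for n in (100, 10_000, 100_000):
        worst = max(abs(F_emp(x, xs[:n]) - true_F(x)) for x in grid)
        print(n, worst)                              # sup-distance on the grid shrinks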
