

4.3 The age distribution in the general case

4.3.4 Proof of Theorem 4.3.1

We first prove that we obtain the same distribution if ages are read off from the tree under the vertically mirrored contour process (H_{τ−t})_{0≤t≤τ}, where τ = inf{x > 0 : H_x = 0} (see Figure 4.5). To see this, denote by Ψ : T → T the corresponding transformation on the set of trees. Let Z ∼ Q_{λ,µ,T} be the random tree obtained from the above birth and death process, and denote by Θ(z) the finite set of trees that have the same vertex positions in the vertical direction as a tree z ∈ T, but not necessarily the same phylogeny. Note that Θ(Ψ(z)) = Θ(z) for every z ∈ T. Given Θ(Z), the random tree Z is uniformly distributed on Θ(Z), and so is Ψ(Z) because, restricted to the finite set Θ(Z), the map Ψ is just a permutation. Hence the distributions of Ψ(Z) and Z are equal by integration of the conditional distributions given Θ(Z).
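The permutation step above rests on a simple fact: a bijection of a finite set pushes the uniform distribution forward to itself. A minimal sketch with a hypothetical 4-element set of tree labels:

```python
from fractions import Fraction

# Uniform distribution on a finite set Theta(Z) of trees (toy labels)
theta = ["t1", "t2", "t3", "t4"]
uniform = {z: Fraction(1, len(theta)) for z in theta}

# Psi restricted to Theta(Z) is a permutation (toy choice)
psi = {"t1": "t3", "t2": "t1", "t3": "t4", "t4": "t2"}

# Push-forward of the uniform distribution under Psi
pushforward = {z: Fraction(0) for z in theta}
for z, p in uniform.items():
    pushforward[psi[z]] += p
```

Since every point receives the mass of exactly one preimage, the push-forward is again uniform, which is the invariance used in the proof.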

Since it is more customary to draw stochastic processes from left to right, we mirror the situation in Figure 4.5, which gives us the original contour process (H_x)_{x≥0} again, enveloping a right-aligned tree (constructed in the analogous way as the left-aligned tree, but from right to left; see Figure 4.6). Of course the ages still correspond to vertical line lengths.

Figure 4.5: The mirrored deflected contour process (the realization corresponds to Figure 4.4) and its (left-aligned) tree


Figure 4.6: The deflected contour process and its right-aligned tree

Obviously, the local maxima of the contour process at height T correspond to the living individuals in the linear birth and death process at time T, which we assume to be numbered from left to right in the right-aligned tree here. In total, there are B_T local maxima (at any height) corresponding to the B_T individuals that have been born up to time T. Let these maxima be numbered from left to right, and let I_1 < … < I_{Y_T} be the indices of the maxima at height T and I_k = ∞ for k > Y_T. Then V_{I_1}, …, V_{I_{Y_T−1}}, V_{I_{Y_T}} ∧ T are the ages we are interested in, where (V_k)_{k∈N} is the sequence of random variables from the definition of the contour process, i.e. (V_k)_{k∈N} are independent and identically Exp(λ) distributed, and ∧ denotes the minimum operator. Let 0 = ς_1 < … < ς_{2B_T+1} = τ be the positions of all local extrema of (H_x)_{0≤x≤τ} (cf. Figure 4.6). Note that ς_{2I_1}, …, ς_{2I_{Y_T}} are the positions of the maxima at height T. For convenience, we define ς_k = ς_{2B_T+1} = τ for all k > 2B_T and ς_∞ = lim_{k→∞} ς_k = τ.

By the above construction of the contour process, the process (H_{ς_k}, (−1)^k)_{k∈N} is a Markov chain on [0, T] × {−1, 1} with initial value (0, −1) at time 1 and transition probabilities

P((H_{ς_{k+1}}, (−1)^{k+1}) ∈ B_1 × B_2 | (H_{ς_k}, (−1)^k) = (z_1, z_2))

  = (δ_T(B_1) e^{−µ(T−z_1)} + µ ∫_{B_1 ∩ [z_1,T)} e^{−µ(u−z_1)} du) δ_1(B_2)   if z_2 = −1,

  = (δ_0(B_1) e^{−λz_1} + λ ∫_{B_1 ∩ (0,z_1]} e^{−λ(z_1−u)} du) δ_{−1}(B_2)   if z_2 = 1,

where B_1 is a Borel subset of [0, T], B_2 ⊂ {−1, 1} and (z_1, z_2) ∈ [0, T] × {−1, 1}. Note that the second component of (H_{ς_k}, (−1)^k)_{k∈N} tells us if (H_x)_{0≤x≤τ} has a minimum (if it is −1) or a maximum (if it is 1) at ς_k. Note further that this Markov chain determines (H_x)_{x≥0} completely.
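The two-state alternating structure of this chain is easy to simulate: from a minimum the next maximum is min(z + Exp(µ), T) (deflected at T), from a maximum the next minimum is max(z − Exp(λ), 0) (absorbed at 0). A minimal sketch in plain Python (function name hypothetical, parameter values arbitrary), run until a minimum is absorbed at 0, i.e. until the contour has reached τ:

```python
import random

def simulate_extrema_chain(lam, mu, T, rng):
    """Sketch of the extremal chain (H_{ς_k}, (-1)^k) from the kernel above."""
    states = [(0.0, -1)]  # initial value (0, -1) at time 1
    while True:
        z, s = states[-1]
        if s == -1:
            nxt = (min(z + rng.expovariate(mu), T), 1)      # rise, deflected at T
        else:
            nxt = (max(z - rng.expovariate(lam), 0.0), -1)  # fall, absorbed at 0
        states.append(nxt)
        if nxt == (0.0, -1):  # a minimum at height 0: the contour is at τ
            return states

chain = simulate_extrema_chain(lam=1.0, mu=0.8, T=2.0, rng=random.Random(1))
```

The simulated chain stays in [0, T], strictly alternates its second component, and contains an odd number of extrema, matching the 2B_T + 1 extrema of the contour.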

We define the sequence (ξ_k)_{k∈N} of hitting times in (T, 1) recursively by

ξ_1 = inf{k ∈ N : (H_{ς_k}, (−1)^k) = (T, 1)}  and  ξ_{l+1} = inf{k > ξ_l : (H_{ς_k}, (−1)^k) = (T, 1)}

for all l ∈ N, where inf ∅ = ∞. Note that each ξ_l is a stopping time and that max{l : ξ_l < ∞} = Y_T. Moreover, we define the “horizontally mirrored” excursions

E^{(l)} = (E^{(l)}_k)_{0≤k≤ξ_{l+1}−ξ_l} = (T − H_{ς_k})_{ξ_l≤k≤ξ_{l+1}}

for l ∈ N, where we set ∞ − ∞ = 0.

Since P((H_{ς_{ξ_l}}, (−1)^{ξ_l}) = (T, 1) | ξ_l < ∞) = 1, the strong Markov property and the fact that the second component of the process is deterministic imply that (H_{ς_k})_{0≤k≤ξ_l} and (H_{ς_k})_{ξ_l≤k} are independent given ξ_l < ∞ and that (H_{ς_k})_{ξ_l≤k} has the same distribution given ξ_l < ∞ for any l ∈ N. As a consequence, we have for any l ∈ N that (H_{ς_k})_{0≤k≤ξ_1}, E^{(1)}, E^{(2)}, …, E^{(l−1)} are independent of (E^{(l)}, (ξ_m)_{m≥l+1}) given ξ_l < ∞ and hence that (H_{ς_k})_{0≤k≤ξ_1}, E^{(1)}, E^{(2)}, …, E^{(l−1)} are independent of E^{(l)} given ξ_l < ∞ and given any sub-σ-algebra of σ(ξ_m : m ≥ l+1). This implies that (H_{ς_k})_{0≤k≤ξ_1}, E^{(1)}, E^{(2)}, …, E^{(l−1)}, E^{(l)} are independent given ξ_l < ∞ and given any sub-σ-algebra of σ(ξ_m : m ≥ l+1) for any l ∈ N. Note that for any l ∈ N, the first l−1 “horizontally mirrored” excursions E^{(1)}, E^{(2)}, …, E^{(l−1)} all have the same distribution under this conditioning because E^{(1)}, E^{(2)}, …, E^{(l−1)} all have the same distribution given ξ_l < ∞ and E^{(1)}, E^{(2)}, …, E^{(l−1)} are independent of (ξ_m)_{m≥l+1} given ξ_l < ∞.

Since {Y_T = y_T} = {ξ_{y_T} < ∞, ξ_{y_T+1} = ∞}, we obtain that, given Y_T = y_T, the processes (H_{ς_k})_{0≤k≤ξ_1}, E^{(1)}, E^{(2)}, …, E^{(y_T−1)}, E^{(y_T)} are independent and E^{(1)}, E^{(2)}, …, E^{(y_T−1)} are identically distributed. By having a closer look at the strong Markov property used above, it is seen that neither the distribution of E^{(1)} given Y_T = y_T ≥ 2 nor the distribution of E^{(y_T)} given Y_T = y_T ≥ 1 depends on the concrete value y_T. Note that on {Y_T = y_T}, we have E^{(1)}_1 = V_{I_1}, …, E^{(y_T−1)}_1 = V_{I_{y_T−1}}, and E^{(y_T)}_1 = V_{I_{y_T}} ∧ T, so these are the ages we are interested in. The conditional distributions of E^{(1)}_1 and E^{(y_T)}_1 given Y_T = y_T can easily be rediscovered as conditional distributions in another contour process, which is what we look at next.

Let (H̃_x)_{x≥0} ∼ P_{µ,λ,T}. By Theorem 4.3.8, (H̃_x)_{x≥0} is the contour process corresponding to a linear birth and death process (Z_t)_{t≥0} with per-capita birth rate µ and per-capita death rate λ that is killed at T. Let the positions (ς̃_k)_{k∈N} of the extrema of (H̃_x)_{x≥0} be defined analogously to (ς_k)_{k∈N}.

Both (H̃_{ς̃_{k+1}})_{k∈N_0} and (E^{(l)}_k)_{k∈N_0} for arbitrary l ∈ {1, …, Y_T} start in 0 and alternate between independently adding Exp(λ) and subtracting Exp(µ) random variables at least until 0 or T is crossed. Thus until this happens, they have the same distribution.

We have {Y_T ≥ 2} = {E^{(1)}_{ξ_2−ξ_1} = 0} = {max_{k∈N} E^{(1)}_k < T}, and therefore, for any y_T ≥ 2, since the distribution of E^{(1)}_1 does not depend on the concrete value of y_T,

L(E^{(1)}_1 | Y_T = y_T) = L(E^{(1)}_1 | E^{(1)} returns to 0 before reaching T) = L(ς̃_2 | max_{k∈N} H̃_{ς̃_k} < T) = L(ς̃_2 | max_{x≥0} H̃_x < T).   (4.1)

On the other hand we have {Y_T ≥ 1} = {Y_T ≥ 1, max_{k∈N} E^{(Y_T)}_k = T} (in fact ξ_{Y_T+1} = ∞ and E^{(Y_T)} is eventually absorbed in T). Therefore, for any y_T ≥ 1, since the distribution of E^{(y_T)}_1 does not depend on the concrete value of y_T,

L(E^{(y_T)}_1 | Y_T = y_T) = L(E^{(y_T)}_1 | E^{(y_T)} reaches T before returning to 0) = L(ς̃_2 | max_{k∈N} H̃_{ς̃_k} = T) = L(ς̃_2 | max_{x≥0} H̃_x = T).   (4.2)

In other words, we have identified the desired age distributions at time T given Y_T = y_T as the distribution of the lifetime of the first individual in a linear birth and death process (Z_t)_{t≥0} with per-capita birth rate µ and per-capita death rate λ conditioned on extinction of the process by time T (Equation (4.1), first y_T − 1 individuals), i.e. Z_T = 0, or conditioned on survival of the process up to time T (Equation (4.2), last individual, lifetime measured up to time T), i.e. Z_T > 0. Denote by L the lifetime of the starting individual, and let F and F̂ be the cumulative distribution functions of L given Z_T = 0 and of min(L, T) given Z_T > 0, respectively. We then obtain from the above that the age distribution given Y_T = y_T > 0 has cumulative distribution function

F_{y_T}(t) = ((y_T − 1)/y_T) F(t) + (1/y_T) F̂(t)   (4.3)

for all t ≥ 0.

By Bayes’ Theorem, F has density

f(t) ∝ λe^{−λt} P(Z_T = 0 | L = t)   (4.4)

for t ∈ [0, T). Given L = t, the birth times of the offspring of the starting individual form a Poisson process of rate µ (up to time t). By conditioning on the number of offspring and their birth times and plugging in their extinction probabilities from Section 2.2, we obtain

P(Z_T = 0 | L = t) = ((λe^{(λ−µ)(T−t)} − µ)/(λe^{(λ−µ)T} − µ)) e^{(λ−µ)t}   (4.5)

for t ∈ [0, T) if λ ≠ µ, and

P(Z_T = 0 | L = t) = (1 + λ(T − t))/(1 + λT)   (4.6)

for t ∈ [0, T) if λ = µ.

We may now compute the normalizing constant of the density f in (4.4). For λ ≠ µ, we obtain

∫_0^T λe^{−λt} P(Z_T = 0 | L = t) dt = (λ/(λe^{(λ−µ)T} − µ)) ∫_0^T (λe^{(λ−µ)T} e^{−λt} − µe^{−µt}) dt = λ(e^{(λ−µ)T} − 1)/(λe^{(λ−µ)T} − µ),

and for λ = µ > 0 the analogous computation yields λT/(1 + λT). Thus the density f is given by

f(t) = (λe^{(λ−µ)T} e^{−λt} − µe^{−µt})/(e^{(λ−µ)T} − 1)

for t ∈ [0, T] if λ ≠ µ, and

f(t) = e^{−λt}(1 + λ(T − t))/T

for t ∈ [0, T] if λ = µ > 0. By integration, we obtain that the cumulative distribution function F takes the form

F(t) = (1 − e^{−λt} − e^{−(λ−µ)T}(1 − e^{−µt}))/(1 − e^{−(λ−µ)T})   (4.9)

and

F(t) = 1 − e^{−λt}(T − t)/T

for t ∈ [0, T], respectively.
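Formula (4.5) can be cross-checked numerically: conditioning on the Poisson(µ) offspring birth times gives P(Z_T = 0 | L = t) = exp(−µ ∫_0^t (1 − q(T − u)) du), where q(s) is the extinction probability by time s of a single line in a linear birth and death process with birth rate µ and death rate λ. A minimal sketch of this check, assuming the standard closed form for q(s) from Section 2.2 and evaluating the integral by the trapezoidal rule (function names hypothetical):

```python
import math

def q_ext(s, lam, mu):
    # Extinction probability by time s of a linear birth and death process with
    # per-capita birth rate mu and death rate lam (lam != mu); assumed standard form.
    e = math.exp((mu - lam) * s)
    return lam * (e - 1.0) / (mu * e - lam)

def p_closed(t, T, lam, mu):
    # Right-hand side of (4.5).
    a = lam - mu
    return (lam * math.exp(a * (T - t)) - mu) / (lam * math.exp(a * T) - mu) * math.exp(a * t)

def p_thinning(t, T, lam, mu, n=20000):
    # exp(-mu * int_0^t (1 - q(T - u)) du), the Poisson-thinning representation,
    # with the integral evaluated by the trapezoidal rule.
    if t == 0.0:
        return 1.0
    h = t / n
    vals = [1.0 - q_ext(T - i * h, lam, mu) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.exp(-mu * integral)

pairs = [(p_closed(t, 2.0, 1.5, 0.7), p_thinning(t, 2.0, 1.5, 0.7)) for t in (0.3, 1.0, 1.9)]
```

Under these assumptions the two representations agree up to quadrature error, and both equal 1 at t = 0, as they must.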

4.3.9 Remark

In particular, we see that

1 − F(t) = (e^{−λt} − e^{−(λ−µ)T} e^{−µt})/(1 − e^{−(λ−µ)T}) ≤ e^{−λt}(1 − e^{−(λ−µ)T})/(1 − e^{−(λ−µ)T}) = e^{−λt}

for t ∈ [0, T] if λ ≠ µ, and

1 − F(t) = e^{−λt}(T − t)/T ≤ e^{−λt}

for t ∈ [0, T] if λ = µ. Thus, given Y_T = y_T, the age distribution of the first y_T − 1 individuals is stochastically dominated by the Exp(λ) distribution.
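The domination bound is easy to confirm numerically on a grid; a small sketch (parameter values arbitrary, covering λ > µ, λ < µ and λ = µ):

```python
import math

def one_minus_F(t, T, lam, mu):
    # Survival function of the age distribution of the first y_T - 1 individuals,
    # from (4.9) for lam != mu and from the lam == mu formula otherwise.
    if lam != mu:
        a = lam - mu
        return (math.exp(-lam * t) - math.exp(-a * T) * math.exp(-mu * t)) / (1.0 - math.exp(-a * T))
    return math.exp(-lam * t) * (T - t) / T

T = 2.0
grid = [i * T / 200 for i in range(201)]
checks = [
    all(one_minus_F(t, T, lam, mu) <= math.exp(-lam * t) + 1e-12 for t in grid)
    for lam, mu in [(1.2, 0.5), (0.5, 1.2), (0.8, 0.8)]
]
```

In each parameter regime the survival function stays below e^{−λt}, consistent with the stochastic domination by Exp(λ).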

In order to prove Theorem 4.3.1, it remains to derive F̂, which we do in a similar way. Recall that we have to compute L(min(L, T) | Z_T > 0). A slight notational complication arises from the fact that F̂ has a discontinuity at T. We note that min(L, T) has a density f̃ with respect to the measure Leb_{[0,T)} + δ_T given by

f̃(t) = λe^{−λt} if 0 ≤ t < T,   and   f̃(T) = e^{−λT}.

Thus by Bayes’ Theorem, the age distribution of the last individual given Y_T = y_T has a density f̂ with respect to Leb_{[0,T)} + δ_T satisfying

f̂(t) ∝ f̃(t) P(Z_T > 0 | min(L, T) = t) = λe^{−λt} P(Z_T > 0 | L = t) if 0 ≤ t < T,   and   f̂(T) ∝ e^{−λT}.

Firstly, we consider the case where λ ≠ µ. By (4.5), we have that

P(Z_T > 0 | L = t) = 1 − P(Z_T = 0 | L = t) = 1 − ((λe^{(λ−µ)(T−t)} − µ)/(λe^{(λ−µ)T} − µ)) e^{(λ−µ)t} = µ(e^{(λ−µ)t} − 1)/(λe^{(λ−µ)T} − µ)

for t ∈ [0, T), which implies

f̂(t) ∝ λe^{−λt} µ(e^{(λ−µ)t} − 1)/(λe^{(λ−µ)T} − µ) if 0 ≤ t < T,   and   f̂(T) ∝ e^{−λT}.

We can compute the normalizing constant of f̂ as

e^{−λT} + (λµ/(λe^{(λ−µ)T} − µ)) ∫_0^T e^{−λt}(e^{(λ−µ)t} − 1) dt
= e^{−λT} + (λµ/(λe^{(λ−µ)T} − µ)) ∫_0^T (e^{−µt} − e^{−λt}) dt
= e^{−λT} + (λ(1 − e^{−µT}) − µ(1 − e^{−λT}))/(λe^{(λ−µ)T} − µ)
= (λ − µ)/(λe^{(λ−µ)T} − µ).   (4.10)
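As a sanity check on (4.10), the left-hand side can be evaluated with numerical quadrature and compared to the closed form; a minimal sketch (function names hypothetical, parameter values arbitrary):

```python
import math

def lhs_quadrature(T, lam, mu, n=20000):
    # e^{-lam T} + (lam*mu/(lam*e^{(lam-mu)T} - mu)) * int_0^T (e^{-mu t} - e^{-lam t}) dt,
    # with the integral evaluated by the trapezoidal rule.
    a = lam - mu
    c = lam * mu / (lam * math.exp(a * T) - mu)
    h = T / n
    vals = [math.exp(-mu * i * h) - math.exp(-lam * i * h) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.exp(-lam * T) + c * integral

def rhs_closed(T, lam, mu):
    # Right-hand side of (4.10).
    return (lam - mu) / (lam * math.exp((lam - mu) * T) - mu)

pair = (lhs_quadrature(2.0, 1.5, 0.7), rhs_closed(2.0, 1.5, 0.7))
```

The closed form also lies in (0, 1), as it should for a survival probability (cf. Remark 4.3.10).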

4.3.10 Remark

By Bayes’ Theorem, the normalizing constant of the density f̂ is just the probability that a linear birth and death process with birth rate µ and death rate λ survives up to time T. Thus we could also have used the extinction probability from Section 2.2 in order to obtain the right-hand side of (4.10).

We may conclude that f̂ is given by

f̂(t) = λµ(e^{−µt} − e^{−λt})/(λ − µ) if 0 ≤ t < T,   and   f̂(T) = (λe^{−µT} − µe^{−λT})/(λ − µ).   (4.11)

By integration, we obtain that the cumulative distribution function F̂ for the age of the last individual given Y_T = y_T takes the form

F̂(t) = ((λ(1 − e^{−µt}) − µ(1 − e^{−λt}))/(λ − µ)) 1_{[0,T)}(t) + 1_{{T}}(t)   (4.12)

for t ∈ [0, T].

For λ = µ > 0, we proceed analogously: By (4.6), we have that

P(Z_T > 0 | L = t) = 1 − P(Z_T = 0 | L = t) = 1 − (1 + λ(T − t))/(1 + λT) = λt/(1 + λT)

for t ∈ [0, T), which leads to

f̂(t) ∝ λe^{−λt} λt/(1 + λT) if 0 ≤ t < T,   and   f̂(T) ∝ e^{−λT}.

We compute the normalizing constant of f̂ as

e^{−λT} + ∫_0^T λe^{−λt} (λt/(1 + λT)) dt = e^{−λT} + 1/(1 + λT) − e^{−λT} = 1/(1 + λT).

This yields that f̂ is given by

f̂(t) = λ²te^{−λt} if 0 ≤ t < T,   and   f̂(T) = (1 + λT)e^{−λT}.   (4.13)

By integration, we obtain that the cumulative distribution function F̂ takes the form

F̂(t) = (1 − e^{−λt}(1 + λt)) 1_{[0,T)}(t) + 1_{{T}}(t)

for t ∈ [0, T].
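That the density in (4.13) has total mass one (Lebesgue part on [0, T) plus an atom at T) can be confirmed numerically; a minimal sketch (function name hypothetical):

```python
import math

def total_mass(T, lam, n=20000):
    # Integral of lam^2 * t * e^{-lam t} over [0, T) (trapezoidal rule)
    # plus the atom (1 + lam*T) e^{-lam T} at t = T; should equal 1 by (4.13).
    h = T / n
    vals = [lam * lam * (i * h) * math.exp(-lam * i * h) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral + (1.0 + lam * T) * math.exp(-lam * T)

mass = total_mass(2.0, 0.9)
```

The same check passes for other parameter choices, since the antiderivative of λ²te^{−λt} evaluates to 1 − e^{−λT}(1 + λT) on [0, T].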

Plugging the above expressions for F and F̂ into Equation (4.3) yields the statement of Theorem 4.3.1.
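Finally, the pieces can be combined into the mixture (4.3); the following sketch (λ ≠ µ case, arbitrary parameters, function names hypothetical) checks that the resulting age distribution function is a genuine CDF on [0, T]:

```python
import math

def F_first(t, T, lam, mu):
    # (4.9): age CDF of the first y_T - 1 individuals, lam != mu.
    a = lam - mu
    return (1.0 - math.exp(-lam * t) - math.exp(-a * T) * (1.0 - math.exp(-mu * t))) / (1.0 - math.exp(-a * T))

def F_last(t, T, lam, mu):
    # (4.12): age CDF of the last individual, lam != mu, with an atom at t = T.
    if t >= T:
        return 1.0
    return (lam * (1.0 - math.exp(-mu * t)) - mu * (1.0 - math.exp(-lam * t))) / (lam - mu)

def F_age(t, T, lam, mu, yT):
    # (4.3): mixture giving the age distribution given Y_T = yT.
    return (yT - 1) / yT * F_first(t, T, lam, mu) + 1.0 / yT * F_last(t, T, lam, mu)

T, lam, mu, yT = 2.0, 1.5, 0.7, 5
grid = [i * T / 400 for i in range(401)]
vals = [F_age(t, T, lam, mu, yT) for t in grid]
```

On the grid the mixture starts at 0, is non-decreasing, stays in [0, 1], and jumps to 1 at t = T, reflecting the atom contributed by the last individual.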