

4.3 The age distribution in the general case

4.3.4 Proof of Theorem 4.3.1

We first prove that we obtain the same distribution if ages are read off from the tree under the vertically mirrored contour process (H_{τ−t})_{0≤t≤τ}, where τ = inf{x > 0 : H_x = 0} (see Figure 4.5). To see this, denote by Ψ : T → T the corresponding transformation on the set of trees. Let Z ∼ Q_{λ,µ,T} be the random tree obtained from the above birth and death process, and denote by Θ(z) the finite set of trees that have the same vertex positions in the vertical direction as a tree z ∈ T, but not necessarily the same phylogeny. Note that Θ(Ψ(z)) = Θ(z) for every z ∈ T. Given Θ(Z), the random tree Z is uniformly distributed on Θ(Z), and so is Ψ(Z) because, restricted to the finite set Θ(Z), the map Ψ is just a permutation. Hence the distributions of Ψ(Z) and Z are equal by integration of the conditional distributions given Θ(Z).
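The permutation step above rests on a simple fact: a bijection of a finite set pushes the uniform distribution forward to itself. A minimal sketch with a hypothetical 4-element set of tree labels:

```python
from fractions import Fraction

# Uniform distribution on a finite set Theta(Z) of trees (toy labels)
theta = ["t1", "t2", "t3", "t4"]
uniform = {z: Fraction(1, len(theta)) for z in theta}

# Psi restricted to Theta(Z) is a permutation (toy choice)
psi = {"t1": "t3", "t2": "t1", "t3": "t4", "t4": "t2"}

# Push-forward of the uniform distribution under Psi
pushforward = {z: Fraction(0) for z in theta}
for z, p in uniform.items():
    pushforward[psi[z]] += p
```

Since every point receives the mass of exactly one preimage, the push-forward is again uniform, which is the invariance used in the proof.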

Since it is more customary to draw stochastic processes from left to right, we mirror the situation in Figure 4.5, which gives us the original contour process (H_x)_{x≥0} again, enveloping a right-aligned tree (constructed in the analogous way as the left-aligned tree, but from right to left; see Figure 4.6). Of course the ages still correspond to vertical line lengths.

Figure 4.5: The mirrored deflected contour process (the realization corresponds to Figure 4.4) and its (left-aligned) tree


Figure 4.6: The deflected contour process and its right-aligned tree

Obviously, the local maxima of the contour process at height T correspond to the living individuals in the linear birth and death process at time T, which we assume to be numbered from left to right in the right-aligned tree here. In total, there are B_T local maxima (at any height) corresponding to the B_T individuals that have been born up to time T. Let these maxima be numbered from left to right, and let I_1 < … < I_{Y_T} be the indices of the maxima at height T and I_k = ∞ for k > Y_T. Then V_{I_1}, …, V_{I_{Y_T−1}}, V_{I_{Y_T}} ∧ T are the ages we are interested in, where (V_k)_{k∈N} is the sequence of random variables from the definition of the contour process, i.e. (V_k)_{k∈N} are independent and identically Exp(λ) distributed, and ∧ denotes the minimum operator. Let 0 = ς_1 < … < ς_{2B_T+1} = τ be the positions of all local extrema of (H_x)_{0≤x≤τ} (cf. Figure 4.6). Note that ς_{2I_1}, …, ς_{2I_{Y_T}} are the positions of the maxima at height T. For convenience, we define ς_k = ς_{2B_T+1} = τ for all k > 2B_T and ς_∞ = lim_{k→∞} ς_k = τ.

By the above construction of the contour process, the process (H_{ς_k}, (−1)^k)_{k∈N} is a Markov chain on [0, T] × {−1, 1} with initial value (0, −1) at time 1 and transition probabilities

P((H_{ς_{k+1}}, (−1)^{k+1}) ∈ B_1 × B_2 | (H_{ς_k}, (−1)^k) = (z_1, z_2))

  = (δ_T(B_1) e^{−µ(T−z_1)} + µ ∫_{B_1 ∩ [z_1,T)} e^{−µ(u−z_1)} du) δ_1(B_2)   if z_2 = −1,

  = (δ_0(B_1) e^{−λz_1} + λ ∫_{B_1 ∩ (0,z_1]} e^{−λ(z_1−u)} du) δ_{−1}(B_2)   if z_2 = 1,

where B_1 is a Borel subset of [0, T], B_2 ⊂ {−1, 1} and (z_1, z_2) ∈ [0, T] × {−1, 1}. Note that the second component of (H_{ς_k}, (−1)^k)_{k∈N} tells us if (H_x)_{0≤x≤τ} has a minimum (if it is −1) or a maximum (if it is 1) at ς_k. Note further that this Markov chain determines (H_x)_{x≥0} completely.
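The two-state alternating structure of this chain is easy to simulate: from a minimum the next maximum is min(z + Exp(µ), T) (deflected at T), from a maximum the next minimum is max(z − Exp(λ), 0) (absorbed at 0). A minimal sketch in plain Python (function name hypothetical, parameter values arbitrary), run until a minimum is absorbed at 0, i.e. until the contour has reached τ:

```python
import random

def simulate_extrema_chain(lam, mu, T, rng):
    """Sketch of the extremal chain (H_{ς_k}, (-1)^k) from the kernel above."""
    states = [(0.0, -1)]  # initial value (0, -1) at time 1
    while True:
        z, s = states[-1]
        if s == -1:
            nxt = (min(z + rng.expovariate(mu), T), 1)      # rise, deflected at T
        else:
            nxt = (max(z - rng.expovariate(lam), 0.0), -1)  # fall, absorbed at 0
        states.append(nxt)
        if nxt == (0.0, -1):  # a minimum at height 0: the contour is at τ
            return states

chain = simulate_extrema_chain(lam=1.0, mu=0.8, T=2.0, rng=random.Random(1))
```

The simulated chain stays in [0, T], strictly alternates its second component, and contains an odd number of extrema, matching the 2B_T + 1 extrema of the contour.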

We define the sequence (ξ_k)_{k∈N} of hitting times in (T, 1) recursively by

ξ_1 = inf{k ∈ N : (H_{ς_k}, (−1)^k) = (T, 1)}  and  ξ_{l+1} = inf{k > ξ_l : (H_{ς_k}, (−1)^k) = (T, 1)}

for all l ∈ N, where inf ∅ = ∞. Note that each ξ_l is a stopping time and that max{l : ξ_l < ∞} = Y_T. Moreover, we define the “horizontally mirrored” excursions

E^{(l)} = (E^{(l)}_k)_{0≤k≤ξ_{l+1}−ξ_l} = (T − H_{ς_k})_{ξ_l≤k≤ξ_{l+1}}

for l ∈ N, where we set ∞ − ∞ = 0.

Since P((H_{ς_{ξ_l}}, (−1)^{ξ_l}) = (T, 1) | ξ_l < ∞) = 1, the strong Markov property and the fact that the second component of the process is deterministic imply that (H_{ς_k})_{0≤k≤ξ_l} and (H_{ς_k})_{ξ_l≤k} are independent given ξ_l < ∞ and that (H_{ς_k})_{ξ_l≤k} has the same distribution given ξ_l < ∞ for any l ∈ N. As a consequence, we have for any l ∈ N that (H_{ς_k})_{0≤k≤ξ_1}, E^{(1)}, E^{(2)}, …, E^{(l−1)} are independent of (E^{(l)}, (ξ_m)_{m≥l+1}) given ξ_l < ∞ and hence that (H_{ς_k})_{0≤k≤ξ_1}, E^{(1)}, E^{(2)}, …, E^{(l−1)} are independent of E^{(l)} given ξ_l < ∞ and given any sub-σ-algebra of σ(ξ_m : m ≥ l+1). This implies that (H_{ς_k})_{0≤k≤ξ_1}, E^{(1)}, E^{(2)}, …, E^{(l−1)}, E^{(l)} are independent given ξ_l < ∞ and given any sub-σ-algebra of σ(ξ_m : m ≥ l+1) for any l ∈ N. Note that for any l ∈ N, the first l−1 “horizontally mirrored” excursions E^{(1)}, E^{(2)}, …, E^{(l−1)} all have the same distribution under this conditioning because E^{(1)}, E^{(2)}, …, E^{(l−1)} all have the same distribution given ξ_l < ∞ and E^{(1)}, E^{(2)}, …, E^{(l−1)} are independent of (ξ_m)_{m≥l+1} given ξ_l < ∞.

Since {Y_T = y_T} = {ξ_{y_T} < ∞, ξ_{y_T+1} = ∞}, we obtain that, given Y_T = y_T, the processes (H_{ς_k})_{0≤k≤ξ_1}, E^{(1)}, E^{(2)}, …, E^{(y_T−1)}, E^{(y_T)} are independent and E^{(1)}, E^{(2)}, …, E^{(y_T−1)} are identically distributed. By having a closer look at the strong Markov property used above, it is seen that neither the distribution of E^{(1)} given Y_T = y_T ≥ 2 nor the distribution of E^{(y_T)} given Y_T = y_T ≥ 1 depends on the concrete value y_T. Note that on {Y_T = y_T}, we have E^{(1)}_1 = V_{I_1}, …, E^{(y_T−1)}_1 = V_{I_{y_T−1}}, and E^{(y_T)}_1 = V_{I_{y_T}} ∧ T, so these are the ages we are interested in. The conditional distributions of E^{(1)}_1 and E^{(y_T)}_1 given Y_T = y_T can easily be rediscovered as conditional distributions in another contour process, which is what we look at next.

Let (H̃_x)_{x≥0} ∼ P_{µ,λ,T}. By Theorem 4.3.8, (H̃_x)_{x≥0} is the contour process corresponding to a linear birth and death process (Z_t)_{t≥0} with per-capita birth rate µ and per-capita death rate λ that is killed at T. Let the positions (ς̃_k)_{k∈N} of the extrema of (H̃_x)_{x≥0} be defined analogously to (ς_k)_{k∈N}.

Both (H̃_{ς̃_{k+1}})_{k∈N_0} and (E^{(l)}_k)_{k∈N_0} for arbitrary l ∈ {1, …, Y_T} start in 0 and alternate between independently adding Exp(λ) and subtracting Exp(µ) random variables at least until 0 or T is crossed. Thus until this happens, they have the same distribution.

We have {Y_T ≥ 2} = {E^{(1)}_{ξ_2−ξ_1} = 0} = {max_{k∈N} E^{(1)}_k < T}, and therefore, for any y_T ≥ 2, since the distribution of E^{(1)}_1 does not depend on the concrete value of y_T,

L(E^{(1)}_1 | Y_T = y_T) = L(E^{(1)}_1 | E^{(1)} returns to 0 before reaching T) = L(ς̃_2 | max_{k∈N} H̃_{ς̃_k} < T) = L(ς̃_2 | max_{x≥0} H̃_x < T).   (4.1)

On the other hand we have {Y_T ≥ 1} = {Y_T ≥ 1, max_{k∈N} E^{(Y_T)}_k = T} (in fact ξ_{Y_T+1} = ∞ and E^{(Y_T)} is eventually absorbed in T). Therefore, for any y_T ≥ 1, since the distribution of E^{(y_T)}_1 does not depend on the concrete value of y_T,

L(E^{(y_T)}_1 | Y_T = y_T) = L(E^{(y_T)}_1 | E^{(y_T)} reaches T before returning to 0) = L(ς̃_2 | max_{k∈N} H̃_{ς̃_k} = T) = L(ς̃_2 | max_{x≥0} H̃_x = T).   (4.2)

In other words, we have identified the desired age distributions at time T given Y_T = y_T as the distribution of the lifetime of the first individual in a linear birth and death process (Z_t)_{t≥0} with per-capita birth rate µ and per-capita death rate λ conditioned on extinction of the process by time T (Equation (4.1), first y_T − 1 individuals), i.e. Z_T = 0, or conditioned on survival of the process up to time T (Equation (4.2), last individual, lifetime measured up to time T), i.e. Z_T > 0. Denote by L the lifetime of the starting individual, and let F and F̂ be the cumulative distribution functions of L given Z_T = 0 and of min(L, T) given Z_T > 0, respectively. We then obtain from the above that the age distribution given Y_T = y_T > 0 has cumulative distribution function

F_{y_T}(t) = ((y_T − 1)/y_T) F(t) + (1/y_T) F̂(t)   (4.3)

for all t ≥ 0.

By Bayes’ Theorem, F has density

f(t) ∝ λe^{−λt} P(Z_T = 0 | L = t)   (4.4)

for t ∈ [0, T). Given L = t, the birth times of the offspring of the starting individual form a Poisson process of rate µ (up to time t). By conditioning on the number of offspring and their birth times and plugging in their extinction probabilities from Section 2.2, we obtain

P(Z_T = 0 | L = t) = ((λe^{(λ−µ)(T−t)} − µ)/(λe^{(λ−µ)T} − µ)) e^{(λ−µ)t}   (4.5)

for t ∈ [0, T) if λ ≠ µ, and

P(Z_T = 0 | L = t) = (1 + λ(T − t))/(1 + λT)   (4.6)

for t ∈ [0, T) if λ = µ.

We may now compute the normalizing constant of the density f in (4.4). For λ ≠ µ, we obtain

∫_0^T λe^{−λt} P(Z_T = 0 | L = t) dt = (λ/(λe^{(λ−µ)T} − µ)) ∫_0^T (λe^{(λ−µ)T} e^{−λt} − µe^{−µt}) dt = λ(e^{(λ−µ)T} − 1)/(λe^{(λ−µ)T} − µ),

and for λ = µ > 0 the analogous computation yields λT/(1 + λT). Thus the density f is given by

f(t) = (λe^{(λ−µ)T} e^{−λt} − µe^{−µt})/(e^{(λ−µ)T} − 1)

for t ∈ [0, T] if λ ≠ µ, and

f(t) = e^{−λt}(1 + λ(T − t))/T

for t ∈ [0, T] if λ = µ > 0. By integration, we obtain that the cumulative distribution function F takes the form

F(t) = (1 − e^{−λt} − e^{−(λ−µ)T}(1 − e^{−µt}))/(1 − e^{−(λ−µ)T})   (4.9)

and

F(t) = 1 − e^{−λt}(T − t)/T

for t ∈ [0, T], respectively.
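Formula (4.5) can be cross-checked numerically: conditioning on the Poisson(µ) offspring birth times gives P(Z_T = 0 | L = t) = exp(−µ ∫_0^t (1 − q(T − u)) du), where q(s) is the extinction probability by time s of a single line in a linear birth and death process with birth rate µ and death rate λ. A minimal sketch of this check, assuming the standard closed form for q(s) from Section 2.2 and evaluating the integral by the trapezoidal rule (function names hypothetical):

```python
import math

def q_ext(s, lam, mu):
    # Extinction probability by time s of a linear birth and death process with
    # per-capita birth rate mu and death rate lam (lam != mu); assumed standard form.
    e = math.exp((mu - lam) * s)
    return lam * (e - 1.0) / (mu * e - lam)

def p_closed(t, T, lam, mu):
    # Right-hand side of (4.5).
    a = lam - mu
    return (lam * math.exp(a * (T - t)) - mu) / (lam * math.exp(a * T) - mu) * math.exp(a * t)

def p_thinning(t, T, lam, mu, n=20000):
    # exp(-mu * int_0^t (1 - q(T - u)) du), the Poisson-thinning representation,
    # with the integral evaluated by the trapezoidal rule.
    if t == 0.0:
        return 1.0
    h = t / n
    vals = [1.0 - q_ext(T - i * h, lam, mu) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.exp(-mu * integral)

pairs = [(p_closed(t, 2.0, 1.5, 0.7), p_thinning(t, 2.0, 1.5, 0.7)) for t in (0.3, 1.0, 1.9)]
```

Under these assumptions the two representations agree up to quadrature error, and both equal 1 at t = 0, as they must.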

4.3.9 Remark

In particular, we see that

1 − F(t) = (e^{−λt} − e^{−(λ−µ)T} e^{−µt})/(1 − e^{−(λ−µ)T}) ≤ e^{−λt}(1 − e^{−(λ−µ)T})/(1 − e^{−(λ−µ)T}) = e^{−λt}

for t ∈ [0, T] if λ ≠ µ, and

1 − F(t) = e^{−λt}(T − t)/T ≤ e^{−λt}

for t ∈ [0, T] if λ = µ. Thus, given Y_T = y_T, the age distribution of the first y_T − 1 individuals is stochastically dominated by the Exp(λ) distribution.
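The domination bound is easy to confirm numerically on a grid; a small sketch (parameter values arbitrary, covering λ > µ, λ < µ and λ = µ):

```python
import math

def one_minus_F(t, T, lam, mu):
    # Survival function of the age distribution of the first y_T - 1 individuals,
    # from (4.9) for lam != mu and from the lam == mu formula otherwise.
    if lam != mu:
        a = lam - mu
        return (math.exp(-lam * t) - math.exp(-a * T) * math.exp(-mu * t)) / (1.0 - math.exp(-a * T))
    return math.exp(-lam * t) * (T - t) / T

T = 2.0
grid = [i * T / 200 for i in range(201)]
checks = [
    all(one_minus_F(t, T, lam, mu) <= math.exp(-lam * t) + 1e-12 for t in grid)
    for lam, mu in [(1.2, 0.5), (0.5, 1.2), (0.8, 0.8)]
]
```

In each parameter regime the survival function stays below e^{−λt}, consistent with the stochastic domination by Exp(λ).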

In order to prove Theorem 4.3.1, it remains to derive F̂, which we do in a similar way. Recall that we have to compute L(min(L, T) | Z_T > 0). A slight notational complication arises from the fact that F̂ has a discontinuity at T. We note that min(L, T) has a density f̃ with respect to the measure Leb_{[0,T)} + δ_T given by

f̃(t) = λe^{−λt} if 0 ≤ t < T,   and   f̃(T) = e^{−λT}.

Thus by Bayes’ Theorem, the age distribution of the last individual given Y_T = y_T has a density f̂ with respect to Leb_{[0,T)} + δ_T satisfying

f̂(t) ∝ f̃(t) P(Z_T > 0 | min(L, T) = t) = λe^{−λt} P(Z_T > 0 | L = t) if 0 ≤ t < T,   and   f̂(T) ∝ e^{−λT}.

Firstly, we consider the case where λ ≠ µ. By (4.5), we have that

P(Z_T > 0 | L = t) = 1 − P(Z_T = 0 | L = t) = 1 − ((λe^{(λ−µ)(T−t)} − µ)/(λe^{(λ−µ)T} − µ)) e^{(λ−µ)t} = µ(e^{(λ−µ)t} − 1)/(λe^{(λ−µ)T} − µ)

for t ∈ [0, T), which implies

f̂(t) ∝ λe^{−λt} µ(e^{(λ−µ)t} − 1)/(λe^{(λ−µ)T} − µ) if 0 ≤ t < T,   and   f̂(T) ∝ e^{−λT}.

We can compute the normalizing constant of f̂ as

e^{−λT} + (λµ/(λe^{(λ−µ)T} − µ)) ∫_0^T e^{−λt}(e^{(λ−µ)t} − 1) dt
= e^{−λT} + (λµ/(λe^{(λ−µ)T} − µ)) ∫_0^T (e^{−µt} − e^{−λt}) dt
= e^{−λT} + (λ(1 − e^{−µT}) − µ(1 − e^{−λT}))/(λe^{(λ−µ)T} − µ)
= (λ − µ)/(λe^{(λ−µ)T} − µ).   (4.10)
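As a sanity check on (4.10), the left-hand side can be evaluated with numerical quadrature and compared to the closed form; a minimal sketch (function names hypothetical, parameter values arbitrary):

```python
import math

def lhs_quadrature(T, lam, mu, n=20000):
    # e^{-lam T} + (lam*mu/(lam*e^{(lam-mu)T} - mu)) * int_0^T (e^{-mu t} - e^{-lam t}) dt,
    # with the integral evaluated by the trapezoidal rule.
    a = lam - mu
    c = lam * mu / (lam * math.exp(a * T) - mu)
    h = T / n
    vals = [math.exp(-mu * i * h) - math.exp(-lam * i * h) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.exp(-lam * T) + c * integral

def rhs_closed(T, lam, mu):
    # Right-hand side of (4.10).
    return (lam - mu) / (lam * math.exp((lam - mu) * T) - mu)

pair = (lhs_quadrature(2.0, 1.5, 0.7), rhs_closed(2.0, 1.5, 0.7))
```

The closed form also lies in (0, 1), as it should for a survival probability (cf. Remark 4.3.10).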

4.3.10 Remark

By Bayes’ Theorem, the normalizing constant of the density f̂ is just the probability that a linear birth and death process with birth rate µ and death rate λ survives up to time T. Thus we could also have used the extinction probability from Section 2.2 in order to obtain the right-hand side of (4.10).

We may conclude that f̂ is given by

f̂(t) = λµ(e^{−µt} − e^{−λt})/(λ − µ) if 0 ≤ t < T,   and   f̂(T) = (λe^{−µT} − µe^{−λT})/(λ − µ).   (4.11)

By integration, we obtain that the cumulative distribution function F̂ for the age of the last individual given Y_T = y_T takes the form

F̂(t) = ((λ(1 − e^{−µt}) − µ(1 − e^{−λt}))/(λ − µ)) 1_{[0,T)}(t) + 1_{{T}}(t)   (4.12)

for t ∈ [0, T].

For λ = µ > 0, we proceed analogously: By (4.6), we have that

P(Z_T > 0 | L = t) = 1 − P(Z_T = 0 | L = t) = 1 − (1 + λ(T − t))/(1 + λT) = λt/(1 + λT)

for t ∈ [0, T), which leads to

f̂(t) ∝ λe^{−λt} λt/(1 + λT) if 0 ≤ t < T,   and   f̂(T) ∝ e^{−λT}.

We compute the normalizing constant of f̂ as

e^{−λT} + ∫_0^T λe^{−λt} (λt/(1 + λT)) dt = e^{−λT} + 1/(1 + λT) − e^{−λT} = 1/(1 + λT).

This yields that f̂ is given by

f̂(t) = λ²te^{−λt} if 0 ≤ t < T,   and   f̂(T) = (1 + λT)e^{−λT}.   (4.13)

By integration, we obtain that the cumulative distribution function F̂ takes the form

F̂(t) = (1 − e^{−λt}(1 + λt)) 1_{[0,T)}(t) + 1_{{T}}(t)

for t ∈ [0, T].
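That the density in (4.13) has total mass one (Lebesgue part on [0, T) plus an atom at T) can be confirmed numerically; a minimal sketch (function name hypothetical):

```python
import math

def total_mass(T, lam, n=20000):
    # Integral of lam^2 * t * e^{-lam t} over [0, T) (trapezoidal rule)
    # plus the atom (1 + lam*T) e^{-lam T} at t = T; should equal 1 by (4.13).
    h = T / n
    vals = [lam * lam * (i * h) * math.exp(-lam * i * h) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral + (1.0 + lam * T) * math.exp(-lam * T)

mass = total_mass(2.0, 0.9)
```

The same check passes for other parameter choices, since the antiderivative of λ²te^{−λt} evaluates to 1 − e^{−λT}(1 + λT) on [0, T].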

Plugging the above expressions for F and F̂ into Equation (4.3) yields the statement of Theorem 4.3.1.
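Finally, the pieces can be combined into the mixture (4.3); the following sketch (λ ≠ µ case, arbitrary parameters, function names hypothetical) checks that the resulting age distribution function is a genuine CDF on [0, T]:

```python
import math

def F_first(t, T, lam, mu):
    # (4.9): age CDF of the first y_T - 1 individuals, lam != mu.
    a = lam - mu
    return (1.0 - math.exp(-lam * t) - math.exp(-a * T) * (1.0 - math.exp(-mu * t))) / (1.0 - math.exp(-a * T))

def F_last(t, T, lam, mu):
    # (4.12): age CDF of the last individual, lam != mu, with an atom at t = T.
    if t >= T:
        return 1.0
    return (lam * (1.0 - math.exp(-mu * t)) - mu * (1.0 - math.exp(-lam * t))) / (lam - mu)

def F_age(t, T, lam, mu, yT):
    # (4.3): mixture giving the age distribution given Y_T = yT.
    return (yT - 1) / yT * F_first(t, T, lam, mu) + 1.0 / yT * F_last(t, T, lam, mu)

T, lam, mu, yT = 2.0, 1.5, 0.7, 5
grid = [i * T / 400 for i in range(401)]
vals = [F_age(t, T, lam, mu, yT) for t in grid]
```

On the grid the mixture starts at 0, is non-decreasing, stays in [0, 1], and jumps to 1 at t = T, reflecting the atom contributed by the last individual.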