• Keine Ergebnisse gefunden

Application example: Proliferation of T lymphocytes

Parameter and prediction uncertainties

Given a representative sample, S = {θk}dkS=1, parameter and prediction uncertainties can be determined. Similar to Section 2.3.2, the 100(1−α)% Bayesian confidence intervals for the parameters is

∀j: [θj](1−α) =[θ(α/2)j , θ(1−α/2)j ], (4.85) with θ(α)j being the 100α-th percentile of π(θi|D). The prediction uncertainty on the other hand is, dependent on the property of interest, a function of time and/or label properties (see Section 3.3.5). In the following, we study the confidence interval ofn(¯y|t, θ),

∀t,y¯ ∈R+ : [n](1−α)(¯y|t)=[n(α/2)(y|t),n(1−α/2)(¯y|t)], (4.86) andNi(¯y|t, θ),

∀i∈N0,t∈R+ : [Ni](1−α)(t)=[Ni(α/2)(t),Ni(1−α/2)(t)]. (4.87) The confidence intervals for other model properties can be defined accordingly.

Computation of maximum a posteriori estimate

Besides the evaluation of the global properties of π(θ|D), also the maximum a posteriori estimate is important, among others for model comparison or model rejection. To compute θthe nonlinear optimization problem

maximize

θ∈R+

P(D|θ)π(θ) (4.88)

has to be solved. This can be done using global optimization methods, however, given the sampleS ={θk}dS

k=1, local methods are sufficient. The parameterθ∗,S, withP(D|θ∗,S)π(θ∗,S)≥ P(D|θk)π(θk) fork ∈ {1, . . . ,dS}, is in general close to the global optimum of the posterior distribution. Hence, local optimization starting atθ∗,S will in general provide a reliable ap-proximation of the maximum a posteriori estimate. In our experience, this procedure is at least as robust as direct global optimization.

The goal of our study is to analyze the performance of the proposed approach. Further-more, different model hypotheses will be compared and the parameter and prediction uncer-tainty of the best model will be analyzed in detail.

4.4.1 Measurement data and error model

The available measurement data are CFSE histograms collected on the day of stimulation (day 0) and the five subsequent days (day 1 to day 5). In the remainder of this section, merely the data observed on day 1, day 2, day 4, and day 5 are used for parameter estimation (similar to the study in (Banks et al., 2012)). The other observations (day 0 and day 3) seem to be corrupted by measurement noise, as discussed by Luzyaninaet al.(2007b) and Bankset al.

(2011).

Unfortunately, the removal of the first measurement (day 0) results in non-identifiability of αi(t) and βi(t) for t ∈ [0,1] [d], and prevents the direct estimation of the initial condition at t= 0 [d],n0(y), from the measurement data. To circumvent both problems, we consider only the time in between the earliest and the latest used measurement in the reduced dataset, t ∈ [1,5] [d]. Accordingly, the initial condition is the distribution of label induced fluorescence att =1 [d]. This initial condition is parametrized using a weighted sum of three log-normal distributions, and the corresponding distribution parameters are estimated from the histogram measured on day 1. An analysis of the uncertainty ofn0(y) reveals that it is negligible, which is why it is not taken into account.

To compare the remaining data with the model predictions, we use

P( ¯Ntj|θ)= N(N(t, θ)|N¯tj,(0.01·N¯tj)2) (4.89) andP({H¯ltj}dl=1l |N¯tj, θ) as in (4.59), with a total outlier probability of 5%, R

R+po(¯y)dy¯ = 0.05, which is split equally among all bins. The likelihood function P( ¯Ntj|θ) has been adopted, as the pure sampling error described by (4.56) seems to underestimate the variability in the measurement of the overall population. This is also supported by the detection of the two outliers (day 0 and day 3). In the following, we assume that the measurement error for the population error is normally distributed and proportional to the populations size, with a relative standard deviation of one percent. While the error model might be questioned, no better one is available in literature. To obtain a more precise error model, additional experiments and the analysis of noise sources are necessary.

4.4.2 Model hypotheses

The precise proliferation dynamics of T lymphocytes are poorly understood. In the literature, different hypotheses have been formulated, including: constant cell division and death rates (Revy et al., 2001); time- and division number-dependent cell division and death rates (De Boeret al., 2006; Luzyaninaet al., 2007a); and time- and label concentration dependent cell division and death rates (Bankset al., 2011, 2010; Luzyaninaet al., 2009). While the latter are known to provide a good fit for the considered dataset, it is widely accepted that optimized labeling does not alter proliferation rates (Hawkins et al., 2007). Hence, the underlying assumption is biologically inconsistent.

We presume that the dependence of the proliferation rate on the label concentration is only required as the LSP model employed in the respective studies cannot account for

popula-Table 4.1: Lower and upper bounds for model parameters. The unit of time is hour.

parameter aij bijmax kT µa σ2a lower bound 10−3 10−3 10−2 10−4 10−2 10−2

upper bound 10 10 10 10 10 10

tion heterogeneity induced by different division numbers. To check this hypothesis, and to perform an in-depth analysis ofαandβ, we compare four model alternatives:

• M1:αandβare constant.

• M2:αandβare only time-dependent.

• M3:αandβare only division number-dependent.

• M4:αandβare time- and division number-dependent.

As the last model,M4, is highly complex, we also consider two simplified versions:

• M4,α(t):αis time- and division number-dependent,βis only division number-dependent.

• M4,β(t):αis only division number-dependent,βis time- and division number-dependent.

To model the time dependence ofαandβ, linear splines with six equidistant nodes are used.

A fine spacing of nodes did not result in significant improvement.

To ensure an unbiased analysis of the proliferation dynamics, we avoid the utilization of prior information. We merely restrict the parameters to a biologically plausible regime, con-sistent with earlier studies. The lower and upper bounds for the individual parameters are provided in Table 4.1.

4.4.3 Parameter estimation and model comparison

To estimate the parameters of the models, at first, the posterior distributions are explored us-ing MCMC samplus-ing. The best parameter in each sample is, in a second step, used as startus-ing point of a local minimization via theMATLAB routinefmincon. This yields a maximum a posteriori estimateθfor each model.

For the comparison of model predictions and measurement data, the measured histograms {H¯tlj}dl=1l and the predicted histograms

( Htl

j =Z

Il

n(¯y)|tj, θ)dy¯ )dl

l=1

, (4.90)

are depicted in Figure 4.7 and 4.8. It is apparent thatM1, M2, and M3 fail to fit the data.

In particular, the distributions observed on day 2 and day 4 cannot be reproduced accurately.

On the other hand,M4, which possesses time- and division number-dependent cell division and cell death rates, achieves a good agreement with the measurement data. This proves our presumption that label dependent proliferation rates are unnecessary, if the population heterogeneity induced by different division number is taken into account.

Legend:

day 1 – measurement day 1 – simulation day 2 – measurement day 2 – simulation

day 4 – measurement day 4 – simulation day 5 – measurement day 5 – simulation

autofluorescence

101 102 103

0 50 100 150

measured fluorescence ¯y

cellcountscountsperchannel

measured fluorescence ¯y

(a) Best fit of modelM1,αi(t)=αandβi(t)=β.

101 102 103

0 50 100 150

measured fluorescence ¯y

cellcountscountsperchannel

measured fluorescence ¯y

(b) Best fit of modelM2,αi(t)=α(t) andβi(t)=β(t).

101 102 103

0 50 100 150

measured fluorescence ¯y

cellcountscountsperchannel

measured fluorescence ¯y

(c) Best fit of modelM3,αi(t)=αiandβi(t)=βi.

Figure 4.7: Comparison of measurement data and fitted solution of: (a) modelM1, (b) model M2, and (c) model M3. For simulation, the maximum a posterior estimate has been used.

Legend:

day 1 – measurement day 1 – simulation day 2 – measurement day 2 – simulation

day 4 – measurement day 4 – simulation day 5 – measurement day 5 – simulation

autofluorescence

101 102 103

0 50 100 150

measured fluorescence ¯y

cellcountscountsperchannel

measured fluorescence ¯y

(a) Best fit of modelM4,α(t),αi(t) andβi(t)=βi.

101 102 103

0 50 100 150

measured fluorescence ¯y

cellcountscountsperchannel

measured fluorescence ¯y

(b) Best fit of modelM4,β(t),αi(t)=αiandβi(t).

101 102 103

0 50 100 150

measured fluorescence ¯y

cellcountscountsperchannel

measured fluorescence ¯y

(c) Best fit of modelM4,αi(t) andβi(t).

Figure 4.8: Comparison of measurement data and fitted solution of: (a) model M4,α(t), (b) modelM5,β(t), and (c) modelM4. For simulation, the maximum a posterior esti-mate has been used.

The study of the fits achieved by M4,α(t) and M4,β(t) reveals that the time and division number dependence of the cell division rates seems to be crucial – M4,α(t) fits the data –, while division number dependence of the death rates is not sufficient. This is also supported by the model comparison using the Bayesian information criterion (BIC) (Schwarz, 1978),

BIC=−2 log (π(θ|D))+dθlog







dD

X

j=1

tj







. (4.91)

The BIC value calculated forM4,α(t)is lowest (see Table 4.2), though the achieved likelihood value of modelM4is lower. This indicates that modelM4is overly complex.

4.4.4 Parameter and prediction uncertainty

To gain further insight into the proliferation process, M4,α(t) has been analyzed in detail.

Therefore, the MCMC sampling is run till the chain converged (geweke value of 0.7), which took several weeks on a standard machine. It turns out, that the parameters ˜kmax, kT, µa, σ2a, and{βi}Si=1can be estimated quite precisely, while many entries of{{aij}djai=1}Si=1 are highly uncertain (results not shown).

Interestingly, the uncertainty in{{aij}djai=1}Si=1is highly structured. For low division numbers (i = 0 and i = 1), coefficients aij corresponding to nodes of the spline with τj > 3 [d]

are uncertain. In contrary, for higher division numbers (i = 4 and i = 5), coefficients aij corresponding to nodes of the spline with τj < 2 [d] are uncertain. This finding can be explained easily by noticing thatN(0|t, θ) andN(1|t, θ) is small fort > 3, whileN(4|t, θ) and N(5|t, θ) is small fort < 2 (Figure 4.9). Hence, the uncertain coefficients are multiplied by small numbers – the effective division rate is αi(t)N(i|t, θ) – and therefore do not have any effect on the model response. Beyond the identifiability problems for{{aij}dj=ai1}Si=1, this also explains the small prediction uncertainties isn the subpopulation sizes,N(i|t, θ) (Figure 4.9), and the effective division rate,αi(t)N(i|t, θ) (Figure 4.10).

Besides the tight confidence intervals for N(i|t, θ) and αi(t)N(i|t, θ), Figure 4.9 and Fig-ure 4.10 show that the estimated effective division rates, αi(t)N(i|t, θ), are always slightly delayed compared to the size of the subpopulation N(i|t, θ). This seems to result in the best agreement of model predictions and data. Biologically, this delay might be interpreted as a minimal time between two divisions. Though this is biologically plausible, it has not been observed in previous studies. The reason might be that such unexpected findings crucially rely on a flexible parameterization of the model, which in turns requires Bayesian methods to study uncertainties and to determine valid predictions.

Due to the complexity of T lymphocytes proliferation and proliferation assays, such so-phisticated Bayesian methods have not been available so far. The main limitation has been the computation effort associated to the simulation and the evaluation of the likelihood func-tion. Our approach reduced the computational complexity, which enabled the exploration of the posterior distribution. However, despite the speed-up, the computational effort remains large as the parameter space is high-dimensional. This shows that the proposed modeling and likelihood function evaluation scheme is promising, but improved MCMC samples are required to ensure faster mixing of the chain.

Table 4.2: Comparison of model alternatives and the achieved value of the Bayesian informa-tion criterion. The best value is obtained for modelM4,α(t), which is highlighted.

properties M1 M2 M3 M4,α(t) M4,β(t) M4

division number-dependentα − − X X X X

time-dependentα − X − X − X

division number-dependentβ − − X X X X

time-dependentβ − X − − X X

BIC (104) 3.69 3.44 3.21 2.57 3.17 2.60

1 2 3 4 5

0 1 2x 104

time [day]

N(0|t)[#]N(0|t)

1 2 3 4 5

0 1 2x 104

time [day]

N(1|t)[#]N(1|t)

1 2 3 4 5

0 1 2x 104

time [day]

N(2|t)[#]N(2|t)

1 2 3 4 5

0 1 2x 104

time [day]

N(3|t)[#]

timet[day]

N(3|t)

1 2 3 4 5

0 1 2x 104

time [day]

N(4|t)[#]

timet[day]

N(4|t)

1 2 3 4 5

0 1 2x 104

time [day]

N(5|t)[#]

timet[day]

N(5|t)

Figure 4.9: Predicted number of cells withicell divisions, N(i|t) [#], i = 0, . . . ,5. The col-ored regions indicate the 80% (), 90% (), 95% (), and 99% () Bayesian confidence intervals.

1 2 3 4 5

0 1 2x 104

time [day]

αi(t)·N(0|t)[#/day]α0(t)N(0|t)

1 2 3 4 5

0 1 2x 104

time [day]

αi(t)·N(1|t)[#/day]α1(t)N(1|t)

1 2 3 4 5

0 1 2x 104

time [day]

αi(t)·N(2|t)[#/day]α2(t)N(2|t)

1 2 3 4 5

0 1 2x 104

time [day]

αi(t)·N(3|t)[#/day]

timet[day]

α3(t)N(3|t)

1 2 3 4 5

0 1 2x 104

time [day]

αi(t)·N(4|t)[#/day]

timet[day]

α4(t)N(4|t)

1 2 3 4 5

0 1 2x 104

time [day]

αi(t)·N(5|t)[#/day]

timet[day]

α5(t)N(5|t)

Figure 4.10: Predicted number of cells divisions per unit time in i-th subpopulations, αi(t)N(i|t) [#/day],i= 0, . . . ,5. The colored regions indicate the 80% (), 90%

(), 95% (), and 99% () Bayesian confidence intervals.