
based upon earlier work by Nüesch (2010) for discrete stochastic processes. Our likelihood function is similar to the likelihood function published recently by Thompson (2012) but assumes that cells are drawn from the population with replacement. While this is biologically not accurate, the error introduced by this assumption is negligible because the overall cell population is much larger than the number of drawn cells. The advantage of our likelihood function is that the different pieces of information contained in binned histogram data, namely the size of the population and the label distribution, are separated. Furthermore, we account for outliers, which increases flexibility as well as statistical robustness.
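As a minimal sketch of such a likelihood, the following Python function evaluates a multinomial log-likelihood for binned counts, which corresponds to drawing cells with replacement. It covers only the label distribution, conditional on the number of measured cells. The outlier model (a uniform distribution over bins, mixed in with weight `outlier_prob`) and all names are illustrative assumptions, not the formulation derived in this chapter.

```python
import math


def binned_log_likelihood(counts, model_probs, outlier_prob=0.01):
    """Multinomial log-likelihood of bin counts given model bin probabilities.

    Assumption: outliers fall uniformly into the bins with total
    probability outlier_prob, so no bin has zero probability.
    """
    n_bins = len(counts)
    total = sum(counts)
    # log of the multinomial coefficient: log(total!) - sum(log(count!))
    log_lik = math.lgamma(total + 1)
    for count, prob in zip(counts, model_probs):
        mixed = (1.0 - outlier_prob) * prob + outlier_prob / n_bins
        log_lik += count * math.log(mixed) - math.lgamma(count + 1)
    return log_lik
```

With a positive `outlier_prob`, a cell observed in a bin to which the model assigns probability zero lowers the likelihood but no longer makes it vanish, which is one way in which an outlier term can add robustness.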

To efficiently evaluate the likelihood function, we have developed an approach using only a partial simulation of the system. To this end, we have exploited the invariance of log-normality under the system dynamics and have derived a tight approximation of the measured fluorescence, consisting of label-induced fluorescence and autofluorescence. By intertwining model simulation and likelihood function evaluation, a significant speedup is achieved. Still, as the posterior distribution might be multimodal, efficient sampling methods are required. We suggest the use of annealed sampling in combination with adaptive MCMC with delayed rejection for the individual updates.
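The invariance of log-normality can be illustrated with a short sketch. It assumes, as a simplification that is not restated in this summary, that the label decays at a constant rate `decay_rate` and is diluted by a fixed factor `dilution` at every division. Under these assumptions a log-normal label distribution remains log-normal: only the location parameter shifts, while the shape parameter is unchanged. All names are hypothetical.

```python
import math


def lognormal_params_after(mu0, sigma0, t, divisions, decay_rate, dilution=2.0):
    """Parameters of the log-label distribution after time t and a given number of divisions.

    Assumption: label x evolves as x(t) = x(0) * exp(-decay_rate * t) / dilution**divisions,
    so log(x) is shifted by a constant and its spread sigma0 is preserved.
    """
    mu = mu0 - decay_rate * t - divisions * math.log(dilution)
    return mu, sigma0
```

Because only two numbers per subpopulation have to be tracked, the label distribution need not be obtained by solving a PDE on a grid.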

To assess the properties of the modeling and estimation framework, a published dataset of T lymphocytes (Luzyanina et al., 2007b) has been reanalyzed. For this dataset, we compared competing model hypotheses, including constant as well as time- and division number-dependent proliferation rates. Our model assessment using the Bayesian information criterion indicates that the rates of cell division probably depend on time and division number, while the rates of cell death depend only on the division number. Dependencies of the proliferation rates on the label concentration (used by Luzyanina et al. (2009, 2007b) and by Banks et al. (2011, 2010, 2012)) are not required when considering division number-induced heterogeneity. This is in agreement with results by Thompson (2012), who also compared different parameterizations of the DLSP model. However, we provide the first rigorous statistical assessment using a realistic likelihood function. In previous publications, only least-squares and generalized least-squares are used as distance measures for parameter estimation. Neither allows for a statistical interpretation for this model class.
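The model comparison via the Bayesian information criterion can be sketched as follows. The criterion is BIC = k ln(n) - 2 ln(L_max), with k the number of parameters, n the number of data points, and L_max the maximized likelihood; the model with the lowest value is preferred. The function names and the dictionary layout are illustrative assumptions.

```python
import math


def bic(max_log_likelihood, n_params, n_data):
    """Bayesian information criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_data) - 2.0 * max_log_likelihood


def select_model(candidates, n_data):
    """Rank model hypotheses by BIC.

    candidates maps a model name to (maximized log-likelihood, number of parameters).
    Returns the name of the best model and all scores.
    """
    scores = {name: bic(log_lik, k, n_data) for name, (log_lik, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

A more flexible parameterization is selected only if its gain in log-likelihood outweighs the penalty of ln(n) per additional parameter.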

A further disadvantage of most existing publications is that local optimization methods and tailored models for the unknown functions are used. Instead, in this work, global exploration of the parameter space via MCMC sampling is combined with highly flexible parameterization. This MCMC sampling-based approach allows for a direct assessment of the parameter and prediction uncertainties. Furthermore, the flexible parameterization uncovers a delayed cell division, which provides a strong indication of a minimal time between cell divisions. To consider this minimal time between subsequent cell divisions, the DLSP model can be combined with age-structured population models. While such work is in preparation (Metzger et al., 2012), it is beyond the scope of this thesis.

To sum up, in this chapter we introduced the DLSP model, which accurately accounts for label dynamics and division number-dependent cell-to-cell variability. To assess the proliferation properties from labeling experiments, a Bayesian estimation method has been introduced. To the best of our knowledge, this is the first Bayesian framework for structured population models of this type. This is probably the result of the high computational complexity associated with the evaluation of likelihood functions, which we have circumvented here.

5.1 Summary and conclusions

In this thesis, signal transduction in and proliferation of heterogeneous cell populations have been investigated. For both processes, we provide novel modeling, parameter estimation, and uncertainty analysis methods. For the parameter estimation, only single cell snapshot data are employed instead of the rarely available single cell time-lapse data used in most previous publications. As the study of cell populations is in general computationally demanding, throughout this thesis we focused on efficient simulation and estimation procedures which make use of the respective problem structure. To this end, we use tools from the fields of stochastic modeling, statistics, control engineering, and nonlinear optimization.

To study signal transduction in the presence of stochastic and deterministic cell-to-cell variability, an augmented Fokker-Planck equation is derived, governing the population dynamics. Proliferating populations, on the other hand, are described by a division- and label-structured population model. While the considered model classes are vastly different, the rigorous distinction between state dynamics and the measurement process allows for both the derivation of partially analytical solutions and efficient simulation schemes. In particular, the superposition principle and marginalization of the state dynamics, as well as decomposition methods and invariance principles, allow for simulation methods which mitigate the curse of dimensionality.

The model properties also provide the basis for sophisticated Bayesian parameter estimation approaches. In Chapter 3, a parametric approach for the inference of population heterogeneity has been introduced. This parametric approach enables the decomposition of the parameter estimation into two phases. During the pre-estimation phase, an exact parametric model of the likelihood is constructed, which can be evaluated efficiently. This parametric model of the likelihood is used in the second phase for optimization and uncertainty analysis. Both tasks are simplified by using a parametric model of the likelihood, which renders the application of sequential convex optimization and efficient MCMC sampling feasible. This in turn allows for a rigorous analysis of the model and of the prediction uncertainties, and the detection of non-identifiability and sloppy parameters.

While the two-step procedure is rather efficient, the applicability of the approach is limited by the dimensionality of the parameter space. As an affine parameterization of the distribution is required, common ansatz function choices are probably restricted to three or four parameters. To push this limit, recursive refinement methods may be used. However, these methods also have limits. In addition, it is known that, while some parameters may vary among cells, e.g., protein synthesis rates, others are identical in all cells, e.g., affinities. The current approach does not allow for a simultaneous estimation of the distribution of the cell-specific parameters and the values of the shared parameters. Despite these shortcomings, the proposed scheme extends previous results, and the example in Section 3.4 illustrates its benefits in the case of a small number of unknown parameter distributions.

Analogously to parameter estimation for the augmented Fokker-Planck equation, we have proposed in Chapter 4 a parameter estimation scheme for the division- and label-structured population model which exploits the model properties. While no parametric model of the likelihood function could be derived, we circumvent the need to simulate the overall PDE model to evaluate the likelihood function. Instead, the likelihood is assessed by exploiting the decomposition and invariance principles, which yields an approach only requiring the simulation of a low-dimensional ODE model. Accordingly, the likelihood function can be evaluated extremely efficiently, allowing for the investigation of the posterior distribution. This yields global Bayesian confidence intervals for parameters, unknown functions (which are parameterized), as well as model predictions.
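To indicate what such a low-dimensional ODE model may look like, the sketch below integrates the number of cells per division number with the explicit Euler method. The specific equations, dN_i/dt = -(alpha_i + beta_i) N_i + 2 alpha_{i-1} N_{i-1}, with division rates alpha_i and death rates beta_i that are constant in time, are a simplifying assumption made here for illustration; the time dependence of the rates considered in Chapter 4 is omitted, and all names are hypothetical.

```python
def simulate_division_counts(n0, alpha, beta, t_end, dt=1e-3):
    """Explicit Euler integration of the subpopulation sizes N_i(t).

    n0    : initial number of undivided cells (all cells start with i = 0)
    alpha : division rates per division number (assumed constant in time)
    beta  : death rates per division number (assumed constant in time)
    Cells dividing out of the last tracked compartment are not followed further.
    """
    n = [float(n0)] + [0.0] * (len(alpha) - 1)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dn = []
        for i in range(len(n)):
            # Each division of a cell in compartment i-1 yields two cells in compartment i.
            inflow = 2.0 * alpha[i - 1] * n[i - 1] if i > 0 else 0.0
            dn.append(-(alpha[i] + beta[i]) * n[i] + inflow)
        n = [ni + dt * di for ni, di in zip(n, dn)]
    return n
```

The state dimension equals the number of tracked division numbers, so one likelihood evaluation costs a small ODE solve rather than a PDE solve over the label coordinate.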

Despite the computationally efficient evaluation of the likelihood function, the remaining parameter estimation problem is still challenging. It is nonlinear and the posterior distribution may possess several modes, rendering the application of global optimization and uncertainty analysis schemes essential. While sophisticated MCMC sampling algorithms employing annealing, adaptation, and delayed rejection can partially overcome this problem, the convergence rate decreases with increasing problem dimensionality. Therefore, it is crucial to find flexible parameterizations of the unknown functions which possess a small number of parameters. As shown in the example in Section 4.4, flexibility is crucial to avoid misleading interpretations and to allow for an unbiased analysis.
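The idea of annealing can be sketched with a deliberately reduced sampler: a one-dimensional random-walk Metropolis algorithm in which the likelihood is raised to a power `beta` that increases from 0 to 1, so that the chain first explores the prior and is only gradually confined to the posterior modes. This sketch omits adaptation and delayed rejection, and the linear schedule, the step size, and all names are assumptions made for illustration only.

```python
import math
import random


def annealed_metropolis(log_likelihood, log_prior, theta0, n_steps, step_size=0.5, seed=0):
    """Random-walk Metropolis with a tempered likelihood (no adaptation, no delayed rejection).

    theta0 must lie in the support of the prior. Samples are recorded
    only once the annealing phase has finished (beta == 1).
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for step in range(n_steps):
        # Assumed schedule: beta rises linearly from 0 to 1 during the first half of the run.
        beta = min(1.0, 2.0 * step / n_steps)
        proposal = theta + rng.gauss(0.0, step_size)
        log_ratio = (beta * log_likelihood(proposal) + log_prior(proposal)
                     - beta * log_likelihood(theta) - log_prior(theta))
        # Metropolis acceptance rule, written to avoid taking the log of zero.
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            theta = proposal
        if beta >= 1.0:
            samples.append(theta)
    return samples
```

In higher dimensions a fixed proposal scale such as `step_size` becomes increasingly hard to tune, which is one motivation for keeping the number of parameters small.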

Regarding methodology and theoretical concepts: The methods used for the analysis of the individual model classes are similar and might be used to study other population models as well. However, their application and final implementation are highly problem-specific. As for single cell models, different formulations of the estimation problem are used depending on the considered model classes, e.g., Markov jump processes, chemical Langevin equations, and reaction rate equations. Beyond this, the formulation strongly depends on the type of available measurement data.

In summary, the methods proposed in Chapters 3 and 4 of this thesis constitute significant progress towards uncertainty-aware computational modeling of heterogeneous cell populations. A variety of the developed ideas and concepts are rather general and might be applicable to other classes of population models, data types, and biological questions.