
Therefore, in order to correctly infer model parameters, it is essential to incorporate all existing uncertainties when solving inverse problems.

Solving the inverse problem and quantifying uncertainties is the subject of the research field of Bayesian inference. Bayesian formulations offer a rigorous setting for their solution as they account for the various sources of uncertainty that are unavoidably present in these problems. Furthermore, they possess a great advantage over deterministic alternatives: apart from point estimates, they provide quantitative metrics of the uncertainty in the unknowns, encapsulated in the posterior distribution [48].
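In generic notation (which may differ from the notation introduced in later chapters), the posterior combines the likelihood of the observations y with the prior on the unknown parameters θ via Bayes' rule:

\[
p(\boldsymbol{\theta} \mid \boldsymbol{y}) \;=\; \frac{p(\boldsymbol{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\boldsymbol{y})} \;\propto\; p(\boldsymbol{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}).
\]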

Nonetheless, the solution of model calibration problems in the Bayesian framework is hampered by multiple difficulties:

Firstly, for high-dimensional problems an exorbitant number of computationally expensive forward calls is required, which poses a prohibitive computational burden.

Secondly, multimodal probability distributions exhibit many local maxima, which are difficult for algorithms to identify correctly. For example, Markov chains get trapped for extended periods of time in local maxima, or local approximation schemes are only able to capture a single mode.

Thirdly, an obstacle which, despite its importance, is usually ignored is model inadequacy. It is often assumed that the model used, for example, for calibration is perfect. This leads to misidentified model parameters or, even worse, wrong predictions. This thesis addresses these major challenges and proposes a novel, efficient and accurate framework. Before detailing the main contributions of this work, a short review of related work is outlined.

1.2 Related work

Computational efficiency and dimensionality: Solving large-scale inverse problems is computationally very expensive, if not an intractable process. Finding a solution for an inverse problem with standard Markov Chain Monte Carlo (MCMC, [49]) techniques requires an exorbitant number of likelihood evaluations in order to converge, i.e., solutions of the forward model [50, 51, 52, 53], see Section 2.2.3. The large number of required forward calls originates from the poor scaling of traditional Bayesian inference tools with respect to the dimensionality of the unknown parameter vector - another instance of the curse of dimensionality [54]. As each of these calls implies the solution of very large systems of (non)linear equations, such approaches are usually impractical for high-dimensional problems. In problems such as the elastography example, the model parameters of interest, i.e., material properties, exhibit spatial variability, which requires fine discretizations in order to be captured. Consequently, the solution of large-scale inverse problems critically depends on methods to reduce the computational cost. Several authors, such as T. Bui-Thanh, T. Cui, O. Ghattas, N. Petra, Y.M. Marzouk, G. Stadler, L.C. Wilcox and many more, work on different efficient methods, achieved either by reducing the number and/or the cost of a single required forward call; these are briefly summarized below.
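To make the cost concrete, the following minimal random-walk Metropolis sketch illustrates that every MCMC step requires at least one forward solve; the deliberately cheap, hypothetical forward_model merely stands in for an expensive PDE solver, so chains with tens of thousands of steps translate directly into tens of thousands of solver calls:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(theta):
    # Placeholder for an expensive PDE solve; in large-scale inverse problems
    # a single call may take minutes to hours.
    return np.array([theta[0] + theta[1], theta[0] * theta[1]])

y_obs, sigma = np.array([1.5, 0.5]), 0.1

def log_likelihood(theta):
    r = y_obs - forward_model(theta)       # one forward solve per evaluation
    return -0.5 * r @ r / sigma**2

theta = np.zeros(2)
ll_curr = log_likelihood(theta)
samples, n_forward = [], 1
for _ in range(10_000):                    # chains routinely need >> 10^4 steps
    cand = theta + 0.1 * rng.standard_normal(2)
    ll_cand = log_likelihood(cand)         # the dominating cost of each step
    n_forward += 1
    if np.log(rng.uniform()) < ll_cand - ll_curr:
        theta, ll_curr = cand, ll_cand
    samples.append(theta)
print(n_forward, "forward solves for one chain")
```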

Advanced sampling schemes, like adaptive MCMC [55, 56, 57] and Sequential Monte Carlo (SMC, [58, 59, 60]), exploit physical insight and the use of multi-fidelity solvers in order to expedite the inference process. The use of first-order derivatives, the Hessian [61] or a low-rank structure of the Hessian [62, 63] to design effective proposal distributions has also been advocated, either in a standard MCMC format or by developing advanced sampling strategies [64]. These quantities are generally available by solving appropriate adjoint problems, which are well understood in the context of deterministic formulations. Nevertheless, the number of forward calls can still be on the order of tens of thousands, if not higher.
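As an illustration of the underlying idea (not a reproduction of any of the cited schemes), the sketch below reuses the Metropolis loop from above but scales the proposal by the Cholesky factor of a Hessian of the negative log-posterior, which is known analytically in this toy example, so that steps are larger along flat directions and smaller along stiff ones:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy anisotropic Gaussian posterior standing in for an expensive model.
A = np.array([[50.0, 0.0], [0.0, 0.5]])
def log_post(theta):
    return -0.5 * theta @ A @ theta

# Hessian of the negative log-posterior at the MAP point; in practice it
# would come from adjoint/Gauss-Newton computations rather than being known.
H = A
L = np.linalg.cholesky(H)

def propose(theta, step=0.5):
    # Curvature-informed random walk: proposal covariance ~ step^2 * H^{-1}.
    return theta + step * np.linalg.solve(L.T, rng.standard_normal(2))

theta, samples = np.zeros(2), []
for _ in range(5_000):
    cand = propose(theta)
    if np.log(rng.uniform()) < log_post(cand) - log_post(theta):
        theta = cand
    samples.append(theta)
```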

Several propositions have also been directed towards using emulators, surrogates or reduced-order models of various kinds [65, 66, 67, 68, 69, 70, 71, 72, 73]. The forward model is replaced with an inexpensive surrogate to dramatically decrease the computational cost of a forward call. However, such a task is severely hindered by the high dimensionality. More recent methods attempt to exploit the lower-dimensional structure of the target posterior where maximal sensitivity is observed [74, 75, 76, 73, 77, 78]. This enables inference tasks to be carried out on spaces of significantly reduced dimension, which are not hampered by the aforementioned difficulties. Generally, all such schemes construct approximations around the maximum a posteriori (MAP) point by employing local information, e.g., based on gradients, and are therefore not suitable for multimodal or highly non-Gaussian posteriors.
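The offline/online split behind such surrogate approaches can be sketched as follows; a hypothetical one-parameter toy model and a simple polynomial surrogate are assumed purely for illustration, whereas the cited works use far more sophisticated emulators:

```python
import numpy as np

def forward_model(theta):
    # Stand-in for an expensive solver; hypothetical one-parameter response.
    return np.sin(3.0 * theta) + 0.1 * theta**2

# Offline stage: evaluate the expensive model at a few design points ...
train_theta = np.linspace(-2.0, 2.0, 15)
train_out = np.array([forward_model(t) for t in train_theta])

# ... and fit a cheap polynomial surrogate to those evaluations.
surrogate = np.poly1d(np.polyfit(train_theta, train_out, deg=6))

# Online stage: the likelihood queries the surrogate instead of the solver.
y_obs, sigma = 0.7, 0.05
def log_likelihood(theta):
    return -0.5 * (y_obs - surrogate(theta))**2 / sigma**2
```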

An alternative to sampling approaches are non-empirical approximation schemes, such as Variational Bayesian (VB) inference [79, 54], see Section 2.2.4, which reduces the number of expensive forward calls. Such methods have risen to prominence for probabilistic inference tasks in the machine learning community [80, 81, 82] but have recently also been employed in the context of inverse problems [83, 84]. They provide approximate inference results by solving an optimization problem over a family of appropriately selected probability densities with the objective of minimizing the Kullback-Leibler divergence [85] to the exact posterior. The success of such an approach hinges upon the selection of appropriate densities that have the capacity to provide good approximations while enabling efficient and preferably closed-form optimization with respect to their parameters. Owing to the great advantages of Variational Bayesian frameworks, advanced VB strategies are employed in this thesis to resolve the remaining challenges.
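The optimization viewpoint can be illustrated with a minimal sketch: a toy unnormalized log-posterior, a diagonal-Gaussian variational family, and a Monte Carlo estimate of the evidence lower bound (ELBO) with fixed base samples. The names log_post and neg_elbo are ad hoc and not part of the framework developed later:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical unnormalized log-posterior (a cheap toy example).
def log_post(theta):
    return -0.5 * np.sum((theta - 1.0)**2 / 0.25)

d = 2
eps = np.random.default_rng(0).standard_normal((500, d))  # fixed base samples

def neg_elbo(params):
    mu, log_s = params[:d], params[d:]
    theta = mu + np.exp(log_s) * eps        # reparameterized q-samples
    # Negative ELBO = -(E_q[log posterior] + entropy of q), up to a constant.
    exp_log_p = np.mean([log_post(t) for t in theta])
    entropy = np.sum(log_s)
    return -(exp_log_p + entropy)

res = minimize(neg_elbo, np.zeros(2 * d), method="L-BFGS-B")
mu_hat, sigma_hat = res.x[:d], np.exp(res.x[d:])
```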

We note that an alternative optimization strategy, originating from a different perspective and founded on map-based representations of the posterior, has been proposed in [86].

Multimodality: Multimodal posteriors are not only a challenge for local approximation schemes but also for standard MCMC methods. Multimodality often causes mixing problems, as the Markov chain is trapped in minor modal areas for long periods of time. This is especially exacerbated in higher dimensions. Different advanced inference tools [87, 88, 89, 90], such as those based on simulated annealing, annealed importance sampling or nested sampling, have been developed. However, they require a very large number of forward model calls, which increases with the number of unknowns.

Alternatively, different mixture models have been developed in various statistical inference applications, e.g., speaker identification [91] and data clustering [92], and also in combination with Variational Bayesian inference techniques [93, 79, 94]. Nevertheless, all of these problems are characterized by inexpensive likelihoods, low dimensionality and multiple data/measurements. In this thesis, a model is developed that overcomes the existing problems by employing a mixture of Gaussians within a novel, advanced Variational Bayesian framework. It is able to solve computationally expensive, high-dimensional problems with data collected in a single physical test/experiment.
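For illustration only, a mixture of Gaussians as the variational family can be written down as in the sketch below; two components with diagonal covariances are assumed, and the weights, means and variances would play the role of the variational parameters optimized against an ELBO-type objective as sketched above:

```python
import numpy as np

# A two-component Gaussian mixture (diagonal covariances) as the variational
# family q(theta); weights, means and variances are the variational parameters.
def log_q(theta, w, mu, sigma):
    comps = [np.log(w[k])
             - 0.5 * np.sum(np.log(2.0 * np.pi * sigma[k]**2))
             - 0.5 * np.sum((theta - mu[k])**2 / sigma[k]**2)
             for k in range(len(w))]
    return np.logaddexp.reduce(comps)       # log-sum-exp over the components

w = np.array([0.5, 0.5])
mu = np.array([[-1.0, -1.0], [1.0, 1.0]])   # one mean per posterior mode
sigma = np.array([[0.3, 0.3], [0.3, 0.3]])
print(log_q(np.zeros(2), w, mu, sigma))
```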

Model inadequacy: Another challenge, which has so far barely been accounted for, is model inadequacy. In most model calibration studies, it is implicitly assumed that the model is perfect. However, physical systems are very complex, and simple mathematical models are used to approximate reality. The Bayesian framework allows the comparison of different models for model selection, e.g., with the Bayes factor [95] or information criteria [96, 97]. Nonetheless, none of these methods explicitly quantify the model error, nor do they provide a predictive uncertainty that is representative of the extent of the model error. They merely compare different models with each other.

Different approaches [46, 98, 99, 47] to quantify the model error explicitly model it as an additive term on the model output, e.g., by a Gaussian process. Since the discrepancy model is posed only on the observable quantities, it is fine-tuned with respect to these observations. Thus, it does not provide much physical insight into the model error and does not significantly improve the predictive capabilities of the model [34]. In addition, it gets entangled with the measurement errors, and a disambiguation of model and data error is difficult. Moreover, multiple physical experiments are required, and the approach becomes problematic for high-dimensional problems.
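The additive, output-level structure of these discrepancy formulations can be sketched as follows; the toy forward model and the squared-exponential Gaussian-process draw are hypothetical and serve only to show how model discrepancy and measurement noise enter on the observables:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 20)                 # locations of the observables

def model(x, theta):                          # hypothetical simplified model
    return theta[0] * np.sin(theta[1] * x)

def gp_draw(x, ell=0.2, var=0.05):            # squared-exponential GP draw
    K = var * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / ell**2)
    return rng.multivariate_normal(np.zeros(len(x)), K + 1e-10 * np.eye(len(x)))

theta_true = np.array([1.0, 6.0])
delta = gp_draw(x)                            # discrepancy on the observables
noise = 0.02 * rng.standard_normal(len(x))    # measurement error
y_obs = model(x, theta_true) + delta + noise  # additive output-level structure
```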

In contrast to that, Berliner [100] was one of the very first to embed an additive term within a submodel in order to quantify model error, for an example with a small number of unknown model parameters. This approach of embedding the model error in a submodel has been extended to large-scale problems in the field of fluid dynamics.

More specifically, within approximate turbulence models, such as RANS with the Boussinesq approximation, the Spalart-Allmaras (SA) or the k − ω turbulence model, an additional term, e.g., a Gaussian process, is added within the approximate model [101, 102, 103, 104, 105, 106]. The model error for a specific problem is then identified by comparing the outcome of the approximate model with the outcome obtained by direct numerical simulation (DNS). This relies on the assumption that the DNS outcome, which is computationally very expensive to obtain, is the 'true' outcome [107, 104]. In those cases, only the model error is treated as an unknown and no further latent or model parameters are quantified. This transforms the problem into a model calibration problem where the model error can be interpreted as a model parameter.
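The difference between correcting the output and embedding the correction in a submodel can be illustrated with a deliberately simple toy constitutive relation; all names and numbers below are hypothetical and unrelated to the cited turbulence formulations:

```python
import numpy as np

strain = np.linspace(0.0, 0.02, 50)
E_nominal = 210e9                        # hypothetical nominal stiffness [Pa]

def constitutive(strain, E, delta=0.0):
    # Embedded correction: the error term acts inside the constitutive law.
    return (E + delta) * strain

def rest_of_forward_map(stress):
    # Stand-in for everything downstream of the submodel (e.g., a PDE solve).
    return np.trapz(stress, strain)

# Output-level correction: the error term is added to the final observable.
y_additive = rest_of_forward_map(constitutive(strain, E_nominal)) + 1.0e3
# Embedded correction: the error term propagates through the forward map.
y_embedded = rest_of_forward_map(constitutive(strain, E_nominal, delta=5e9))
```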

In this thesis, a newly developed strategy for identifying model inadequacy is presented. This strategy is based on a framework by Koutsourelakis [108] but extends it with a consistent derivation of the normalization term, such that the integration of flexible prior assumptions becomes possible. The intrusive framework unravels the forward problem, which enables us to assess the constitutive model error directly without knowing the true model. In contrast to existing work, the model parameters and the model error can be quantified at the same time, making inference for high-dimensional problems feasible.