
5.2. A posteriori dimensionality reduction under TV prior assumptions

In this section, a novel PCA-like approach [see e.g. 108] is suggested. It uses the decomposition

$$\theta = \theta_0 + P\,\psi \qquad (5.1)$$

with the reduced-dimensional parameters $\psi \in \mathbb{R}^{n_r}$ and the dictionary $P \in \mathbb{R}^{n_p \times n_r}$. The approach is based on the assumption that a parametrization (5.1) with $n_r \ll n_p$ can be found. The dictionary $P$ is comprised of pair-wise orthonormal columns representing a basis for the reduced-dimensional space $\mathbb{R}^{n_r}$ such that $P^\top P = I$. This setup defines a linear monotonic mapping $h_P : \psi \mapsto \theta$. From a probabilistic point of view, it is most important that this mapping captures the main covariance structure of the high-dimensional layout. Assuming that this covariance structure is sufficiently represented by the Laplace approximation, a factorization of the associated covariance matrix would yield the desired basis, as discussed by Bui-Thanh et al. [32]. However, the resulting dense $n_p \times n_p$ covariance matrices cannot be handled in the large-scale regime, and methods to approximate the action of the covariance matrix are usually applied. Hence, the quality of the basis would be directly coupled to the quality of the covariance approximation. For instance, the requirements on the covariance approximation are relatively low for obtaining good convergence in the LBFGS scheme applied throughout this thesis; as a consequence, a factorization of the resulting approximate covariance might not be optimal.
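To make the setup concrete, the following minimal sketch (dimensions and the random orthonormal basis are purely hypothetical) illustrates the mapping $h_P$ of (5.1) and the orthonormality $P^\top P = I$:

```python
import numpy as np

# Hypothetical dimensions: n_p high-dimensional parameters, n_r reduced ones.
n_p, n_r = 1000, 20

# Any orthonormal basis serves for illustration; QR of a random matrix
# yields pair-wise orthonormal columns, so P.T @ P == I up to round-off.
P, _ = np.linalg.qr(np.random.randn(n_p, n_r))
theta_0 = np.zeros(n_p)

def h_P(psi):
    """Linear mapping h_P: psi -> theta from eq. (5.1)."""
    return theta_0 + P @ psi

psi = np.random.randn(n_r)
theta = h_P(psi)

assert np.allclose(P.T @ P, np.eye(n_r))          # pair-wise orthonormal columns
assert np.allclose(P.T @ (theta - theta_0), psi)  # psi recovered by projection
```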

In order to decouple the numerical solution from the dimension reduction, a different approach is proposed here. Motivated by the work of Babacan et al. [10], a local quadratic approximation of the TV functional (3.88) is utilized. Using the inequality¹

$$\sqrt{v} \;\le\; \frac{v + z}{2\sqrt{z}} \qquad \forall z > 0 \qquad (5.2)$$

with $\sqrt{v} = (v+z)/(2\sqrt{z}) \iff v = z$, and setting $v = \sum_j W^G_{i,j}(x_j - x_i)^2 + \epsilon^2$, it can be seen that the TV norm is bounded by

$$TV_w(x) \;\le\; \sum_i \frac{\sum_j W^G_{i,j}(x_j - x_i)^2 + \epsilon^2 + z_i}{2\sqrt{z_i}}. \qquad (5.3)$$

¹This inequality is easily verified by linearizing $\sqrt{v}$ around $z$.
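As a complementary numerical sanity check of (5.2) (the value of $v$ and the grid of $z$ are arbitrary):

```python
import numpy as np

# Check sqrt(v) <= (v + z) / (2 sqrt(z)) for all z > 0, with equality at z = v.
v = 3.7
z = np.linspace(0.1, 10.0, 1000)
majorizer = (v + z) / (2.0 * np.sqrt(z))

assert np.all(np.sqrt(v) <= majorizer + 1e-12)  # bound holds on the whole grid
assert np.isclose(majorizer.min(), np.sqrt(v), atol=1e-3)  # minimum at z ~= v
```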

This bound is quadratic in $x$ and as such allows for a representation in terms of a symmetric operator $L_{tv}(z_i)$ such that

$$TV_w(x) \;\le\; x \cdot L_{tv}(z_i) \cdot x \;+\; \sum_i \frac{\epsilon^2 + z_i}{2\sqrt{z_i}} \;+\; \text{const.} \qquad (5.4)$$

Given the eigendecomposition

$$L_{tv}(z_i) = Y \Lambda Y^\top \qquad (5.5)$$

with the matrix of eigenvectors $Y$ and the matrix of eigenvalues $\Lambda$, a reduced basis $P$ can be constructed by

$$P = Y_{n_r}, \qquad (5.6)$$

where $Y_{n_r}$ represents the columns of $Y$ corresponding to the $n_r$ smallest eigenvalues.
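A minimal sketch of extracting such a basis with standard sparse eigenanalysis is given below; the operator is a generic graph-Laplacian-like stand-in for $L_{tv}(z_i)$, not the operator assembled in this thesis:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Stand-in for L_tv(z_i): a sparse symmetric operator on a 1D chain of n_p
# parameters (the real operator is assembled from the FE mesh connectivity
# and the weights W^G_{i,j}, with z_i evaluated at the MAP estimate).
n_p, n_r = 500, 10
main = 2.0 * np.ones(n_p)
off = -1.0 * np.ones(n_p - 1)
L_tv = sp.diags([off, main, off], [-1, 0, 1], format="csc")

# Eigenvectors of the n_r smallest eigenvalues: the modes of least action of
# L_tv. sigma=0 activates shift-invert, which targets eigenvalues nearest 0.
eigvals, Y_nr = eigsh(L_tv, k=n_r, sigma=0, which="LM")
P = Y_nr  # reduced basis of eq. (5.6); eigsh returns orthonormal eigenvectors
```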

Given that the TV prior represents a class of desirable solutions, the rationale behind this choice is that the quadratic approximation (5.4) is still capable of representing the characteristics of these desirable solutions. The dominant properties should then be captured by the eigenvectors corresponding to the small eigenvalues, since these represent modes of least action of the operator $L_{tv}(z_i)$. See figure 5.6 for an illustration of this idea in practice. Since the operator $L_{tv}(z_i)$ depends on the point of linearization, the factorization (5.5) is unique only up to the choice of $z_i$. A natural choice is to set $z_i = v_i$ and to evaluate $v_i$ at the MAP estimate. This choice then suggests a two-stage approach:

1. Find the MAP estimate $\hat{\theta}_{map}$ and the associated Laplace approximation (4.15).

By using a Quasi-Newton method such as the LBFGS method, the covariance $H^{-1}$ will only be available in terms of its approximate action on some vector $v$. This aspect is notationally accounted for by using the symbol $\tilde{H}^{-1}v$ in the following. However, this product is never computed as an explicit matrix-vector product but via the two-loop recursion, algorithm 2, see chapter 4.4.1 (a generic sketch is given after this list). Set $\theta_0 = \hat{\theta}_{map}$. Find a reduced-dimensional basis given by (5.4)-(5.6). Initialize $\psi = 0$.

2. Find the PM and the posterior variance, or any other posterior statistic, by MC techniques such as SMC with the parametrization given by (5.1). Due to the monotonic behavior of the linear one-to-one mapping $h_P$ defined by (5.1), the posterior $p(\psi|Z)$ is given by the change of variables for probability densities

$$p_\theta(\theta|Z) = |P|\, p_\psi(\psi|Z) \propto p(Z|h_P(\psi))\, p(h_P(\psi)). \qquad (5.7)$$

Again, since the change in functional dependency is expected to be clear from the argument, the labeling indices are omitted, resulting in $p(\theta|Z) \propto p(\psi|Z)$. The two important ingredients for a successful application of the SMC algorithm are the initial distribution $\pi_0$ and an appropriate choice of transition kernels $K_n$. For the initial distribution, the already available Laplace approximation projected onto the reduced-dimensional parameters is a natural candidate. With the linear transformation (5.1), this projection is given by

$$\pi_0(\psi) = \mathcal{N}(0, P^\top \tilde{H}^{-1} P). \qquad (5.8)$$

Again, the product $H_P^{-1} := P^\top \tilde{H}^{-1} P$ is not computed as a dense matrix-matrix product but column-wise, which is feasible since $n_r \ll n_p$ (see the sketch after this list). Additionally, assuming that the Laplace approximation reasonably captures the dimensions of the true posterior, it can be made use of as a proposal density in the sense of (4.72) according to

$$q(x|y) = \mathcal{N}(y, \sigma_P H_P^{-1}). \qquad (5.9)$$

Here $\sigma_P$ is an adjustable scaling parameter chosen such that an optimal acceptance ratio of the transition kernel is obtained.
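The following sketch (all names illustrative) combines the two numerical building blocks referenced above: a generic L-BFGS two-loop recursion as a stand-in for algorithm 2, providing the approximate action $\tilde{H}^{-1}v$, and the column-wise assembly of $H_P^{-1} = P^\top \tilde{H}^{-1} P$ used in (5.8) and (5.9):

```python
import numpy as np

def lbfgs_two_loop(v, S, Ygrads, gamma=1.0):
    """Approximate action r ~= H^{-1} v from stored L-BFGS curvature pairs
    s_k = theta_{k+1} - theta_k and y_k = grad_{k+1} - grad_k (standard
    two-loop recursion; a generic stand-in, not the thesis' algorithm 2)."""
    q, alphas = v.copy(), []
    for s, y in zip(reversed(S), reversed(Ygrads)):  # first loop: newest first
        alpha = (s @ q) / (y @ s)
        q -= alpha * y
        alphas.append(alpha)
    r = gamma * q                                    # initial scaling H_0 = gamma*I
    for (s, y), alpha in zip(zip(S, Ygrads), reversed(alphas)):  # oldest first
        beta = (y @ r) / (y @ s)
        r += (alpha - beta) * s
    return r

def reduced_covariance(apply_Hinv, P):
    """H_P^{-1} = P^T H^{-1} P assembled column by column, eq. (5.8): only n_r
    actions of the approximate inverse Hessian, never a dense H^{-1}."""
    n_r = P.shape[1]
    H_P = np.empty((n_r, n_r))
    for j in range(n_r):
        H_P[:, j] = P.T @ apply_Hinv(P[:, j])
    return 0.5 * (H_P + H_P.T)  # symmetrize against round-off

# Hypothetical usage with curvature pairs S, Ygrads collected during step 1:
# apply_Hinv = lambda v: lbfgs_two_loop(v, S, Ygrads)
# H_P = reduced_covariance(apply_Hinv, P)
# L = np.linalg.cholesky(H_P)
# psi0 = L @ np.random.randn(P.shape[1])        # draw from pi_0, eq. (5.8)
# sigma_P = 0.5                                 # tunable scaling of eq. (5.9)
# psi_prop = psi + np.sqrt(sigma_P) * (L @ np.random.randn(P.shape[1]))
```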

The proposed approach has several beneficial properties:

• Step 1 of the approach can be achieved by state-of-the-art large-scale optimization. Quasi-Newton methods, routinely yielding approximate Laplace approximations, can additionally be used to initialize step 2. The quality of this approximation has a direct effect on the efficiency of step 2. However, the dimensionality reduction relies solely on the existence of the MAP estimate, which can be found by a great variety of algorithmic setups.

• As can be seen from (5.3), the sparsity pattern of the linear operator $L_{tv}$ is entirely defined by the spatial connectivity of the parameters $\theta$. E.g., for an element-wise layout, the sparsity pattern is defined by the connectivity of the elements of the FE mesh (see the assembly sketch after this list). Depending on the specific definition of this connectivity, see appendix F, this results in extremely sparse graph structures. Thus, the eigendecomposition can be performed by standard large-scale sparse eigenanalysis [see e.g. 6].

• The Laplace approximation, which cannot be considered a good approximation in general [25, 144], is only used as a preconditioner for the second step. So even in situations where the Laplace approximation is not suitable as a direct importance distribution for the posterior, globalization strategies like SMC can make use of it in a meaningful way.
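To illustrate the sparsity argument made above, the following sketch assembles $L_{tv}$ from an edge list; the connectivity is a toy example and the per-edge coefficient is a simplified reading of (5.3):

```python
import numpy as np
import scipy.sparse as sp

def assemble_L_tv(edges, weights, z, n_p):
    """Assemble the sparse symmetric operator L_tv of eq. (5.4). Each weighted
    squared difference in (5.3) contributes a 2x2 graph-Laplacian stencil to
    the rows/columns of the connected parameter pair (i, j); 'edges' encodes
    the element connectivity of the FE mesh (cf. appendix F)."""
    rows, cols, vals = [], [], []
    for (i, j), w in zip(edges, weights):
        c = w * (0.5 / np.sqrt(z[i]) + 0.5 / np.sqrt(z[j]))  # 1/(2 sqrt(z)) terms
        rows += [i, j, i, j]
        cols += [i, j, j, i]
        vals += [c, c, -c, -c]
    # Duplicate (row, col) entries are summed on conversion, as in FE assembly.
    return sp.coo_matrix((vals, (rows, cols)), shape=(n_p, n_p)).tocsc()

# Tiny illustrative mesh: four elements in a chain, unit weights, z at the MAP.
edges = [(0, 1), (1, 2), (2, 3)]
L_tv = assemble_L_tv(edges, np.ones(len(edges)), z=np.ones(4), n_p=4)
```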

5.2.1. Patch-wise approximations

In the setting of TV prior assumptions, the representation of a point estimate such as the MAP solution in terms of a priori patches seems to be justified at first sight. Such a patch-wise solution can also formally be represented by (5.1). Let a patch be defined as a connected set of elements $\mathcal{K}_i = \{E_k\}_{k=1}^{n_{ele}^{\mathcal{K}_i}}$ of the mesh $\mathcal{K}$, see chapter 2.3, such that

• $\mathcal{K}_i \subset \mathcal{K}$,

• $\mathcal{K} = \bigcup_{i=1}^{N_p} \mathcal{K}_i$,

• $\mathcal{K}_i \cap \mathcal{K}_j = \emptyset \quad \forall\, i \neq j$,

with the number of patches $N_p$ and the number of elements $n_{ele}^{\mathcal{K}_i}$ per patch $\mathcal{K}_i$. In this case, the columns of the dictionary $P = [p_1, \dots, p_{n_r}]$ are comprised of the vectors

$$p_i = \left\{ v \in \mathbb{R}^{n_p} : v_k = 1/\sqrt{n_{ele}^{\mathcal{K}_i}} \iff E_k \in \mathcal{K}_i,\; v_k = 0 \iff E_k \notin \mathcal{K}_i \right\}. \qquad (5.10)$$

Obviously, an a priori definition of such a patch-wise basis is only possible if the patch size is chosen small enough so that, justified by prior knowledge, the spatial variations of