
In contrast, the variates from the Gaussian process prior exhibit a clear correlation. This correlation is induced by the spatial correlation of the squared exponential kernel (ℓ = 3.0). The resulting signals fluctuate with a certain spatial frequency controlled by the length scale ℓ. The smoothness prior and the TV prior represent different interpretations of 'smoothness'. The smoothness prior (σ_s² = 0.025) imposes a smooth structure in the sense of smooth differentiability. Since it favors signals with a low 'second derivative', likely variates vary smoothly and at low frequency. In contrast, the TV prior (α_tv = 1.0, smoothing parameter 0.001) favors signals with bounded variation. This allows variates to have a much more flexible and irregular structure. In particular, likely samples are allowed to have certain distinct features, but the number of such features is bounded.
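To illustrate the correlation structure described above, the following minimal sketch draws variates from a Gaussian process prior with a squared-exponential kernel (length scale ℓ = 3.0, as above) via a Cholesky factorization of the covariance matrix. The one-dimensional grid, unit variance, and jitter term are illustrative assumptions, not taken from the text:

```python
# Sketch: correlated variates from a squared-exponential GP prior.
import numpy as np

def se_kernel(x, length_scale=3.0, variance=1.0):
    """Squared-exponential covariance matrix on the points x."""
    r = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (r / length_scale) ** 2)

x = np.linspace(0.0, 20.0, 200)              # illustrative 1D spatial grid
K = se_kernel(x) + 1e-10 * np.eye(x.size)    # jitter for numerical stability
L = np.linalg.cholesky(K)
rng = np.random.default_rng(0)
samples = L @ rng.standard_normal((x.size, 5))   # five prior variates
# Neighboring entries of each variate are strongly correlated; the length
# scale controls the spatial frequency of the fluctuations.
```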

3.5. Lagrangian formulation

Popular approaches for the numerical exploration of the posterior are introduced in detail in chapter 4. Nevertheless, one aspect of the so-called maximum a posteriori (MAP) solution is already introduced at this point, since it involves a reformulation of the inverse problem. It is therefore associated with the setup of the inverse problem rather than with its numerical solution. By defining

J(U(\theta), \theta) := \frac{1}{2\sigma^2} D(Z, F(\theta)) - \log p(\theta), \qquad (3.94)

cf. chapter 3.1.2, it can be seen that the maximization of the posterior (3.23) is obtained by

\arg\max_{\theta}\, p(Z \mid \theta)\, p(\theta) \equiv \arg\min_{\theta}\, J(U(\theta), \theta). \qquad (3.95)
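To make the reformulation concrete, the following minimal sketch evaluates the negative log-posterior J from (3.94) for a hypothetical two-parameter forward model and computes the MAP point as its minimizer per (3.95). The model F, the data z, the noise variance, and the standard-normal prior are illustrative stand-ins, not taken from the text:

```python
# Sketch: MAP estimation as minimization of the negative log-posterior.
import numpy as np
from scipy.optimize import minimize

sigma2 = 0.01                       # hypothetical noise variance
z = np.array([1.2, 0.8, 1.1])       # hypothetical observations

def F(theta):
    """Stand-in forward model theta -> predicted observations."""
    return np.array([theta[0], theta[0] * theta[1], theta[1]])

def neg_log_posterior(theta):
    """J(theta) = 1/(2 sigma^2) * D(z, F(theta)) - log p(theta), cf. (3.94)."""
    misfit = 0.5 / sigma2 * np.sum((z - F(theta)) ** 2)
    neg_log_prior = 0.5 * np.sum(theta ** 2)    # standard-normal prior
    return misfit + neg_log_prior

theta_map = minimize(neg_log_posterior, x0=np.ones(2)).x   # argmin J, cf. (3.95)
```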

Given enough regularity of the posterior, efficient numerical approaches almost always make use of gradient information dJ to compute the solution to (3.95). But due to the complex relation θ ↦ F(θ), the computation of this gradient information usually comes at a significant computational cost. In the low-dimensional regime, it is often feasible to compute approximations of the gradient dJ by means of finite differences (FDs) via

\frac{dJ}{d\theta_i} = \frac{d(-\log p(\theta_1, \dots, \theta_i, \dots, \theta_n \mid Z))}{d\theta_i} \qquad (3.96)

\approx \frac{-\log p(\dots, \theta_i + \Delta_i, \dots \mid Z) + \log p(\theta \mid Z)}{\Delta_i}, \qquad (3.97)

whereby ∆_i represents an incremental variation in the i-th component of the parameter vector θ. Besides the fact that the accuracy of FD methods is bounded by cancellation and round-off errors in the floating-point arithmetic [137], the FD approximation is highly inefficient in a high-dimensional regime. This inefficiency results from the fact that the necessary number of evaluations n_eval of the posterior is directly coupled to the dimension of the parameters according to

n_{\text{eval}} = n_p + 1. \qquad (3.98)

Given that gradient-based optimization schemes can easily take on the order of 100–1000 iterations to converge to an optimum, such a computational cost per iteration is prohibitive for many applications.
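A minimal sketch of the FD approximation (3.97) makes this cost explicit: one baseline evaluation of the negative log-posterior plus one perturbed evaluation per parameter, each of which entails a full forward solve (the step size is an illustrative choice):

```python
# Sketch: forward-difference gradient of the negative log-posterior.
import numpy as np

def fd_gradient(neg_log_post, theta, delta=1e-6):
    """Forward-difference approximation (3.97) of the gradient (3.96)."""
    base = neg_log_post(theta)              # 1 baseline evaluation
    grad = np.zeros_like(theta)
    for i in range(theta.size):             # n_p perturbed evaluations
        theta_pert = theta.copy()
        theta_pert[i] += delta
        grad[i] = (neg_log_post(theta_pert) - base) / delta
    return grad                             # total: n_eval = n_p + 1, cf. (3.98)
```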

In the context of PDE-constrained optimization, a remedy to this problem is provided by adjoint approaches. These can be derived from a Lagrangian formulation of the inverse problem. To highlight its properties, the derivation starts from a continuous formulation:

\tilde{L}(U(\theta), \theta) := \frac{1}{2\sigma^2} D(Z, \tilde{F}(\theta)) + R(\theta), \qquad (3.99)

whereby the model F̃ := (C ∘ A)(θ), with F̃ : X → Z, is the continuous counterpart to the model F. This formulation can be extended by the weak form (2.71) in the sense of a constraint to give the Lagrangian:

L(U(\theta), \lambda, \theta) := \tilde{L}(U(\theta), \theta) + \delta W(U, \lambda, \theta). \qquad (3.100)

Thereby, the test functions δU are replaced with the symbol λ to avoid confusion of the test functions δU with the variation of the displacements δ_θU due to a variation in the parameters. Given the solution U = A(θ), the weak form always evaluates to

\delta W(U, \lambda, \theta) = 0 \quad \forall \theta. \qquad (3.101)

Consequently, the following equivalence holds:

\arg\min_{\theta} \tilde{L}(U(\theta), \theta) \equiv \arg\min_{\theta} L(U(\theta), \lambda, \theta) \quad \forall \lambda. \qquad (3.102)

Given (3.101), it must further hold that

\Delta \delta W(U, \lambda, \theta)[\delta\theta] = 0 \quad \forall \delta\theta. \qquad (3.103)

This guarantees the equivalence

\delta \tilde{L}(U(\theta), \theta)[\delta\theta] = \delta L(U(\theta), \lambda, \theta)[\delta\theta] \quad \forall \lambda, \delta\theta. \qquad (3.104)

With the definition of the Gateaux derivative δΦ/δv of some functional Φ(v), defined by the duality pairing ⟨δΦ/δv, δv⟩ := δΦ(v)[δv] [see e.g. 89], the equivalence (3.104) directly implies

\frac{\delta \tilde{L}}{\delta\theta} = \frac{\delta L}{\delta\theta}. \qquad (3.105)

In contrast to classical constrained optimization, the Lagrange parameters λ are free parameters in the Lagrangian formulation (3.100). This degree of freedom can be utilized to efficiently compute the derivative δL/δθ. To this end, the variation δL(U(θ), λ, θ)[δθ] is expanded to

\delta L(U(\theta), \lambda, \theta)[\delta\theta] = \frac{1}{2\sigma^2} \Big\langle \frac{\delta D}{\delta_\theta U}, \delta_\theta U \Big\rangle + \Big\langle \frac{\delta R}{\delta\theta}, \delta\theta \Big\rangle + \Delta_U \delta W(U, \lambda, \theta)[\delta_\theta U] + \Delta_\theta \delta W(U, \lambda, \theta)[\delta\theta] \quad \forall \lambda, \delta\theta. \qquad (3.106)

Since the variation δ_θU can only be computed through the nonlinear solution of (2.101), a straightforward computation of (3.106) is difficult. However, by using the free choice in λ, the computation of δ_θU can be avoided by choosing λ such that

\frac{1}{2\sigma^2} \Big\langle \frac{\delta D}{\delta_\theta U}, \delta_\theta U \Big\rangle + \Delta_U \delta W(U, \lambda, \theta)[\delta_\theta U] = 0 \quad \forall \delta_\theta U. \qquad (3.107)

Therein, the second term is readily identified with the differential virtual work (2.90).

Since the roles of the test functions λ and the variation δ_θU are exchanged with respect to the differential virtual work (2.90), (3.107) is referred to as the adjoint equation in the sense of (2.96). Its solution is usually obtained under the same discretization as the incremental forward problem. In this context, well-posedness is ensured by the application of the boundary conditions (2.75) to the dofs δd describing the Lagrange parameters λ. For the evaluation of the first term in (3.107), it is noted that the dependency of the similarity measure on the displacements is given by D(Z, C(U)) = D(Z, F̃(θ)). Using (3.29), it can be seen that the right-hand side of the linear adjoint equation is given by

-\frac{1}{2\sigma^2} \Big\langle \frac{\delta D}{\delta_\theta U}, \delta_\theta U \Big\rangle = \frac{1}{\sigma^2} \big\langle Z - C(U), \delta C(U)[\delta_\theta U] \big\rangle_Z. \qquad (3.108)

For similarity measures in the sense of point-wise displacement measurements such as (3.37) or (3.38), the observation variation δC(U)[δ_θU] is trivial, since the observation operator is just a linear operator and thus

\delta C(U)[\delta_\theta U] = C \delta_\theta U. \qquad (3.109)
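In the discrete setting, the adjoint solve for such point-wise (linear) observations therefore amounts to a single linear system with the transposed tangent stiffness at the converged forward state. The following sketch assumes the discrete quantities (tangent stiffness K_T, observation matrix C, displacements u, data z) have been assembled elsewhere; the names are hypothetical:

```python
# Sketch: discrete adjoint solve for a linear observation operator.
import numpy as np

def solve_adjoint(K_T, C, u, z, sigma2):
    """Solve K_T^T lam = (1/sigma^2) C^T (z - C u), cf. (3.107)-(3.109)."""
    rhs = (C.T @ (z - C @ u)) / sigma2    # misfit right-hand side (3.108)
    return np.linalg.solve(K_T.T, rhs)    # adjoint state lambda
```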

For similarity measures in terms of surface currents such as (3.58), the observation is defined through the push-forward action (3.50). This leads to a more involved definition which is detailed in appendix C.1.

Upon inserting the solution λ of the adjoint equation (3.107) into (3.106), the variation of the Lagrangian δL is obtained as

\delta L(U(\theta), \lambda, \theta)[\delta\theta] = \Big\langle \frac{\delta R}{\delta\theta}, \delta\theta \Big\rangle + \Delta_\theta \delta W(U, \lambda, \theta)[\delta\theta] \quad \forall \delta\theta. \qquad (3.110)

These remaining terms can usually be evaluated economically due to the explicit dependencies on the parameters θ. Whereas for the regularization this evaluation depends on the specific choice of prior, for the virtual work the formal application of the directional derivative results in (3.111). Due to the explicit parametrization, see chapter 2.4, the variation of the stresses δ_θS[δθ]

and the variation of the strains ∆_θδE[δθ] are given by

\delta_\theta S[\delta\theta] = \frac{\partial S}{\partial \theta} \delta\theta, \qquad (3.112)

\Delta_\theta \delta E[\delta\theta] = \frac{\partial \delta E}{\partial \theta} \delta\theta. \qquad (3.113)

The functional description of these terms is specific to the physical meaning modeled by the parameters θ. With respect to the application of arterial growth modeled by (2.65), the detailed linearization is provided in appendix C.2.2.

Finally, to arrive at a discrete gradient of the negative log-posterior (3.94), the continuous parameters θ are represented in terms of the element-wise basis (2.105) such that δθ = G^T δθ̂. The differential δL[δθ̂] is then given by (3.114). Comparing this to the definition of the Gateaux derivative, the discrete gradient is obtained as (3.115), where the regularization is assumed to be defined by any of the prior models p(θ) presented in chapter 3.4.

In contrast to the approximation (3.96), the gradient (3.115) can be computed by the solution of one additional linear problem (3.107). This process is independent of the dimension of the parameter vector θ, which represents a huge advantage compared to the FD approximation. Furthermore, for a given discretization of the forward problem, the gradient (3.115) can in principle be evaluated exactly up to machine precision. The prerequisite is the exact solution of the adjoint equation, which depends on the solution U = A_h(θ) obtained through the nonlinear solution (2.84). In contrast to the nonlinear solution per se, the convergence tolerance of the nonlinear solution process in combination with an adjoint formulation is not governed by the desired accuracy of the primal solution U but by the accuracy of the gradient dL. Whereas an analytic assessment of this accuracy is difficult, practical application reveals that the accuracy of the nonlinear solution of the forward problem can have a significant influence on the accuracy of the gradient. In general, the convergence tolerance for the nonlinear solution has to be chosen with respect to machine precision to obtain the gradient as accurately as possible. If the adjoint equation is solved by means of iterative methods, the same arguments hold for the convergence of this iterative solution.
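Putting the pieces together, a gradient evaluation along the lines of (3.107) and (3.110) might be organized as sketched below. All callables are hypothetical placeholders for the discretized operators of the forward problem, and the tight forward tolerance reflects the accuracy discussion above:

```python
# Sketch: adjoint-based gradient with one forward and one adjoint solve.
import numpy as np

def adjoint_gradient(theta, z, C, sigma2,
                     forward_solve, assemble_tangent,
                     partial_W_theta, grad_neg_log_prior):
    """Cost is independent of dim(theta), in contrast to (3.96)-(3.98)."""
    # 1) nonlinear forward solve; the tolerance governs gradient accuracy
    u = forward_solve(theta, tol=1e-12)
    # 2) one linear adjoint solve (3.107)
    K_T = assemble_tangent(u, theta)
    lam = np.linalg.solve(K_T.T, (C.T @ (z - C @ u)) / sigma2)
    # 3) gradient from the explicit theta-dependencies, cf. (3.110)
    return grad_neg_log_prior(theta) + partial_W_theta(u, lam, theta)
```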
