

9. Wave-function based approach for the RDMF 145


elements of the one-particle reduced density matrix $\rho^{(1)}_{\alpha,\alpha}$. If a diagonal element is integer for some $\alpha$, i.e. $\rho^{(1)}_{\alpha,\alpha} \in \{0,1\}$, it follows that $\rho^{(1)}_{\alpha,\beta} = \delta_{\alpha,\beta}\,\rho^{(1)}_{\alpha,\alpha}$ for this $\alpha$. This can easily be seen from Gershgorin's circle theorem [Gershgorin, 1931], which confines every eigenvalue $f_\beta$ of the Hermitian matrix $\rho^{(1)}$ to at least one of the discs
\[
  \bigl| f_\beta - \rho^{(1)}_{\alpha,\alpha} \bigr| \le \sum_{\gamma \neq \alpha} \bigl| \rho^{(1)}_{\alpha,\gamma} \bigr| .
\]
Because all occupations satisfy $0 \le f_\beta \le 1$, an integer diagonal element is only compatible with vanishing off-diagonal elements in the corresponding row.

Thus, an integer diagonal element of the one-particle reduced density matrix directly implies the existence of an integer occupation. The converse is not true. With the decomposition of the wave function according to the separation of the one-particle basis into core, active and virtual states in Eq. (9.17), we only need to consider the active-space wave function $|\tilde\Psi_i\rangle$ for the constrained minimization of the density-matrix functional. We include the expectation values of the interaction between core states and valence states and thus account for core-valence exchange.

9.3. Solution of the minimization problem

9.3.1. Lagrange function

The constrained minimization problem for the density-matrix functional defined in Eq. (5.8) can be written as
\[
  F^{\hat W}_{\beta}[\rho^{(1)}] = \min_{\{P_i\},\{|\Psi_i\rangle\}} \Bigl[ \sum_i P_i \langle\Psi_i|\hat W|\Psi_i\rangle + \frac{1}{\beta} \sum_i P_i \ln P_i \Bigr] \tag{9.27}
\]
subject to the constraints
\[
  \sum_i P_i \langle\Psi_i|\hat c^\dagger_\alpha \hat c_\beta|\Psi_i\rangle = \rho^{(1)}_{\beta,\alpha}, \qquad \sum_i P_i = 1, \qquad P_i \ge 0, \qquad \langle\Psi_i|\Psi_i\rangle = 1 .
\]
This is a mixed equality-inequality constrained non-linear minimization problem. We can write it as an equality-constrained problem by introducing the auxiliary variables $x_i$ with
\[
  P(x_i) = \bigl(1 + \cos x_i\bigr)/2 . \tag{9.30}
\]
Thus, the constraint $P_i \ge 0$ can be replaced by the unconstrained variable $x_i$ and the minimization problem can be written in the Lagrange formalism as
\[
  F^{\hat W}_{\beta}[\rho^{(1)}] = \min_{\{x_i\},\{|\Psi_i\rangle\}} \operatorname{stat}_{\Lambda,\{\lambda_i\},\{h_{\alpha,\beta}\}} L\bigl(\{x_i\},\{|\Psi_i\rangle\},\Lambda,\{\lambda_i\},\{h_{\alpha,\beta}\}\bigr) . \tag{9.31}
\]

$\Lambda$, $\lambda_i$ and $h$ are the Lagrange multipliers for enforcing the equality constraints. For ease of notation, we define the Lagrange function
\[
  L\bigl(\{x_i\},\{|\Psi_i\rangle\},\Lambda,\{\lambda_i\},\{h_{\alpha,\beta}\}\bigr)
  = \sum_i P(x_i)\, \langle\Psi_i|\hat W|\Psi_i\rangle + \frac{1}{\beta} \sum_i P(x_i) \ln P(x_i)
  - \sum_{\alpha,\beta} h_{\alpha,\beta} \Bigl[ \sum_i P(x_i)\, \langle\Psi_i|\hat c^\dagger_\alpha \hat c_\beta|\Psi_i\rangle - \rho^{(1)}_{\beta,\alpha} \Bigr]
  - \Lambda \Bigl[ \sum_i P(x_i) - 1 \Bigr]
  - \sum_i \lambda_i \bigl[ \langle\Psi_i|\Psi_i\rangle - 1 \bigr] . \tag{9.32}
\]
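The effect of the reparametrization in Eq. (9.30) can be checked numerically. The following minimal Python sketch (the function name is ours, not from any implementation) confirms that $P(x)$ stays inside $[0,1]$ for any unconstrained $x$, so the inequality constraint never has to be enforced explicitly:

```python
import math

def P(x):
    """Reparametrized weight of Eq. (9.30): P(x) = (1 + cos x) / 2."""
    return 0.5 * (1.0 + math.cos(x))

# P maps any unconstrained x into [0, 1], so the inequality constraint
# P_i >= 0 never has to be enforced explicitly during the minimization.
samples = [P(0.1 * k) for k in range(-100, 101)]
assert all(0.0 <= p <= 1.0 for p in samples)
assert abs(P(0.0) - 1.0) < 1e-12   # x = 0 corresponds to full weight
assert abs(P(math.pi)) < 1e-12     # x = pi corresponds to zero weight
```

The price of the substitution is that the minimization over $x_i$ is no longer convex in general, but the feasible set of the weights is built into the parametrization.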

The equality-constrained non-linear minimization problem of Eq. (9.27) or Eq. (9.31) includes some important conceptual and practical challenges. One can imagine two different classes of iterative minimization algorithms: one that enforces the constraints in every minimization step and one that does not require the fulfillment of the constraints in every step. The first variant can suffer from the so-called Maratos effect [Maratos, 1978].

The Maratos effect means that the algorithm can fail to rapidly converge to the solution because steps that would make good progress would violate the constraints.

The derivatives of the density-matrix functional with respect to the one-particle reduced density matrix,
\[
  \frac{\partial F^{\hat W}[\rho^{(1)}]}{\partial \rho^{(1)}_{\alpha,\beta}} , \tag{9.33}
\]
are required for the efficient minimization of the total energy. Ignoring mathematical peculiarities for a moment, the derivatives can be obtained from the Lagrange multipliers $h$ of the density-matrix constraint as
\[
  \frac{\partial F^{\hat W}[\rho^{(1)}]}{\partial \rho^{(1)}_{\alpha,\beta}} = h_{\beta,\alpha} . \tag{9.34}
\]
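The relation in Eq. (9.34) is an instance of the envelope theorem: the derivative of a constrained minimum with respect to the constraint value equals the corresponding Lagrange multiplier. A small toy illustration of this fact (the functions $f$ and $c$ below are made up for the sketch and have nothing to do with the density-matrix functional; scipy's SLSQP solves the constrained problem):

```python
import numpy as np
from scipy.optimize import minimize

# Toy analogue of Eq. (9.34): F(r) = min_{x,y} f(x,y) subject to x + y = r.
def f(v):
    x, y = v
    return x**2 + 2.0 * y**2

def F(r):
    """Constrained minimum as a function of the constraint value r."""
    con = {"type": "eq", "fun": lambda v: v[0] + v[1] - r}
    return minimize(f, x0=[0.0, 0.0], constraints=[con]).fun

r = 0.7
# Analytic solution: x = 2r/3, y = r/3, multiplier lambda = 4r/3, F(r) = 2r^2/3.
lam = 4.0 * r / 3.0
# Envelope theorem: dF/dr equals the Lagrange multiplier of the constraint.
dF_dr = (F(r + 1e-4) - F(r - 1e-4)) / 2e-4
assert abs(dF_dr - lam) < 1e-3
```

The same bookkeeping, applied to the density-matrix constraint, yields the multiplier matrix $h$ as the gradient of the functional, which is exactly what makes the multipliers useful for total-energy minimization.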

However, this is only true if the Lagrange multipliers are unique. To simplify the notation for the investigation of the uniqueness of the Lagrange multipliers, we consider the density-matrix functional with only one many-particle wave function $|\Psi\rangle$, i.e. $P_1 = 1$. The derivative of the Lagrange function in Eq. (9.32) with respect to the many-particle wave function $|\Psi\rangle$ is
\[
  \frac{\partial L}{\partial \langle\Psi|} = \hat W |\Psi\rangle - \sum_{\alpha,\beta} h_{\alpha,\beta}\, \hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle - \lambda |\Psi\rangle \tag{9.35}
\]
and has to vanish at the solution. Thus, $|\Psi\rangle$ is an eigenstate of the effective Hamiltonian
\[
  \hat W - \sum_{\alpha,\beta} h_{\alpha,\beta}\, \hat c^\dagger_\alpha \hat c_\beta \tag{9.36}
\]
with the eigenvalue $\lambda$. It is not necessarily the ground state.
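The structure of this effective Hamiltonian can be made concrete in a small sketch. For two spin-orbitals we build the annihilation operators as Jordan-Wigner matrices on the four-dimensional Fock space and diagonalize $\hat W - \sum_{\alpha\beta} h_{\alpha,\beta}\hat c^\dagger_\alpha \hat c_\beta$; the matrices $h$ and $\hat W$ below are arbitrary illustrations, not taken from the thesis:

```python
import numpy as np

# Two spin-orbitals, 4-dimensional Fock space, Jordan-Wigner representation.
I2 = np.eye(2)
ann = np.array([[0.0, 1.0], [0.0, 0.0]])   # single-mode annihilator
Z = np.diag([1.0, -1.0])                    # Jordan-Wigner sign string
c = [np.kron(ann, I2), np.kron(Z, ann)]     # c_0, c_1 (anticommuting)

h = np.array([[0.3, 0.1], [0.1, 0.7]])      # Hermitian multiplier matrix (made up)
n0, n1 = c[0].T @ c[0], c[1].T @ c[1]
W = 2.0 * n0 @ n1                           # simple density-density interaction (made up)

# Effective Hamiltonian of Eq. (9.36): W - sum_ab h_ab c_a^dagger c_b
Heff = W - sum(h[a, b] * c[a].T @ c[b] for a in range(2) for b in range(2))
evals, evecs = np.linalg.eigh(Heff)
psi = evecs[:, 1]                           # any eigenstate qualifies, not only the ground state
assert np.allclose(Heff @ psi, evals[1] * psi)
```

Every eigenstate of `Heff` satisfies the stationarity condition of Eq. (9.35) with its eigenvalue playing the role of $\lambda$, which is why the stationary point need not be the ground state.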


The derivatives of the density-matrix constraint,
\[
  \frac{\partial \langle\Psi| \hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle}{\partial \langle\Psi|} = \hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle , \tag{9.37}
\]
determine the uniqueness of the Lagrange multipliers. To illustrate the connection, we consider two matrices of Lagrange multipliers, $h$ and $\tilde h$, for the density-matrix constraint while all other variables are identical. We assume that the minimum condition is fulfilled for both matrices, i.e. Eq. (9.35) vanishes. Thus, we have the condition

\[
  \sum_{\alpha,\beta} \bigl( h_{\alpha,\beta} - \tilde h_{\alpha,\beta} \bigr)\, \hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle = 0 . \tag{9.38}
\]
We show now that the uniqueness of the Lagrange multipliers, i.e. $h = \tilde h$, is not required to fulfill the minimum condition in Eq. (9.38). For this purpose, we consider the situation where two index pairs $(\alpha',\beta') \neq (\alpha'',\beta'')$ and $a \neq 0$ exist such that

\[
  \hat c^\dagger_{\alpha''} \hat c_{\beta''} |\Psi\rangle = a\, \hat c^\dagger_{\alpha'} \hat c_{\beta'} |\Psi\rangle \tag{9.39}
\]
with $\hat c^\dagger_{\alpha'} \hat c_{\beta'} |\Psi\rangle \neq 0$. All other $\hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle$ are assumed to be linearly independent of $\hat c^\dagger_{\alpha'} \hat c_{\beta'} |\Psi\rangle$. We show that Eq. (9.38) does not imply $h = \tilde h$ in this case. The condition in Eq. (9.38) reads

\[
  0 = \sum_{\substack{\alpha,\beta:\ (\alpha,\beta)\neq(\alpha',\beta') \\ \wedge\,(\alpha,\beta)\neq(\alpha'',\beta'')}} \bigl( h_{\alpha,\beta} - \tilde h_{\alpha,\beta} \bigr)\, \hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle
  + \bigl( h_{\alpha',\beta'} - \tilde h_{\alpha',\beta'} \bigr)\, \hat c^\dagger_{\alpha'} \hat c_{\beta'} |\Psi\rangle
  + \bigl( h_{\alpha'',\beta''} - \tilde h_{\alpha'',\beta''} \bigr)\, \hat c^\dagger_{\alpha''} \hat c_{\beta''} |\Psi\rangle . \tag{9.40}
\]
With Eq. (9.39) we obtain

\[
  0 = \sum_{\substack{\alpha,\beta:\ (\alpha,\beta)\neq(\alpha',\beta') \\ \wedge\,(\alpha,\beta)\neq(\alpha'',\beta'')}} \bigl( h_{\alpha,\beta} - \tilde h_{\alpha,\beta} \bigr)\, \hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle
  + \bigl( h_{\alpha',\beta'} - \tilde h_{\alpha',\beta'} + a\, h_{\alpha'',\beta''} - a\, \tilde h_{\alpha'',\beta''} \bigr)\, \hat c^\dagger_{\alpha'} \hat c_{\beta'} |\Psi\rangle . \tag{9.41}
\]
The linear independence of $\hat c^\dagger_{\alpha'} \hat c_{\beta'} |\Psi\rangle$ from the other $\hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle$ with $(\alpha,\beta) \neq (\alpha',\beta')$ and $(\alpha,\beta) \neq (\alpha'',\beta'')$ requires

\[
  h_{\alpha',\beta'} - \tilde h_{\alpha',\beta'} = -a \bigl( h_{\alpha'',\beta''} - \tilde h_{\alpha'',\beta''} \bigr) . \tag{9.42}
\]
Thus, the minimum condition in Eq. (9.38) can be fulfilled with $h \neq \tilde h$ if $a \neq 0$. More generally, the Lagrange multipliers are not unique if the gradients of the constraints

\[
  \frac{\partial \langle\Psi| \hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle}{\partial \langle\Psi|} = \hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle \tag{9.43}
\]
are linearly dependent. The requirement that these constraint gradients be linearly independent is known as the linear-independence constraint qualification (LICQ). In other words, if the gradients of the constraints are linearly dependent, then the Lagrange multipliers do not have to correspond to the derivatives of the density-matrix functional. We have only observed this issue in evaluations of the density-matrix functional for highly symmetric one-particle reduced density matrices and highly symmetric interaction Hamiltonians. In those cases, the linear dependence can be removed by adding a very small perturbation to the one-particle reduced density matrix.
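Whether LICQ holds at a given $|\Psi\rangle$ can be tested numerically by collecting the constraint gradients as columns of a matrix and checking its rank; rank deficiency signals non-unique multipliers. A sketch with made-up vectors standing in for the states $\hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle$:

```python
import numpy as np

# Columns = constraint gradients; the vectors are invented for illustration.
rng = np.random.default_rng(0)
g1 = rng.normal(size=8)      # gradient for the index pair (alpha', beta')
g2 = 2.5 * g1                # gradient for (alpha'', beta''): a * g1, as in Eq. (9.39)
g3 = rng.normal(size=8)      # a linearly independent gradient

G = np.column_stack([g1, g2, g3])
rank = np.linalg.matrix_rank(G)
licq_holds = (rank == G.shape[1])
assert not licq_holds        # rank 2 < 3 constraints: multipliers are not unique
```

In practice such a rank test on the vectors $\hat c^\dagger_\alpha \hat c_\beta |\Psi\rangle$ would reveal the highly symmetric cases mentioned above before the multipliers are misinterpreted as derivatives.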

9.3.3. Powell-Hestenes augmented Lagrangian

We propose to use the Powell-Hestenes augmented Lagrangian [Powell, 1969; Hestenes, 1969] for the numerical solution of the constrained minimization problem of the density-matrix functional. Consider the generic equality-constrained minimization problem

\[
  \min_{\vec x} f(\vec x) \tag{9.44}
\]

with the constraints

\[
  c_i(\vec x) = 0 . \tag{9.45}
\]

The Lagrange function $L(\vec x, \{\lambda_i\})$ for this problem is
\[
  L(\vec x, \{\lambda_i\}) = f(\vec x) - \sum_i \lambda_i\, c_i(\vec x) . \tag{9.46}
\]
The augmented Lagrangian $L(\vec x, \{\lambda_i\}, \{\mu_i\})$ adds a penalty function $p$ (the augmentation) to the Lagrange function and is defined as
\[
  L(\vec x, \{\lambda_i\}, \{\mu_i\}) = f(\vec x) - \sum_i \lambda_i\, c_i(\vec x) + \frac{1}{2} \sum_i \mu_i\, p\bigl(c_i(\vec x)\bigr) . \tag{9.47}
\]
The $\mu_i$ are the penalty parameters. The penalty function $p(c)$ can be chosen rather freely, and the properties of penalty functions have been studied in the literature; for a review we refer the reader to [Nocedal and Wright, 2006]. We choose the quadratic penalty function $p(c) = c^2$, because it has smooth derivatives and does not involve the derivatives of the constraints. The augmented Lagrangian method maps the constrained minimization to a series of unconstrained minimization problems whose solutions converge to the solution of the constrained problem. A generic augmented Lagrangian algorithm first chooses initial values for the penalty parameters $\mu^{(0)}_i$, the tolerance $\tau^{(0)}$, the Lagrange multipliers $\lambda^{(0)}_i$ and a starting point $\vec x^{(0)}$. Then the following steps are executed for $k = 0, 1, 2, \ldots$:

1. The unconstrained problem

\[
  \min_{\vec x} L\bigl(\vec x, \{\lambda^{(k)}_i\}, \{\mu^{(k)}_i\}\bigr) \tag{9.48}
\]
is solved up to a tolerance of $\lVert \nabla_x L(\vec x, \{\lambda^{(k)}_i\}, \{\mu^{(k)}_i\}) \rVert \le \tau^{(k)}$. The minimizer is used as $\vec x^{(k)}$.

2. The convergence is checked. If the current $\vec x^{(k)}$ and the estimates for the Lagrange multipliers $\lambda^{(k)}_i$ satisfy the minimum conditions, then the algorithm terminates.

3. The Lagrange multipliers are updated with the first-order multiplier update
\[
  \lambda^{(k+1)}_i = \lambda^{(k)}_i - \mu^{(k)}_i\, c_i\bigl(\vec x^{(k)}\bigr) . \tag{9.49}
\]

4. New penalty parameters $\mu^{(k+1)}_i \ge \mu^{(k)}_i$ are chosen.

5. A new tolerance $\tau^{(k+1)}$ is chosen.
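The five steps above can be sketched compactly in Python. This is a minimal illustration with the quadratic penalty $p(c) = c^2$ and the update of Eq. (9.49), not the interface of the actual implementation; the function and parameter names are invented, and scipy's BFGS solves the subproblems:

```python
import numpy as np
from scipy.optimize import minimize

def augmented_lagrangian(f, cons, x0, mu0=10.0, tol0=1e-2,
                         max_outer=20, ctol=1e-8):
    """Generic sketch of the algorithm above (names are illustrative)."""
    x, lam = np.asarray(x0, float), np.zeros(len(cons))
    mu, tau = mu0, tol0
    for _ in range(max_outer):
        def L(x):  # augmented Lagrangian of Eq. (9.47) with p(c) = c^2
            c = np.array([ci(x) for ci in cons])
            return f(x) - lam @ c + 0.5 * mu * (c @ c)
        # Step 1: solve the unconstrained subproblem up to tolerance tau
        x = minimize(L, x, method="BFGS", options={"gtol": tau}).x
        c = np.array([ci(x) for ci in cons])
        # Step 2: terminate once the constraints are (nearly) satisfied
        if np.linalg.norm(c) < ctol:
            break
        # Step 3: first-order multiplier update, Eq. (9.49)
        lam = lam - mu * c
        # Steps 4 and 5: increase the penalties, tighten the tolerance
        mu, tau = 2.0 * mu, 0.5 * tau
    return x, lam

# Example: min x^2 + y^2 subject to x + y - 1 = 0  ->  x = y = 1/2, lambda = 1.
x, lam = augmented_lagrangian(lambda v: v @ v,
                              [lambda v: v[0] + v[1] - 1.0],
                              x0=[0.0, 0.0])
assert np.allclose(x, [0.5, 0.5], atol=1e-4)
assert abs(lam[0] - 1.0) < 1e-3
```

Note that in this sketch the multiplier estimate converges to the exact multiplier while the penalty parameters stay finite, which is the behavior the theorem below makes precise.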

The convergence of the augmented-Lagrangian approach is governed by the following theorem [Nocedal and Wright, 2006]: let $\vec x^*$ be a local minimizer of the constrained minimization problem defined in Eq. (9.44) and Eq. (9.45) and let the linear-independence constraint qualification be fulfilled at $\vec x^*$. Let $\lambda^*_i$ be the exact Lagrange multipliers. Then there exists a threshold value $\bar\mu$ such that for all penalty parameters $\mu_i \ge \bar\mu$, $\vec x^*$ is a local minimizer of $L(\vec x, \{\lambda^*_i\}, \{\mu_i\})$.

This theorem states that, in contrast to the penalty method, the penalty parameters in the augmented Lagrangian do not have to be increased to infinity to find the exact solution. As a consequence, the unconstrained subproblems of the augmented Lagrangian are much less prone to ill-conditioning. Additional theorems by Bertsekas [Bertsekas, 1999; Nocedal and Wright, 2006] apply to the situation of approximate Lagrange multipliers. With the assumptions of the previous theorem, there exist $\delta > 0$, $\epsilon > 0$ and $M > 0$ such that:

1. For all $\lambda_i$ and $\mu_i$ that satisfy
\[
  \lVert \lambda - \lambda^* \rVert \le \delta \max_i \mu_i \quad\text{and}\quad \mu_i \ge \bar\mu , \tag{9.50}
\]
the problem
\[
  \min_{\vec x} L\bigl(\vec x, \{\lambda_i\}, \{\mu_i\}\bigr) \tag{9.51}
\]
has a solution $\tilde{\vec x}$ with
\[
  \lVert \tilde{\vec x} - \vec x^* \rVert \le M \lVert \lambda - \lambda^* \rVert / \min_i \mu_i . \tag{9.52}
\]

2. For all $\lambda_i$ and $\mu_i$ that satisfy
\[
  \lVert \lambda - \lambda^* \rVert \le \delta \max_i \mu_i \quad\text{and}\quad \mu_i \ge \bar\mu , \tag{9.53}
\]
we have
\[
  \lVert \lambda^{(k+1)} - \lambda^* \rVert \le M \lVert \lambda^{(k)} - \lambda^* \rVert / \min_i \mu_i . \tag{9.54}
\]

The first theorem shows that the solution $\tilde{\vec x}$ of the unconstrained subproblem will be close to the exact solution $\vec x^*$ of the constrained problem if the estimates of the Lagrange multipliers are close to the exact Lagrange multipliers or if the penalty parameters are large. The second theorem relates the change of the multipliers by the multiplier update in Eq. (9.49) to the size of the penalty parameters. Improvement is guaranteed as long as the penalty parameters are sufficiently large.

We have implemented the augmented Lagrangian scheme in a general way with well-defined interfaces so that it can work with arbitrary parametrizations of the many-particle wave function and arbitrary constraints. The unconstrained subproblems can be solved with any suitable unconstrained minimization algorithm. In situations where the derivatives of the augmented Lagrangian with respect to the variational parameters are available, we solve the unconstrained subproblems either with the non-linear conjugate-gradient method or with the limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton method [Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970]. For parametrizations of many-particle wave functions that contain complex parameters, we employ the complex generalizations of the minimization algorithms [Sorber et al., 2012] to preserve the compact structure. The line searches are solved either exactly, if $L(\vec x + \alpha \vec y, \{\lambda_i\}, \{\mu_i\})$ can easily be written as a polynomial function in $\alpha$, or otherwise with a secant line search. In cases where the derivatives are not available or if there is noise in the augmented Lagrangian, we use the simultaneous perturbation stochastic approximation approach (SPSA, [Spall, 1987, 1992]). The SPSA will be discussed in detail in section 9.7.5.
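The exact line search mentioned above can be sketched as follows: when $L(\vec x + \alpha \vec y)$ is a polynomial in $\alpha$, the minimizer along the line is the real root of the derivative polynomial with the lowest function value. The quartic coefficients below are an arbitrary example of the kind of line function that arises from quadratic penalties of quadratic constraints:

```python
import numpy as np

# q(a) = 1 - 2a + 0.5a^2 + a^3 + 0.25a^4, a made-up line function L(x + a*y).
coeffs = [1.0, -2.0, 0.5, 1.0, 0.25]
q = np.polynomial.Polynomial(coeffs)
dq = q.deriv()

# Exact line search: stationary points are the real roots of q'(a);
# the global minimum along the line is the one with the lowest q value.
roots = dq.roots()
candidates = [r.real for r in roots if abs(r.imag) < 1e-10]
alpha = min(candidates, key=q)

assert abs(dq(alpha)) < 1e-8                              # stationary point
assert all(q(alpha) <= q(c) + 1e-12 for c in candidates)  # best critical point
```

A secant line search only approximates this minimizer iteratively, so the polynomial route is preferable whenever the coefficients of $L(\vec x + \alpha \vec y)$ are cheap to assemble.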

The augmented Lagrangian approach presented here is a general algorithm for the solution of constrained minimization problems. The main benefits are a rather simple formulation, the availability of derivatives and, most importantly, the numerical stability.