
We now present the algorithm used to solve the constrained optimization problem.

The method, the primal-dual active set method, was introduced in [BIK99] and analyzed in [HIK03], where it was shown that it can be interpreted as a semi-smooth Newton method. This section follows the theoretical treatment of that work, but only describes the finite-dimensional case, circumventing some of the more intricate parts of the general setting.

The problem has the general form

$$\operatorname*{argmin}_{x \ge 0}\; \|T x - y\|^2, \tag{5.10}$$

where $T$ now contains both the forward operator and the Gramian matrix from the regularization penalty. Note in particular the simple form of the constraint due to the choice of basis above. There exist Lagrange multipliers $\lambda \in \mathbb{R}^n$ such that

$$T^\top T x - \lambda = T^\top y, \qquad x \ge 0, \quad \lambda \ge 0, \quad \lambda^\top x = 0. \tag{5.11}$$

The complementarity conditions in (5.11) can be rewritten as

$$\lambda = \max(0, \lambda - c x)$$

for any $c > 0$. Here, all inequalities and the $\max$-operation are interpreted component-wise. Thus (5.10) is equivalent to finding a zero of $S \colon \mathbb{R}^{2n} \to \mathbb{R}^{2n}$,

$$S(x, \lambda) := \begin{pmatrix} T^\top T x - \lambda - T^\top y \\ \lambda - \max(0, \lambda - c x) \end{pmatrix}. \tag{5.12}$$
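For concreteness, the following is a minimal NumPy sketch of evaluating $S$; the dense matrix `T`, the data vector `y` and the constant `c` are assumed placeholder inputs, not the operators of the actual reconstruction problem.

```python
import numpy as np

def S(x, lam, T, y, c=1.0):
    """Evaluate the residual map S(x, lambda) from (5.12).

    T, y, c are assumed inputs (dense matrix, data vector, positive constant);
    a zero of S solves the constrained problem (5.10).
    """
    r1 = T.T @ (T @ x) - lam - T.T @ y
    r2 = lam - np.maximum(0.0, lam - c * x)  # component-wise max: the non-smooth part
    return np.concatenate([r1, r2])
```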

A fast algorithm for finding zeros is Newton's method, which unfortunately is only applicable to differentiable functions. $S$ is not differentiable in the classical sense, but it is differentiable under a modified notion of differentiability, for which a Newton-type iteration is still applicable.

5.3 The Semi-smooth Newton Method

Definition 5.3. Let $U \subset \mathbb{R}^n$ be an open set and $F \colon U \to \mathbb{R}^m$. Then $f \colon U \to \mathbb{R}^{m \times n}$ is called a Newton derivative of $F$ if, for all $x \in U$,

$$\lim_{h \to 0} \frac{\|F(x + h) - F(x) - f[x + h]\,h\|}{\|h\|} = 0.$$

In comparison to the definition of the Fréchet derivative, $F'(x)$ is replaced by $f[x + h]$. In general, Newton derivatives are not unique.

We are particularly interested in a Newton derivative of the map $\mathbb{R}^n \ni x \mapsto \max(0, x)$. It is given by

$$m[x] := \operatorname{diag}\bigl(\theta(x_1), \dots, \theta(x_n)\bigr),$$

where $\theta(x) = 0$ for $x \le 0$ and $\theta(x) = 1$ for $x > 0$ is the Heaviside step function.¹ Indeed, for $i = 1, \dots, n$, if $x_i \ne 0$ and $|h_i| < |x_i|$, then

$$\bigl|\max(0, x_i + h_i) - \max(0, x_i) - \theta(x_i + h_i)\,h_i\bigr| = 0.$$

On the other hand, if $x_i = 0$, then $|\max(0, h_i) - \theta(h_i)\,h_i| = 0$ holds for arbitrary $h_i$. So

$$\frac{\|\max(0, x + h) - \max(0, x) - m[x + h]\,h\|}{\|h\|} = 0$$

whenever $\|h\| \ne 0$ is small enough.
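The computation above can also be checked numerically. The following sketch builds $m[x]$ as a 0/1 diagonal matrix and evaluates the difference quotient from Definition 5.3 for a few step sizes; the test point and direction are arbitrary choices.

```python
import numpy as np

def m(x):
    """Newton derivative m[x] of x -> max(0, x): diag(theta(x_1), ..., theta(x_n)).

    The value used at x_i == 0 is irrelevant (see the footnote); here theta(0) = 0.
    """
    return np.diag((x > 0).astype(float))

x = np.array([1.0, -2.0, 0.0])
for t in (1e-1, 1e-3, 1e-5):
    h = t * np.array([0.3, -0.7, 0.5])
    num = np.maximum(0, x + h) - np.maximum(0, x) - m(x + h) @ h
    print(np.linalg.norm(num) / np.linalg.norm(h))  # 0 once |h_i| < |x_i| for all x_i != 0
```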

The semi-smooth Newton method is now simply Newton's method with the derivative replaced by a Newton derivative.

Theorem 5.4. Let $U \subset \mathbb{R}^n$ be open, $F \colon U \to \mathbb{R}^n$ and $\bar x \in U$ be a zero of $F$. If $F$ has a Newton derivative $f$ for which $\|f[x]^{-1}\|$ is uniformly bounded for $x$ in a neighborhood of $\bar x$, then the Newton iteration

$$x^{k+1} = x^k - f[x^k]^{-1} F(x^k)$$

converges superlinearly to $\bar x$, provided $\|x^0 - \bar x\|$ is sufficiently small.

Proof. See [HIK03, Theorem 1.1].

It is easy to see from the definition that if $f$ is a Newton derivative for $F$ and $A \colon x \mapsto Ax + a$ is an affine mapping, then $x \mapsto f[Ax + a]\,A$ is a Newton derivative for $F \circ A$, and $x \mapsto A\,f[x]$ is a Newton derivative for $A \circ F$. Hence, the function $S$ defined in (5.12) has a Newton derivative

$$s[x, \lambda] = \begin{pmatrix} T^\top T & -\mathbb{1} \\ c\, m[\lambda - c x] & \mathbb{1} - m[\lambda - c x] \end{pmatrix}.$$

¹ Actually, the value of $\theta(0)$ does not matter; any value yields a Newton derivative.

Introduce the active set

$$\mathcal{A} := \{\, i : \lambda_i - c x_i > 0 \,\},$$

and its complement $\mathcal{I} := \mathcal{A}^c$, the inactive set. Then

$$P_{\mathcal{A}} = m[\lambda - c x]$$

is the projection onto the active set, and $s$ can be written as

$$s[x, \lambda] = \begin{pmatrix} T^\top T & -\mathbb{1} \\ c\, P_{\mathcal{A}} & P_{\mathcal{I}} \end{pmatrix}.$$

If

$$s[x, \lambda] \begin{pmatrix} \delta_x \\ \delta_\lambda \end{pmatrix} = \begin{pmatrix} v \\ w \end{pmatrix},$$

then

$$\delta_x = c^{-1} P_{\mathcal{A}} w + P_{\mathcal{I}} \bigl(P_{\mathcal{I}} T^\top T P_{\mathcal{I}}\bigr)^{-1} P_{\mathcal{I}} \bigl(v + w - c^{-1} T^\top T P_{\mathcal{A}} w\bigr), \qquad \delta_\lambda = T^\top T \delta_x - v,$$

as can be checked by direct calculation; here the inverse is to be understood as the inverse on the inactive set. Since $T^\top T$ is positive definite, $\|(P_{\mathcal{I}} T^\top T P_{\mathcal{I}})^{-1}\|$ is bounded independently of $x$ and $\lambda$, and thus so is $\|s[x, \lambda]^{-1}\|$. It follows that Theorem 5.4 is applicable and the following algorithm converges superlinearly if the initial guess is sufficiently close to the solution.
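The "direct calculation" can also be delegated to the computer: the sketch below sets up a small random test problem (all data hypothetical), forms $(\delta_x, \delta_\lambda)$ from the formulas above, and checks that applying $s[x, \lambda]$ recovers $(v, w)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 6, 1.0
T = rng.standard_normal((8, n))          # hypothetical test operator
TtT = T.T @ T                            # positive definite (generically)
x, lam, v, w = (rng.standard_normal(n) for _ in range(4))

active = lam - c * x > 0                 # active set A
PA = np.diag(active.astype(float))       # projection onto A
PI = np.eye(n) - PA                      # projection onto the inactive set I
idx = np.flatnonzero(~active)            # indices of the inactive set

# (delta_x, delta_lambda) according to the closed-form expressions above
dx = PA @ w / c
rhs = PI @ (v + w - TtT @ (PA @ w) / c)
dx[idx] += np.linalg.solve(TtT[np.ix_(idx, idx)], rhs[idx])  # invert T^T T on I only
dlam = TtT @ dx - v

s = np.block([[TtT, -np.eye(n)], [c * PA, PI]])
print(np.allclose(s @ np.concatenate([dx, dlam]), np.concatenate([v, w])))  # True
```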

Algorithm 5.5 (Semi-smooth Newton Method)

• Choose $x^0, \lambda^0 \in \mathbb{R}^n$.

• For $k = 0, 1, \dots$ do:

1. Let $\mathcal{A}_k = \{\, i : (\lambda^k - c x^k)_i > 0 \,\} \subset \{1, \dots, n\}$ and $\mathcal{I}_k = \mathcal{A}_k^c$.

2. Solve
$$T^\top T x^{k+1} - \lambda^{k+1} = T^\top y, \qquad x^{k+1}|_{\mathcal{A}_k} = 0, \qquad \lambda^{k+1}|_{\mathcal{I}_k} = 0$$
for $(x^{k+1}, \lambda^{k+1})$, or equivalently
$$P_{\mathcal{I}_k} T^\top T P_{\mathcal{I}_k}\, x^{k+1} = P_{\mathcal{I}_k} T^\top y, \qquad x^{k+1} = P_{\mathcal{I}_k} x^{k+1}, \qquad \lambda^{k+1} = T^\top T x^{k+1} - T^\top y. \tag{5.13}$$
Here, the operator $T^\top T$ only has to be inverted on the inactive set $\mathcal{I}_k$.

3. Check the stopping criterion (see below).

The auxiliary constant $c > 0$ can in principle be chosen freely. Choosing $c = \alpha$ usually leads to good results. The operator $P_{\mathcal{I}_k} T^\top T P_{\mathcal{I}_k}$ in the first equation of the iteration step (5.13) is positive definite; therefore, the equation can be solved efficiently using the Conjugate Gradient (CG) method, which only requires applications of the forward operator.
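A compact NumPy sketch of Algorithm 5.5, assuming $T^\top T$ and $T^\top y$ are available as a dense matrix and a vector; a direct solve on the inactive set stands in for the CG iteration mentioned above, and the iteration stops once the active set no longer changes (the duality-gap criterion is discussed in the next subsection).

```python
import numpy as np

def primal_dual_active_set(TtT, Tty, c=1.0, max_iter=100):
    """Sketch of Algorithm 5.5 for argmin_{x >= 0} ||Tx - y||^2.

    TtT = T^T T (positive definite) and Tty = T^T y are assumed inputs;
    a dense solve replaces the CG iteration on the inactive set.
    """
    n = TtT.shape[0]
    x = np.zeros(n)
    lam = np.zeros(n)
    active_prev = None
    for _ in range(max_iter):
        active = lam - c * x > 0                        # active set A_k
        inactive = np.flatnonzero(~active)              # inactive set I_k
        x = np.zeros(n)                                 # enforces x|_{A_k} = 0
        x[inactive] = np.linalg.solve(TtT[np.ix_(inactive, inactive)],
                                      Tty[inactive])    # first equation of (5.13)
        lam = TtT @ x - Tty                             # last equation of (5.13)
        if active_prev is not None and np.array_equal(active, active_prev):
            break                                       # active set unchanged: done
        active_prev = active
    return x, lam
```

With $x^0 = \lambda^0 = 0$, the first pass computes the unconstrained minimizer; afterwards, components are activated and deactivated until the active set no longer changes.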

5.3.1 Duality gap as a stopping rule

In Algorithm 5.5, $x^{k+1}$ and $\lambda^{k+1}$ only depend on $x^k$ and $\lambda^k$ through their dependence on $\mathcal{A}_k$. Since there are only finitely many possible subsets of $\{1, \dots, n\}$ and since the algorithm converges, it actually converges in a finite number of steps. In principle, one could stop the iteration once the sets $\mathcal{A}_k$ and $\mathcal{A}_{k+1}$ are identical and obtain an exact solution (up to errors in solving the linear equation in each iteration). However, the running time of the algorithm can often be greatly improved by stopping earlier with an approximate solution. One possible way to estimate the distance to the exact solution is to consider the difference between the Tikhonov functional and a dual functional.

Definition 5.6. Let $C \subset \mathbb{R}^n$ be a non-empty, closed and convex cone, $F \in \mathbb{R}^{m \times n}$, $y \in \mathbb{R}^m$, $L \in \mathbb{R}^{k \times n}$ injective and $a \in \mathbb{R}^n$. The primal functional is defined as

$$\mathcal{J}(x) := \tfrac{1}{2}\|F x - y\|^2 + \tfrac{1}{2}\|L(x - a)\|^2 + \chi_C(x) \tag{5.14}$$

for $x \in \mathbb{R}^n$, and the dual functional is

$$\mathcal{D}(p) := -\tfrac{1}{2}\|p\|^2 + \langle p, y \rangle + \tfrac{1}{2}\|a\|_L^2 - \tfrac{1}{2}\bigl\|P_C^L\bigl((L^\top L)^{-1} F^\top p + a\bigr)\bigr\|_L^2$$

for $p \in \mathbb{R}^m$, where $\|\cdot\|_L = \|L\,\cdot\,\|$ and $P_C^L$ is the projection onto $C$ with respect to that norm. The problem

$$\bar x = \operatorname*{argmin}_{x \in \mathbb{R}^n}\; \mathcal{J}(x) \tag{5.15}$$

is called the primal problem, and

$$\bar p = \operatorname*{argmax}_{p \in \mathbb{R}^m}\; \mathcal{D}(p) \tag{5.16}$$

is called the associated dual problem.

The relation between primal and dual problem is given by the following theorem.

Theorem 5.7. Problem (5.15) has a solution $\bar x$, problem (5.16) has a solution $\bar p$, and strong duality holds, i.e. $\mathcal{J}(\bar x) = \mathcal{D}(\bar p)$. $\bar x$ and $\bar p$ are related by

$$\bar x = P_C^L\bigl((L^\top L)^{-1} F^\top \bar p + a\bigr), \qquad \bar p = y - F \bar x. \tag{5.17}$$

Proof. First, note that if we define the functional $\mathcal{R}$ by writing $\mathcal{J}(x) = \tfrac{1}{2}\|Fx - y\|^2 + \mathcal{R}(x)$, then

$$\mathcal{D}(p) = -\tfrac{1}{2}\|p\|^2 + \langle p, y \rangle - \mathcal{R}^*(F^\top p),$$

where $\mathcal{R}^*$ is the conjugate of $\mathcal{R}$ as in Definition 3.10. Existence and uniqueness of the minimizer $\bar x$ is shown in [EHN96, Theorem 5.15]. Let $\bar p := y - F\bar x$. Then $0 \in \partial\mathcal{J}(\bar x)$ is equivalent to $F^\top \bar p \in \partial\mathcal{R}(\bar x)$. By Lemma 3.11, this implies $\bar x \in \partial\mathcal{R}^*(F^\top \bar p)$, from which

$$0 \in -F\bar x + F\,\partial\mathcal{R}^*(F^\top \bar p) = \partial(-\mathcal{D})(\bar p)$$

follows. So $\bar p$ is indeed the (unique) maximizer of $\mathcal{D}$. The first condition in (5.17) is the same as $F^\top \bar p \in \partial\mathcal{R}(\bar x)$. Finally,

$$\mathcal{J}(\bar x) - \mathcal{D}(\bar p) = -\langle \bar p, F\bar x \rangle + \mathcal{R}(\bar x) + \mathcal{R}^*(F^\top \bar p) = 0$$

using Lemma 3.11 again.

Definition 5.8. The duality gap $\mathcal{G}$ at $(x, p) \in \mathbb{R}^n \times \mathbb{R}^m$ is defined as the difference between the primal and the dual functional,

$$\mathcal{G}(x, p) := \mathcal{J}(x) - \mathcal{D}(p).$$

Moreover, we define

$$\mathcal{G}(x) := \mathcal{G}(x, y - F x).$$

Written explicitly, for $x \in C$,

$$\mathcal{G}(x) = \tfrac{1}{2}\|x - r(x)\|_L^2 - \tfrac{1}{2}\|r(x) - P_C^L(r(x))\|_L^2, \qquad r(x) := (L^\top L)^{-1} F^\top (y - F x) + a.$$
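A direct transcription of this formula, assuming the data of Definition 5.6 are given as dense arrays and the projection $P_C^L$ is supplied by the caller (for a general cone and norm it may itself be expensive, which motivates the simpler choice (5.19) below):

```python
import numpy as np

def duality_gap(x, F, y, L, a, project_C_L):
    """Duality gap G(x) = J(x) - D(y - Fx) via the explicit formula above.

    project_C_L is assumed to be the projection onto C in the L-norm;
    F, y, L, a are the (hypothetical) data of Definition 5.6.
    """
    LtL = L.T @ L
    r = np.linalg.solve(LtL, F.T @ (y - F @ x)) + a     # r(x)
    d1 = x - r
    d2 = r - project_C_L(r)
    return 0.5 * d1 @ LtL @ d1 - 0.5 * d2 @ LtL @ d2    # the two (1/2)||.||_L^2 terms
```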

Due to Theorem 5.7, $\mathcal{G}$ is always non-negative, and $\mathcal{G}(\bar x) = 0$. In addition, $\mathcal{G}(x)$ can be used to estimate the distance of $x$ to the solution of (5.15) in terms of the norm $\|\cdot\|_L$.

Theorem 5.9. For all $x \in \mathbb{R}^n$ and $p \in \mathbb{R}^m$,

$$\|L(x - \bar x)\|^2 \le 2\,\mathcal{G}(x, p).$$

In particular, $\|L(x - \bar x)\|^2 \le 2\,\mathcal{G}(x)$.

Proof. If $x \notin C$, the statement is obvious since $\mathcal{J}(x) = \infty$. For $x \in C$, the optimality condition for the primal problem implies that

$$\bigl\langle F^\top (F\bar x - y) + L^\top L(\bar x - a),\; x - \bar x \bigr\rangle \ge 0.$$

Thus

$$\mathcal{J}(x) - \mathcal{J}(\bar x) = \tfrac{1}{2}\|F(x - \bar x)\|^2 + \tfrac{1}{2}\|L(x - \bar x)\|^2 + \bigl\langle F^\top (F\bar x - y) + L^\top L(\bar x - a),\; x - \bar x \bigr\rangle \ge \tfrac{1}{2}\|L(x - \bar x)\|^2.$$

Since $\mathcal{J}(\bar x) \ge \mathcal{D}(p)$ for all $p \in \mathbb{R}^m$ by Theorem 5.7, the assertion follows.

This leads to the following stopping criterion for Algorithm 5.5:

3. Stop the iteration if $\mathcal{A}_{k+1} = \mathcal{A}_k$, or if $\mathcal{G}(P_C x^{k+1}) \le \varepsilon$ for some small $\varepsilon > 0$. Return $P_C x^{k+1}$.

$P_C$ is the projection onto the constraint set $C$. We use $P_C x^{k+1}$ instead of $x^{k+1}$, since the latter will often violate the constraint, and thus the duality gap may be infinite at $x^{k+1}$. This would make the stopping criterion fail even in cases where $x^{k+1}$ is close to $\bar x$.

To apply this stopping rule to problem (5.2), in principle we could take

$$F = \sqrt{G}\,B, \qquad y = \sqrt{G}\,z^\delta, \qquad L = \sqrt{\alpha G}, \qquad a = 0 \tag{5.18}$$

and $C = \{x \in \mathbb{R}^n : x \ge 0\}$. One would then have to evaluate the projection onto $C$ in the $G$-norm, the computation of which may itself not be straightforward.

To overcome this problem, we write $\alpha G$ as

$$\alpha G = \beta \mathbb{1} + D^\top D$$

for some sufficiently small $\beta > 0$ and $D \in \mathbb{R}^{n \times n}$, which can always be achieved since $G$ is positive definite.
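One way to obtain such a splitting, sketched under the assumption that $G$ is given as a moderately sized dense matrix: take $\beta$ slightly below $\alpha$ times the smallest eigenvalue of $G$ and a Cholesky factor of the remainder.

```python
import numpy as np

def split_gramian(G, alpha, safety=0.99):
    """Return beta > 0 and D such that alpha*G = beta*I + D^T D.

    One admissible choice among many; the safety factor keeps the remainder
    positive definite despite rounding errors.
    """
    beta = safety * alpha * np.linalg.eigvalsh(G)[0]    # just below alpha * lambda_min(G)
    D = np.linalg.cholesky(alpha * G - beta * np.eye(G.shape[0])).T
    return beta, D
```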

If we now put

$$F = \begin{pmatrix} \sqrt{G}\,B \\ D \end{pmatrix}, \qquad y = \begin{pmatrix} \sqrt{G}\,z^\delta \\ 0 \end{pmatrix}, \qquad L = \sqrt{\beta}\,\mathbb{1}, \qquad a = 0, \tag{5.19}$$

then $P_C^L = P_C$ can be evaluated cheaply. The dual functional using (5.19) is

$$\mathcal{D}(p) = -\tfrac{1}{2}\|p\|^2 + \langle p, y \rangle - \tfrac{1}{2\beta}\|P_C(F^\top p)\|^2,$$

so for the evaluation of $\mathcal{G}(x) = \mathcal{J}(x) - \mathcal{D}(y - F x)$, only

$$F^\top y = B^\top G z^\delta \qquad \text{and} \qquad F^\top F = B^\top G B + D^\top D = B^\top G B + \alpha G - \beta \mathbb{1}$$

are required, not the auxiliary operator $D$.
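With (5.19), evaluating the gap therefore reduces to a few matrix-vector products; in the sketch below, `FtF` and `Fty` stand for the precomputed quantities $F^\top F$ and $F^\top y$ above.

```python
import numpy as np

def duality_gap_nonneg(x, FtF, Fty, beta):
    """Duality gap for the setting (5.19): L = sqrt(beta)*1, a = 0 and C the
    non-negative orthant, so P_C is a component-wise maximum with 0.

    FtF and Fty are assumed to be precomputed as described in the text;
    the auxiliary operator D itself is never needed.
    """
    r = (Fty - FtF @ x) / beta              # r(x) = (L^T L)^{-1} F^T (y - F x)
    d = r - np.maximum(r, 0.0)              # r - P_C(r)
    return 0.5 * beta * (np.dot(x - r, x - r) - np.dot(d, d))
```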

Note that (5.19), combined with Theorem 5.9 and the stopping rule, leads to

$$\frac{\beta}{2}\|x - \bar x\|^2 \le \varepsilon,$$

whereas (5.18) would yield the stronger estimate

$$\frac{\alpha}{2}\bigl\|\sqrt{G}\,(x - \bar x)\bigr\|^2 \le \varepsilon.$$

Since we are working in the finite-dimensional setting, the norms are of course equivalent; numerically, they can still differ considerably. This has to be accounted for by choosing a smaller tolerance $\varepsilon$.