We now present the algorithm used to solve the constrained optimization problem.
The method — the primal-dual active set method — was introduced in [BIK99] and analyzed in [HIK03] by showing that it can be interpreted as a semi-smooth Newton method. This section follows the theoretical basis from that work, but will only describe the finite-dimensional case, circumventing some of the more intricate parts of the general setting.
The problem has the general form
\[
\operatorname*{argmin}_{x \ge 0} \; \|Tx - y\|^2, \tag{5.10}
\]
where $T$ now contains both the forward operator and the Gramian matrix from the regularization penalty. Note in particular the simple form of the constraint, which is due to the choice of basis above. There exist Lagrange multipliers $\lambda \in \mathbb{R}^n$ such that
\[
T^*Tx - \lambda = T^*y, \tag{5.11a}
\]
\[
x \ge 0, \quad \lambda \ge 0, \quad \lambda^T x = 0. \tag{5.11b}
\]
Conditions (5.11b) can be rewritten as
\[
\lambda = \max(0, \lambda - cx)
\]
for any $c > 0$. Here, all inequalities and the $\max$-operation are interpreted component-wise. Thus (5.10) is equivalent to finding a zero of $S : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$,
\[
S(x, \lambda) := \begin{pmatrix} T^*Tx - \lambda - T^*y \\ \lambda - \max(0, \lambda - cx) \end{pmatrix}. \tag{5.12}
\]
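As a quick illustration (a sketch, not code from the text), the map $S$ can be written down directly; the matrix `T`, data `y` and constant `c > 0` are assumed given:

```python
import numpy as np

def S(x, lam, T, y, c=1.0):
    """Residual map S from (5.12); a zero of S solves the constrained problem."""
    r1 = T.T @ (T @ x) - lam - T.T @ y       # stationarity residual (5.11a)
    r2 = lam - np.maximum(0.0, lam - c * x)  # complementarity residual
    return np.concatenate([r1, r2])
```

For example, with $T = \mathbb{1}$ and $y = (1, -1)$, the KKT point is $x^* = (1, 0)$, $\lambda^* = (0, 1)$, and $S(x^*, \lambda^*) = 0$.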
A fast algorithm for finding zeros is Newton’s method, which unfortunately is only applicable to differentiable functions. 𝑆 is not differentiable in the usual way, but turns out to be differentiable under a modified notion of differentiability, for which the Newton method is still applicable.
5.3 The Semi-smooth Newton Method

Definition 5.3. Let $U \subset \mathbb{R}^n$ be an open set and $F : U \to \mathbb{R}^m$. Then $f : U \to \mathbb{R}^{m \times n}$ is called a Newton derivative of $F$ if, for all $x \in U$,
\[
\lim_{h \to 0} \frac{\|F(x + h) - F(x) - f[x + h]h\|}{\|h\|} = 0.
\]
In comparison to the definition of the Fréchet derivative, $F'[x]$ is replaced by $f[x + h]$. In general, Newton derivatives are not unique.
We are particularly interested in a Newton derivative of the map $\mathbb{R}^n \ni x \mapsto \max(0, x)$. It is given by
\[
m[x] := \operatorname{diag}(\theta(x_1), \dots, \theta(x_n)),
\]
where $\theta(x) = 0$ for $x \le 0$ and $\theta(x) = 1$ for $x > 0$ is the Heaviside step function.¹ Indeed, for $i = 1, \dots, n$, if $x_i \ne 0$ and $|h_i| < |x_i|$, then
\[
|\max(0, x_i + h_i) - \max(0, x_i) - \theta(x_i + h_i)h_i| = 0.
\]
On the other hand, if $x_i = 0$, then $|\max(0, h_i) - \theta(h_i)h_i| = 0$ holds true for arbitrary $h_i$. So
\[
\frac{\|\max(0, x + h) - \max(0, x) - m[x + h]h\|}{\|h\|} = 0
\]
if $\|h\| \ne 0$ is small enough.
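This can be checked numerically; the following is a minimal illustration (not from the text), where the remainder in the defining limit vanishes exactly once $|h_i| < |x_i|$ for every nonzero component:

```python
import numpy as np

def m(x):
    """Newton derivative of x -> max(0, x): diagonal matrix of Heaviside values."""
    return np.diag((x > 0).astype(float))

# Note that m is evaluated at x + h, not at x, as in Definition 5.3.
x = np.array([0.5, -0.3, 0.0, 1.2])
h = 1e-3 * np.array([1.0, -2.0, 3.0, -4.0])
remainder = np.maximum(0, x + h) - np.maximum(0, x) - m(x + h) @ h
```

Here `remainder` is identically zero, including in the component with $x_i = 0$.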
The Semi-smooth Newton Method is now simply Newton's method with the derivative replaced by a Newton derivative.
Theorem 5.4. Let $U \subset \mathbb{R}^n$ be open, $F : U \to \mathbb{R}^n$, and let $x^* \in U$ be a zero of $F$. If $F$ has a Newton derivative $f$ for which $\|f[x]^{-1}\|$ is uniformly bounded for $x$ in a neighborhood of $x^*$, then the Newton iteration
\[
x_{k+1} = x_k - f[x_k]^{-1}F(x_k)
\]
converges superlinearly to $x^*$, provided $\|x_0 - x^*\|$ is sufficiently small.
Proof. See [HIK03, Theorem 1.1].
It is easy to see from the definition that if $f$ is a Newton derivative for $F$ and $A : x \mapsto A'x + a$ is an affine mapping, then $x \mapsto f[Ax]A'$ is a Newton derivative for $F \circ A$, and $x \mapsto A'f[x]$ is a Newton derivative for $A \circ F$. Hence, the function $S$ defined in (5.12) has a Newton derivative
\[
s[x, \lambda] = \begin{pmatrix} T^*T & -\mathbb{1} \\ c\,m[\lambda - cx] & \mathbb{1} - m[\lambda - cx] \end{pmatrix}.
\]
¹ Actually, the value of $\theta(0)$ does not matter; any value yields a Newton derivative.
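The Newton derivative $s[x, \lambda]$ can be assembled directly; the following sketch (assuming a dense matrix `T`, with `c` as above) is illustrative only:

```python
import numpy as np

def newton_derivative_S(x, lam, T, c=1.0):
    """Newton derivative s[x, lam] of S, assembled as a 2x2 block matrix."""
    n = x.size
    M = np.diag((lam - c * x > 0).astype(float))  # m[lam - c*x]
    I = np.eye(n)
    return np.block([[T.T @ T, -I],
                     [c * M,    I - M]])
```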
Introduce the active set
\[
\mathcal{A} := \{i : \lambda_i - cx_i > 0\},
\]
and its complement $\mathcal{I} := \mathcal{A}^c$, the inactive set. Then
\[
P_{\mathcal{A}} = m[\lambda - cx]
\]
is the projection onto the active set, and $s$ can be written as
\[
s[x, \lambda] = \begin{pmatrix} T^*T & -\mathbb{1} \\ cP_{\mathcal{A}} & P_{\mathcal{I}} \end{pmatrix}.
\]
If
\[
s[x, \lambda] \begin{pmatrix} \delta_x \\ \delta_\lambda \end{pmatrix} = \begin{pmatrix} v \\ w \end{pmatrix},
\]
then
\[
\delta_x = c^{-1}P_{\mathcal{A}}w + P_{\mathcal{I}}(P_{\mathcal{I}}T^*TP_{\mathcal{I}})^\dagger P_{\mathcal{I}}(v + w - c^{-1}T^*TP_{\mathcal{A}}w), \qquad \delta_\lambda = T^*T\delta_x - v,
\]
as can be checked by direct calculation. Since $T^*T$ is positive definite, $\|(P_{\mathcal{I}}T^*TP_{\mathcal{I}})^\dagger\|$ is bounded independently of $x$ and $\lambda$, and thus so is $\|s[x, \lambda]^{-1}\|$. It follows that Theorem 5.4 is applicable and that the following algorithm converges superlinearly if the initial guess is sufficiently close to the solution.
Algorithm 5.5 (Semi-smooth Newton Method)
• Choose $x_0, \lambda_0 \in \mathbb{R}^n$.
• For $k = 0, 1, \dots$ do:
1. Let $\mathcal{A}_k = \{i : (\lambda_k - cx_k)_i > 0\} \subset \{1, \dots, n\}$ and $\mathcal{I}_k = \mathcal{A}_k^c$.
2. Solve
\[
T^*Tx_{k+1} - \lambda_{k+1} = T^*y, \qquad x_{k+1}|_{\mathcal{A}_k} = 0, \qquad \lambda_{k+1}|_{\mathcal{I}_k} = 0
\]
for $(x_{k+1}, \lambda_{k+1})$, or equivalently
\[
P_{\mathcal{I}_k}T^*TP_{\mathcal{I}_k}x_{k+1} = P_{\mathcal{I}_k}T^*y, \qquad x_{k+1} = P_{\mathcal{I}_k}x_{k+1}, \qquad \lambda_{k+1} = T^*Tx_{k+1} - T^*y. \tag{5.13}
\]
Here, the operator $T^*T$ only has to be inverted on the inactive set $\mathcal{I}_k$.
3. Check stopping criterion (see below).
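For concreteness, the iteration can be sketched as follows. This is an illustrative implementation under simplifying assumptions (a dense matrix `T` and a direct solve on the inactive set), not the author's code:

```python
import numpy as np

def primal_dual_active_set(T, y, c=1.0, maxit=50):
    """Sketch of Algorithm 5.5 for min_{x >= 0} ||T x - y||^2.

    Assumes T^*T is positive definite. Stops once the active set repeats,
    at which point the current iterate is the exact solution (up to the
    accuracy of the inner linear solves).
    """
    n = T.shape[1]
    x, lam = np.zeros(n), np.zeros(n)
    TtT, Tty = T.T @ T, T.T @ y
    active_prev = None
    for _ in range(maxit):
        active = lam - c * x > 0          # A_k
        inactive = ~active                # I_k
        if active_prev is not None and np.array_equal(active, active_prev):
            break                         # A_{k+1} = A_k: fixed point reached
        # Solve (5.13): restrict T^*T to the inactive set; x = 0 on A_k.
        x = np.zeros(n)
        sub = np.ix_(inactive, inactive)
        x[inactive] = np.linalg.solve(TtT[sub], Tty[inactive])
        lam = TtT @ x - Tty               # lam_{k+1} = T^*T x_{k+1} - T^* y
        active_prev = active
    return x, lam
```

The direct solve stands in for whatever inner solver is appropriate; only the restriction of $T^*T$ to the inactive indices is ever inverted.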
The auxiliary constant $c > 0$ can in principle be chosen freely. Choosing $c = \alpha$ usually leads to good results. The operator $P_{\mathcal{I}_k}T^*TP_{\mathcal{I}_k}$ in the first equation of the iteration step (5.13) is positive definite; therefore, the equation can be solved efficiently using the Conjugate Gradient (CG) method, which only requires applications of the forward operator.
5.3.1 Duality gap as a stopping rule
In Algorithm 5.5, $x_{k+1}$ and $\lambda_{k+1}$ depend on $x_k$ and $\lambda_k$ only through the active set $\mathcal{A}_k$. Since there are only finitely many subsets of $\{1, \dots, n\}$ and since the algorithm converges, it actually converges in a finite number of steps. In principle, one could stop the iterations once the sets $\mathcal{A}_k$ and $\mathcal{A}_{k+1}$ are identical and obtain an exact solution (up to errors in solving the linear equation in each iteration). However, the running time of the algorithm can often be greatly improved by stopping earlier with an approximate solution. One possible way to estimate the distance to the exact solution is to consider the difference between the Tikhonov functional and a dual functional.
Definition 5.6. Let $C \subset \mathbb{R}^n$ be a non-empty, closed and convex cone, $F \in \mathbb{R}^{m \times n}$, $y \in \mathbb{R}^m$, $L \in \mathbb{R}^{n \times n}$ injective and $a \in \mathbb{R}^n$. The primal functional is defined as
\[
\mathcal{J}(x) := \tfrac12\|Fx - y\|^2 + \tfrac12\|L(x - a)\|^2 + \chi_C(x) \tag{5.14}
\]
for $x \in \mathbb{R}^n$, and the dual functional is
\[
\mathcal{J}^*(p) := -\tfrac12\|p\|^2 + \langle p, y \rangle + \tfrac12\|a\|_L^2 - \tfrac12\|P_C^L((L^*L)^{-1}F^*p + a)\|_L^2
\]
for $p \in \mathbb{R}^m$, where $\|\cdot\|_L = \|L\,\cdot\|$ and $P_C^L$ is the Euclidean projection onto $C$ in that norm. The problem
\[
x^* = \operatorname*{argmin}_{x \in \mathbb{R}^n} \mathcal{J}(x) \tag{5.15}
\]
is called the primal problem, and
\[
p^* = \operatorname*{argmax}_{p \in \mathbb{R}^m} \mathcal{J}^*(p) \tag{5.16}
\]
is called the associated dual problem.
The relation between primal and dual problem is given by the following theorem.
Theorem 5.7. Problem (5.15) has a solution $x^*$, problem (5.16) has a solution $p^*$, and strong duality holds, i.e. $\mathcal{J}(x^*) = \mathcal{J}^*(p^*)$. Moreover, $x^*$ and $p^*$ are related by
\[
x^* = P_C^L((L^*L)^{-1}F^*p^* + a), \qquad p^* = y - Fx^*. \tag{5.17}
\]
Proof. First, note that if we define the functional $\mathcal{R}$ by writing $\mathcal{J}(x) = \tfrac12\|Fx - y\|^2 + \mathcal{R}(x)$, then
\[
\mathcal{J}^*(p) = -\tfrac12\|p\|^2 + \langle p, y \rangle - \mathcal{R}^*(F^*p),
\]
where $\mathcal{R}^*$ is the conjugate of $\mathcal{R}$ as in Definition 3.10. Existence and uniqueness of the minimizer $x^*$ is shown in [EHN96, Theorem 5.15]. Let $\bar p := y - Fx^*$. Then $0 \in \partial\mathcal{J}(x^*)$ is equivalent to $F^*\bar p \in \partial\mathcal{R}(x^*)$. By Lemma 3.11, this implies $x^* \in \partial\mathcal{R}^*(F^*\bar p)$, from which
\[
0 \in -Fx^* + F\partial\mathcal{R}^*(F^*\bar p) = \partial(-\mathcal{J}^*)(\bar p)
\]
follows. So $\bar p = p^*$ is indeed the (unique) maximizer of $\mathcal{J}^*$. The first condition in (5.17) is the same as $F^*p^* \in \partial\mathcal{R}(x^*)$. Finally,
\[
\mathcal{J}(x^*) - \mathcal{J}^*(p^*) = -\langle p^*, Fx^* \rangle + \mathcal{R}(x^*) + \mathcal{R}^*(F^*p^*) = 0,
\]
using Lemma 3.11 again.
Definition 5.8. The duality gap $\mathcal{G}$ at $(x, p) \in \mathbb{R}^n \times \mathbb{R}^m$ is defined as the difference between the primal and the dual functional,
\[
\mathcal{G}(x, p) := \mathcal{J}(x) - \mathcal{J}^*(p).
\]
Moreover, we define
\[
\mathcal{G}(x) := \mathcal{G}(x, y - Fx).
\]
Written explicitly,
\[
\mathcal{G}(x) = \tfrac12\|x - r(x)\|_L^2 - \tfrac12\|r(x) - P_C^L(r(x))\|_L^2, \qquad r(x) := (L^*L)^{-1}F^*(y - Fx) + a.
\]
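The explicit form translates directly into code. In this sketch (not from the text), the projection `proj_C_L` onto $C$ in the $\|\cdot\|_L$-norm is assumed to be supplied by the caller:

```python
import numpy as np

def duality_gap(x, F, y, L, a, proj_C_L):
    """Duality gap G(x) = G(x, y - Fx) in the explicit form above,
    for feasible x (i.e. x in C)."""
    r = np.linalg.solve(L.T @ L, F.T @ (y - F @ x)) + a  # r(x)
    nL = lambda v: np.linalg.norm(L @ v)                  # ||.||_L
    return 0.5 * nL(x - r)**2 - 0.5 * nL(r - proj_C_L(r))**2
```

For instance, with $F = L = \mathbb{1}$, $a = 0$ and $C$ the non-negative orthant (so the projection is component-wise clipping), the gap vanishes at the minimizer and is positive elsewhere.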
Due to Theorem 5.7, $\mathcal{G}$ is always non-negative, and $\mathcal{G}(x^*) = 0$. In addition, $\mathcal{G}(x)$ can be used to estimate the distance of $x$ to the solution of (5.15) in terms of the norm $\|\cdot\|_L$.

Theorem 5.9. For all $x \in \mathbb{R}^n$ and $p \in \mathbb{R}^m$,
\[
\|L(x - x^*)\|^2 \le 2\mathcal{G}(x, p).
\]
In particular, $\|L(x - x^*)\|^2 \le 2\mathcal{G}(x)$.
Proof. If $x \notin C$, the statement is obvious since $\mathcal{J}(x) = \infty$. For $x \in C$, the optimality condition for the primal problem implies that
\[
\langle F^*(Fx^* - y) + L^*L(x^* - a), x - x^* \rangle \ge 0.
\]
Thus
\[
\mathcal{J}(x) - \mathcal{J}(x^*) = \tfrac12\|F(x - x^*)\|^2 + \tfrac12\|L(x - x^*)\|^2 + \langle F^*(Fx^* - y) + L^*L(x^* - a), x - x^* \rangle \ge \tfrac12\|L(x - x^*)\|^2.
\]
Since $\mathcal{J}(x^*) \ge \mathcal{J}^*(p)$ for all $p \in \mathbb{R}^m$ by Theorem 5.7, the assertion follows.
This leads to the following stopping criterion for Algorithm 5.5:

3. Stop the iteration if $\mathcal{A}_{k+1} = \mathcal{A}_k$, or if $\mathcal{G}(P_C x_{k+1}) \le \varepsilon$ for some small $\varepsilon > 0$. Return $P_C x_{k+1}$.

Here, $P_C$ is the projection onto the constraint set. We use $P_C x_{k+1}$ instead of $x_{k+1}$, since the latter will often violate the constraint, and thus the duality gap may be infinite at $x_{k+1}$. This would make the stopping criterion fail even in cases where $x_{k+1}$ is close to $x^*$.
To apply this stopping rule to problem (5.2), in principle we could take
\[
F = \sqrt{G}B, \qquad y = \sqrt{G}z^\delta, \qquad L = \sqrt{\alpha G}, \qquad a = 0 \tag{5.18}
\]
and $C = \{x \in \mathbb{R}^n : x \ge 0\}$. One would then have to evaluate the projection onto $C$ in the $G$-norm, the computation of which may itself not be straightforward.
To overcome this problem, we decompose $\alpha G$ as
\[
\alpha G = \beta\mathbb{1} + D^*D
\]
for some sufficiently small $\beta > 0$ and $D \in \mathbb{R}^{n \times n}$, which can always be achieved since $G$ is positive definite. If we now put
\[
F = \begin{pmatrix} \sqrt{G}B \\ D \end{pmatrix}, \qquad y = \begin{pmatrix} \sqrt{G}z^\delta \\ 0 \end{pmatrix}, \qquad L = \sqrt{\beta}\,\mathbb{1}, \qquad a = 0, \tag{5.19}
\]
then $P_C^L = P_C$ can be evaluated cheaply. The dual functional using (5.19) is
\[
\mathcal{J}^*(p) = -\tfrac12\|p\|^2 + \langle p, y \rangle - \tfrac{1}{2\beta}\|P_C(F^*p)\|^2,
\]
so for the evaluation of $\mathcal{G}(x) = \mathcal{J}(x) - \mathcal{J}^*(y - Fx)$, only
\[
F^*y = B^*Gz^\delta \qquad \text{and} \qquad F^*F = B^*GB + D^*D = B^*GB + \alpha G - \beta\mathbb{1}
\]
are required, not the auxiliary operator $D$ itself. Note that (5.19), combined with Theorem 5.9 and the stopping rule, leads to
\[
\frac{\beta}{2}\|x - x^*\|^2 \le \varepsilon,
\]
whereas (5.18) would yield the stronger estimate
\[
\frac{\alpha}{2}\|\sqrt{G}(x - x^*)\|^2 \le \varepsilon.
\]
Since we are working in the finite-dimensional setting, the two norms are of course equivalent; numerically, they can still differ considerably. This has to be accounted for by choosing a smaller tolerance $\varepsilon$.
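With the choices (5.19), the gap at a feasible point $x \ge 0$ can be rearranged so that indeed only $F^*F$ and $F^*y$ enter: writing $q := F^*y - F^*Fx = F^*(y - Fx)$ and using $\|q\|^2 - \|q - \max(0, q)\|^2 = \|\max(0, q)\|^2$, one obtains $\mathcal{G}(x) = -\langle q, x \rangle + \tfrac{\beta}{2}\|x\|^2 + \tfrac{1}{2\beta}\|\max(0, q)\|^2$. The following sketch (an algebraic rearrangement of the formulas above, with hypothetical argument names) implements this:

```python
import numpy as np

def gap_beta(x, FtF, Fty, beta):
    """Duality gap under the splitting (5.19), valid for feasible x >= 0.

    Only F^*F and F^*y are needed, not the auxiliary operator D itself.
    """
    q = Fty - FtF @ x                     # q = F^*(y - Fx)
    q_plus = np.maximum(0.0, q)
    return -q @ x + 0.5 * beta * (x @ x) + (q_plus @ q_plus) / (2.0 * beta)
```

By Theorem 5.9, `gap_beta(x, ...) <= eps` then guarantees $\tfrac{\beta}{2}\|x - x^*\|^2 \le \varepsilon$.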