5 Alternating Projections and Sparsity
5.2 Global Convergence to Lower Level Sets
We clarify some notation on linear mappings that will be needed throughout this thesis.
Definition 5.2.1. Let M ∈ Rᵐˣⁿ be a linear mapping. From now on, assume that

1) m ≤ n and 2) M is of full rank. (5.1)
The nullspace of M is denoted by ker M, and M† indicates the Moore–Penrose inverse (Moore, 1920), (Penrose, 1955) of M, which, because of the full rank assumption, becomes

M† = M⊤(MM⊤)⁻¹. (5.2)
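The closed-form expression (5.2) is easy to check numerically against a library implementation of the Moore–Penrose inverse; the following sketch (with an arbitrary random M of my own choosing, not from the text) assumes NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                       # m ≤ n
M = rng.standard_normal((m, n))   # full rank with probability one

# Closed form (5.2), valid because MM⊤ is invertible for full-rank M with m ≤ n
M_dagger = M.T @ np.linalg.inv(M @ M.T)

# Agrees with the general (SVD-based) Moore–Penrose inverse
assert np.allclose(M_dagger, np.linalg.pinv(M))
```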
The following definition is a generalization of the known concept of restricted isometry (Candès and Tao, 2005). Essentially, the proofs involving restricted isometry use just the properties of lower semicontinuity, subadditivity, and the fact that ϕ(x) = ϕ(−x) for all x ∈ Rⁿ.
Definition 5.2.2. Let ϕ : Rⁿ → R be a lower semicontinuous (Definition 2.1.7) and subadditive function (Definition 2.1.6) satisfying inf_{x∈Rⁿ} ϕ(x) ≠ −∞. Further, let M : Rⁿ → Rᵐ be a linear mapping of full rank. The mapping M is said to satisfy the ϕ-RIP of order s if there exists 0 ≤ δ ≤ 1 such that

(1 − δ)‖x‖₂² ≤ ‖M†Mx‖₂² ≤ (1 + δ)‖x‖₂² ∀x ∈ lev≤s ϕ. (5.3)

The infimum δₛ of all such δ is the isometry constant for ϕ.
An important property of M†M is that it is the orthogonal projection matrix onto the linear subspace ker(M)⊥. This means that the operator norm of M†M is 1, and so we have, for all x ∈ Rⁿ, that

‖M†Mx‖₂ ≤ ‖x‖₂. (5.4)

This gives us the upper bound in (5.3) for free.
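Numerically, the projector property behind (5.4) can be seen as follows; this is only an illustrative check with a random matrix of my own choosing (NumPy assumed), not part of the development.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 7
M = rng.standard_normal((m, n))   # full rank with probability one
P = np.linalg.pinv(M) @ M         # the matrix M†M

# M†M is the orthogonal projector onto ker(M)⊥ = range(M⊤):
assert np.allclose(P, P.T)        # symmetric
assert np.allclose(P @ P, P)      # idempotent
assert np.allclose(M @ P, M)      # leaves range(M⊤) fixed under M

# Nonexpansiveness (5.4): ‖M†Mx‖₂ ≤ ‖x‖₂
x = rng.standard_normal(n)
assert np.linalg.norm(P @ x) <= np.linalg.norm(x) + 1e-12
```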
By Definition 2.1.7, based on (Rockafellar and Wets, 1998, Theorem 1.6), the lower semicontinuity of a function ϕ is equivalent to the set lev≤s ϕ being a closed subset of Rⁿ for any s.
Now we define a minimization problem in a more general framework that can be specialized at two instances, namely, in Corollary 5.2.4 and Theorem 10.4.4:

minimize_{x ∈ lev≤s ϕ} f(x) ≔ ½ d_B(x)². (5.5)
We can now prove one of the main results in this thesis.
Theorem 5.2.3 (global convergence of alternating projections to lower level sets). Let ϕ : Rⁿ → R be a lower semicontinuous and subadditive function satisfying inf_{x∈Rⁿ} ϕ(x) ≠ −∞. Further, let ϕ(x) = ϕ(−x) for all x ∈ Rⁿ. For a fixed s > 0, let the matrix M†M satisfy (5.3) of order 2s with δ₂ₛ ∈ [0, 1/2) for M in the definition of the affine set B given by (3.6). Then B ∩ lev≤s ϕ is a singleton. Further, for any initial value x₀ ∈ Rⁿ, the sequence {xₖ}_{k∈N} generated by alternating projections (4.2.1) converges to B ∩ lev≤s ϕ with d_B(xₖ) → 0 as k → ∞ at a linear rate with constant bounded above by √(δ₂ₛ/(1 − δ₂ₛ)).
Proof. To establish convergence, for the iterate xₖ, define the mapping

q(x, xₖ) ≔ f(xₖ) + ⟨x − xₖ, M†(Mxₖ − p)⟩ + ½‖x − xₖ‖₂²,

where f is the objective function defined in (5.5). By definition of the projector, the iterate xₖ₊₁ is a solution to the problem min{q(x, xₖ) | x ∈ lev≤s ϕ}. To see this, recall that P_B x = x − M†(Mx − p), so that minimizing q(·, xₖ) over lev≤s ϕ amounts to minimizing ‖x − P_B xₖ‖₂² over lev≤s ϕ.
Now, by definition of the alternating projections sequence,

xₖ₊₁ ∈ P_{lev≤s ϕ} P_B(xₖ) = P_{lev≤s ϕ}(xₖ − M†(Mxₖ − p)),

where the inequality in the middle follows from the fact that M†M is an orthogonal projection and δ₂ₛ < 1. Inequalities (5.10)-(5.12) then imply that d_B(xₖ) → 0 as k → ∞ at a linear rate for 0 ≤ δ₂ₛ < 1/2, with constant bounded above by √(δ₂ₛ/(1 − δ₂ₛ)). □
Next, we consider an application of Theorem 5.2.3. Namely, we apply it to Problem (3.8). Specifically, we assume that M†M satisfies (5.3) with ϕ = ℓ0, that is, there exists 0 ≤ δ ≤ 1 such that

(1 − δ)‖x‖₂² ≤ ‖M†Mx‖₂² ≤ (1 + δ)‖x‖₂² ∀x ∈ Aₛ. (5.13)

Corollary 5.2.4 (global convergence of alternating projections in sparse affine feasibility (Hesse et al., 2014, Theorem III.15)). For a fixed s > 0, let the matrix M†M satisfy (5.13) with δ₂ₛ ∈ [0, 1/2) for M in the definition of the affine set B given by (3.6). Then B ∩ Aₛ is a singleton. Further, for any initial value x₀ ∈ Rⁿ, the sequence {xₖ}_{k∈N} generated by alternating projections (Definition 4.2.1) converges to B ∩ Aₛ with d_B(xₖ) → 0 as k → ∞ at a linear rate with constant bounded by √(δ₂ₛ/(1 − δ₂ₛ)).

Proof. We note that the ℓ0-function is, by Proposition 3.2.7, a lower semicontinuous and subadditive function. The lower level set of ℓ0 is exactly equal to Aₛ. This means that property (5.13) is a specific instance of (5.3). Hence, the result follows from Theorem 5.2.3. □
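The alternating projections iteration of Corollary 5.2.4 is straightforward to sketch. The helper names below are mine, and whether the iterates actually reach B ∩ Aₛ from an arbitrary starting point depends on the restricted isometry assumption; the check at the end therefore only monitors the monotone decrease of the distance to B, which holds unconditionally once the iterates lie on Aₛ.

```python
import numpy as np

def P_sparse(x, s):
    """Projection onto A_s: keep (one choice of) the s largest-magnitude entries."""
    y = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    y[idx] = x[idx]
    return y

def P_affine(x, M, M_dagger, p):
    """Projection onto B = {x : Mx = p}."""
    return x - M_dagger @ (M @ x - p)

rng = np.random.default_rng(2)
m, n, s = 12, 20, 2
M = rng.standard_normal((m, n))
M_dagger = np.linalg.pinv(M)
x_true = np.zeros(n); x_true[[0, 5]] = [1.0, -2.0]   # an s-sparse point of B
p = M @ x_true

x = P_sparse(rng.standard_normal(n), s)              # start on A_s
dists = [np.linalg.norm(x - P_affine(x, M, M_dagger, p))]
for _ in range(100):
    x = P_sparse(P_affine(x, M, M_dagger, p), s)
    dists.append(np.linalg.norm(x - P_affine(x, M, M_dagger, p)))

# d_B(x_k) is nonincreasing along the alternating projections sequence
assert all(b <= a + 1e-10 for a, b in zip(dists, dists[1:]))
```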
Corollary 5.2.4 shows that the values of the distance function d_B converge to zero at a linear rate. Together with the results established in Theorem 5.1.1, this actually shows linear convergence of the iterates to the intersection.
Theorem 5.2.5 (global linear convergence of alternating projections). Under the assumptions of Corollary 5.2.4, for any initial point x₀ there exists a positive constant ρ < 1 such that the iterates xₖ converge to x̄ ∈ B ∩ Aₛ at a linear rate with constant ρ.
Proof. By Corollary 5.2.4, we know that the squared distance of the iterates to the affine subspace B decreases at a linear rate to zero. This means that, for all ε > 0 and for all x₀ ∈ Rⁿ, there exists k̃ ∈ N such that ‖xₖ − P_B xₖ‖ ≤ ε for all k ≥ k̃. If we write Aₛ as a finite union of s-dimensional subspaces (see Equation (3.12)), i.e.,

Aₛ = ⋃_{J∈𝒥} A_J,

we can define the set

D ≔ {d(B, A_J)}_{J∈𝒥},
where d(B, A_J) is the distance between B and A_J (Definition 2.3.5). By Corollary 5.2.4, the intersection Aₛ ∩ B is equal to {x̄}. This means that, for some J ∈ 𝒥, the distance d(B, A_J) is zero. For an arbitrary given initial point x₀, choose k̃ ∈ N such that the iterate x_k̃ satisfies ‖x_k̃ − P_B x_k̃‖ < min{d ∈ D | d > 0}. Since the distances d_B(xₖ) are nonincreasing, we conclude that all iterates xₖ with k ≥ k̃ are elements of linear subspaces A_J with x̄ ∈ A_J. In other words, at this instance, we have an alternating projections sequence between affine subspaces with the unique intersection x̄. The linear convergence of the sequence {xₖ}_{k≥k̃} to x̄ follows from (Bauschke and Borwein, 1993). □
After giving restrictions on the matrix M†M, we give a result using restricted isometry of the matrix M itself with respect to lower level sets. Following (Beck and Teboulle, 2011), where the authors consider the problem

minimize ½‖Mx − p‖₂² subject to x ∈ lev≤s ϕ, (5.15)
we present a sufficient condition for global linear convergence of the alternating projections algorithm for affine sparse feasibility. Though our presentation is modeled after (Beck and Teboulle, 2011), this work is predated by the nearly identical approach developed in (Blumensath and Davies, 2009) and (Blumensath and Davies, 2010) for ϕ = ℓ0. We also note that, in light of Theorem 5.2.3, as well as in (Beck and Teboulle, 2011), the arguments presented do not use any structure that is particular to Rⁿ. Hence, the results can be extended to the problem of finding the intersection of the set of matrices with rank at most s and an affine subspace in the Euclidean space of matrices.
Key to the analysis of (Blumensath and Davies, 2009), (Blumensath and Davies, 2010), and (Beck and Teboulle, 2011) are the following well-known restrictions on the matrix M.
Definition 5.2.6. The mapping M : Rⁿ → Rᵐ satisfies the restricted isometry property of order s if there exists 0 ≤ δ ≤ 1 such that

(1 − δ)‖x‖₂² ≤ ‖Mx‖₂² ≤ (1 + δ)‖x‖₂² ∀x ∈ lev≤s ϕ. (5.16)

The infimum δₛ of all such δ is the restricted isometry constant.
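For ϕ = ℓ0, the constant in (5.16) can be computed exactly for small problems: Aₛ is a finite union of coordinate subspaces, and on each support J the ratio ‖Mx‖₂²/‖x‖₂² ranges over the squared extreme singular values of the column submatrix M[:, J]. The brute-force helper below is my own illustration (exponential in n, so for small instances only).

```python
import numpy as np
from itertools import combinations

def rip_constant(M, s):
    """Smallest δ with (1−δ)‖x‖² ≤ ‖Mx‖² ≤ (1+δ)‖x‖² for all s-sparse x."""
    delta = 0.0
    for J in combinations(range(M.shape[1]), s):
        # Extreme values of ‖Mx‖²/‖x‖² over support J are σ_max², σ_min² of M[:, J]
        sv = np.linalg.svd(M[:, J], compute_uv=False)
        delta = max(delta, abs(sv[0]**2 - 1.0), abs(sv[-1]**2 - 1.0))
    return delta

# A matrix with orthonormal columns is an exact isometry on every support
assert rip_constant(np.eye(4), 2) < 1e-12
```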
The mapping M : Rⁿ → Rᵐ satisfies the scaled/asymmetric restricted isometry property (SRIP) of order (s, α) for α > 1 if there exist νₛ, µₛ > 0 with 1 ≤ µₛ/νₛ < α such that

νₛ‖x‖₂² ≤ ‖Mx‖₂² ≤ µₛ‖x‖₂² ∀x ∈ lev≤s ϕ. (5.17)

The restricted isometry property (5.16) was introduced in (Candès and Tao, 2005) for the function ϕ = ℓ0, while the asymmetric version (5.17) first appeared in (Blumensath and Davies, 2009, Theorem 4). Clearly, (5.16) implies (5.17): if a matrix M satisfies (5.16) of order s with restricted isometry constant δₛ, then it also satisfies (5.17) of order (s, β) for β > (1 + δₛ)/(1 − δₛ).
To motivate the projected gradient algorithm given below, note that any solution to (3.8) is also a solution to

find x̄ ∈ S ≔ argmin_{x ∈ lev≤s ϕ} ½‖Mx − p‖₂². (5.18)

Conversely, if lev≤s ϕ ∩ B ≠ ∅ and x̄ ∈ S, then x̄ solves (3.8).
The condition (5.13) can be reformulated in terms of the scaled/asymmetric restricted isometry property (5.17), strong regularity of the range of M⊤, and the complement of each of the subspaces comprising A₂ₛ. We remind the reader that

A_J ≔ span{eᵢ | i ∈ J} for J ∈ 𝒥₂ₛ ≔ {J ∈ 2^{1,2,...,n} | J has 2s elements}.

Proposition 5.2.7 (SRIP and strong regularity (Hesse et al., 2014, Proposition III.14)).
Let M ∈ Rᵐˣⁿ with m ≤ n be of full rank. Then M satisfies (5.13) with δ₂ₛ ∈ [0, (α−1)/α) for some fixed s > 0 and fixed α > 1 if and only if M†M satisfies the scaled/asymmetric restricted isometry property (5.17) of order (2s, α) with µ₂ₛ = 1 and ν₂ₛ = (1 − δ₂ₛ). Moreover, for
M satisfying (5.13) with δ₂ₛ ∈ [0, (α−1)/α) for some fixed s > 0 and fixed α > 1, the collection (A_J^⊥, range(M⊤)) is strongly regular (Definition 2.4.7) for all J ∈ 𝒥₂ₛ, that is,

(∀J ∈ 𝒥₂ₛ) A_J ∩ ker(M) = {0}. (5.19)

Proof. The first statement follows directly from the definition of the scaled/asymmetric restricted isometry property.
For the second statement, note that if M satisfies the inequality (5.13) with δ₂ₛ ∈ [0, (α−1)/α) for some fixed s > 0 and fixed α > 1, then the only element in A₂ₛ satisfying M†Mx = 0 is x = 0. Recall that M†M is the projector onto the space orthogonal to the nullspace of M, that is, the projector onto the range of M⊤. Thus,

A₂ₛ ∩ [range(M⊤)]⊥ = {0}. (5.20)

Here, we have used the fact that the projection of a point x onto a subspace Ω is zero if and only if x ∈ Ω⊥. Now, using the representation for A₂ₛ given by (3.12), and since [range(M⊤)]⊥ = ker(M), we have that (5.20) is equivalent to

A_J ∩ ker(M) = {0} for all J ∈ 𝒥₂ₛ. (5.21)

But, by (2.34), this is equivalent to the strong regularity of (A_J^⊥, range(M⊤)) for all J ∈ 𝒥₂ₛ. □
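Condition (5.19) is also easy to test exhaustively for small instances: A_J ∩ ker(M) = {0} holds exactly when the columns of M indexed by J are linearly independent. The helper below is an illustration of mine, not from the text.

```python
import numpy as np
from itertools import combinations

def condition_519(M, s2):
    """Check A_J ∩ ker(M) = {0} for every support J with |J| = 2s (here s2).

    A_J ∩ ker(M) = {0} holds iff the submatrix M[:, J] has full column rank.
    """
    n = M.shape[1]
    return all(np.linalg.matrix_rank(M[:, J]) == s2
               for J in combinations(range(n), s2))

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 6))   # generic: every 4 columns are independent
assert condition_519(M, 4)

M_bad = M.copy()
M_bad[:, 1] = M_bad[:, 0]         # a duplicated column creates a sparse null vector
assert not condition_519(M_bad, 2)
```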
Next, the projected gradient algorithm is defined. The goal is to reformulate the method of alternating projections in terms of the projected gradient to apply known results on the projected gradient to AP.
Definition 5.2.8 (projected gradients). Given a closed set A ⊂ Rⁿ, a continuously differentiable function f : Rⁿ → R, and a positive real number τ, the mapping

T_PG(x; τ) = P_A(x − (1/τ)∇f(x)) (5.22)

is called the projected gradient operator. The projected gradients algorithm is the fixed point iteration

xₖ₊₁ ∈ T_PG(xₖ; τₖ) = P_A(xₖ − (1/τₖ)∇f(xₖ)), k ∈ N,

for x₀ given arbitrarily and a sequence of positive real numbers (τₖ)_{k∈N}.
In the context of linear least squares with a sparsity constraint, the projected gradient algorithm is equivalent to what is also known as the iterative hard thresholding algorithm (see, for instance, (Blumensath and Davies, 2009), (Blumensath and Davies, 2010), and (Kyrillidis and Cevher, 2014)), where the constraint A = Aₛ and the projector given by (3.16) amount to a thresholding operation on the largest elements of the iterate.
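A minimal sketch of this projected gradient / iterative hard thresholding iteration (my own illustrative code, with a constant step and one arbitrary tie-breaking rule in the thresholding step) looks as follows. With ‖M‖ ≤ 1 and τ = 1 the objective values are nonincreasing, which is what the final check verifies.

```python
import numpy as np

def iht(M, p, s, tau=1.0, iters=100):
    """Projected gradients for min ½‖Mx − p‖₂² over s-sparse x, step 1/τ."""
    x = np.zeros(M.shape[1])
    values = []
    for _ in range(iters):
        y = x - (M.T @ (M @ x - p)) / tau    # gradient step on f
        idx = np.argsort(np.abs(y))[-s:]     # support of the s largest entries
        x = np.zeros_like(y)
        x[idx] = y[idx]                      # hard thresholding = P_{A_s}(y)
        values.append(0.5 * np.linalg.norm(M @ x - p)**2)
    return x, values

# Example with orthonormal rows (MM⊤ = Id), the setting of Corollary 5.2.10
rng = np.random.default_rng(4)
n, m, s = 16, 12, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = Q[:m, :]
x_true = np.zeros(n); x_true[[3, 7]] = [2.0, -1.0]
x, values = iht(M, M @ x_true, s)

# With ‖M‖ = 1 and τ = 1, the objective values are nonincreasing
assert all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
```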
With these definitions, we cite a result on convergence of the projected gradient algorithm applied to (5.18).

Theorem 5.2.9 (global convergence of projected gradients/iterative hard thresholding (Blumensath and Davies, 2010, Theorem 4) and (Beck and Teboulle, 2011, Theorem 3 and Corollary 1)). Let ϕ : Rⁿ → R be a lower semicontinuous and subadditive function satisfying inf_{x∈Rⁿ} ϕ(x) ≠ −∞. Further, let ϕ(x) = ϕ(−x) for all x ∈ Rⁿ. Let M satisfy (5.17) of order (2s, 2). Further, for any given initial point x₀, let the sequence {xₖ}_{k∈N} be generated by the projected gradient algorithm with A = lev≤s ϕ, f(x) = ½‖Mx − p‖₂², and the constant step size τ ∈ [µ₂ₛ, 2ν₂ₛ). Then the iterates converge to the unique global solution of (5.18), and f(xₖ) → 0 linearly as k → ∞ with rate ρ = τ/ν₂ₛ − 1 < 1, that is,
f(xₖ₊₁) ≤ ρ f(xₖ) for all k ∈ N. (5.23)

Next, we specialize this theorem to alternating projections.
Corollary 5.2.10 (alternating projections in terms of projected gradients (Hesse et al., 2014, Corollary III.13)). Let ϕ : Rⁿ → R be a lower semicontinuous and subadditive function satisfying inf_{x∈Rⁿ} ϕ(x) ≠ −∞. Further, let ϕ(x) = ϕ(−x) for all x ∈ Rⁿ. Let the matrix M satisfy (5.17) of order (2s, 2) with µ₂ₛ = 1 and MM⊤ = Id. Then lev≤s ϕ ∩ B = {x̄}, i.e., the intersection is a singleton, and for any initial point x₀, the alternating projections sequence {xₖ}_{k∈N} generated by (4.19) applied to (3.8) converges to lev≤s ϕ ∩ B. The values of f(xₖ) = ½‖Mxₖ − p‖₂² converge to zero with linear rate ρ = 1/ν₂ₛ − 1 < 1.
Proof. For f(x) = ½‖Mx − p‖₂², we have ∇f(x) = M⊤(Mx − p). The projected gradients iteration with constant step length τ = 1 then takes the form

xₖ₊₁ ∈ P_{lev≤s ϕ}(xₖ − ∇f(xₖ)) = P_{lev≤s ϕ}(xₖ − M⊤(Mxₖ − p)). (5.24)

The projection onto the subspace B is given by (see (3.16))

P_B x = (Id − M⊤(MM⊤)⁻¹M)x + M⊤(MM⊤)⁻¹p. (5.25)

Since MM⊤ = Id, this simplifies to xₖ − M⊤(Mxₖ − p) = P_B xₖ. Hence,

xₖ₊₁ ∈ P_{lev≤s ϕ}(xₖ − ∇f(xₖ)) = P_{lev≤s ϕ} P_B xₖ. (5.26)

This shows that projected gradients (Definition 5.2.8) with unit step length applied to (5.18) with A = lev≤s ϕ and f(x) = ½‖Mx − p‖₂² is equivalent to the method of alternating projections (4.2.1) applied to (3.8).
To show convergence to a unique solution, we apply Theorem 5.2.9, for which we must show that the step length τ = 1 lies in the nonempty interval [µ₂ₛ, 2ν₂ₛ). By assumption, M satisfies (5.17) of order (2s, 2) with µ₂ₛ = 1. Hence, 1/2 < ν₂ₛ ≤ 1, and τ = 1 lies in the nonempty interval [1, 2ν₂ₛ). The assumptions of Theorem 5.2.9 are thus satisfied with τ = 1, whence global convergence to the unique solution of (5.18), and consequently (3.8), immediately follows. □
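The identity at the heart of the proof, xₖ − M⊤(Mxₖ − p) = P_B xₖ whenever MM⊤ = Id, can be confirmed numerically; the construction of an M with orthonormal rows below is my own.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 10, 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = Q[:m, :]                      # orthonormal rows: MM⊤ = Id
assert np.allclose(M @ M.T, np.eye(m))

p = rng.standard_normal(m)
x = rng.standard_normal(n)

grad_step = x - M.T @ (M @ x - p)                        # x − ∇f(x) with τ = 1
P_B_x = x - M.T @ np.linalg.solve(M @ M.T, M @ x - p)    # projection (5.25) onto B
assert np.allclose(grad_step, P_B_x)
```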
Remark 5.2.11. Similarly to Corollary 5.2.4, the result in Corollary 5.2.10 can be adapted to sparse affine feasibility, since the ℓ0-function is lower semicontinuous, subadditive, and satisfies ℓ0(x) = ℓ0(−x) for all x ∈ Rⁿ.
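The three properties of ℓ0 invoked here are easy to spot-check numerically (an illustration, not a proof): the support of x + y is contained in the union of the supports, which gives subadditivity, and negation does not change the support.

```python
import numpy as np

def l0(x):
    """The l0 function: number of nonzero entries of x."""
    return int(np.count_nonzero(x))

rng = np.random.default_rng(6)
for _ in range(200):
    x = rng.integers(-2, 3, size=8)
    y = rng.integers(-2, 3, size=8)
    assert l0(x + y) <= l0(x) + l0(y)   # subadditivity
    assert l0(-x) == l0(x)              # symmetry l0(x) = l0(−x)
```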