
5 Alternating Projections and Sparsity

5.2 Global Convergence to Lower Level Sets

We clarify some notation on linear mappings that will be needed throughout this thesis.

Definition 5.2.1. Let M ∈ ℝ^{m×n} be a linear mapping. From now on, assume that

1) m ≤ n and 2) M is of full rank. (5.1)

The nullspace of M is denoted by ker M, and M† indicates the Moore-Penrose inverse (Moore, 1920), (Penrose, 1955) of M, which, because of the full rank assumption, becomes

M† = Mᵀ(MMᵀ)⁻¹. (5.2)
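As a quick numerical sanity check of (5.2), the following NumPy sketch (not part of the formal development; the variable names are illustrative, and the random matrix stands in for any full-rank M with m ≤ n) compares the closed-form expression with the SVD-based pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
M = rng.standard_normal((m, n))  # full rank with probability 1, m <= n

# Moore-Penrose inverse via (5.2), valid because M has full row rank
M_pinv = M.T @ np.linalg.inv(M @ M.T)

# agrees with the SVD-based pseudoinverse computed by NumPy
assert np.allclose(M_pinv, np.linalg.pinv(M))
# and is a right inverse: M M^dagger = Id on R^m
assert np.allclose(M @ M_pinv, np.eye(m))
```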

The following definition is a generalization of the known concept of restricted isometry (Candès and Tao, 2005). Essentially, in the proofs involving restricted isometry, just the properties of lower semicontinuity, subadditivity, and the fact that ϕ(x) = ϕ(−x) for all x ∈ ℝⁿ are used.

Definition 5.2.2. Let ϕ : ℝⁿ → ℝ be a lower semicontinuous (Definition 2.1.7) and subadditive function (Definition 2.1.6) satisfying inf_{x∈ℝⁿ} ϕ(x) ≠ −∞. Further, let M : ℝⁿ → ℝᵐ be a linear mapping of full rank. The mapping M is said to satisfy the ϕ-RIP of order s if there exists 0 ≤ δ ≤ 1 such that

(1 − δ)‖x‖₂² ≤ ‖M†Mx‖₂² ≤ (1 + δ)‖x‖₂² ∀x ∈ lev_s ϕ. (5.3)

The infimum δ_s of all such δ is the isometry constant for ϕ.

An important property of M†M is that it is the orthogonal projection matrix onto the linear subspace ker(M)^⊥ = range(Mᵀ). This means that the operator norm of M†M is 1, and so we have, for all x ∈ ℝⁿ, that

‖M†Mx‖₂ ≤ ‖x‖₂. (5.4)

This gives us the upper bound in (5.3) for free.
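These projection properties can likewise be verified numerically; a small sketch (again with an arbitrary full-rank test matrix, outside the formal development):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
M = rng.standard_normal((m, n))
P = M.T @ np.linalg.inv(M @ M.T) @ M  # P = M^dagger M

# orthogonal projector: symmetric and idempotent
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
# P acts as the identity on range(M^T) = ker(M)^perp ...
y = rng.standard_normal(m)
assert np.allclose(P @ (M.T @ y), M.T @ y)
# ... and is nonexpansive, which is exactly the bound (5.4)
x = rng.standard_normal(n)
assert np.linalg.norm(P @ x) <= np.linalg.norm(x) + 1e-12
```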

By Definition 2.1.7, based on (Rockafellar and Wets, 1998, Theorem 1.6), the lower semicontinuity of a function ϕ is equivalent to the set lev_s ϕ being a closed subset of ℝⁿ for any s.

Now we define a minimization problem in a more general framework that can be specialized in two instances, namely, in Corollary 5.2.4 and Theorem 10.4.4:

minimize_{x ∈ lev_s ϕ} f(x) := ½ d_B(x)². (5.5)

We can now prove one of the main results in this thesis.

Theorem 5.2.3 (global convergence of alternating projections to lower level sets). Let ϕ : ℝⁿ → ℝ be a lower semicontinuous and subadditive function satisfying inf_{x∈ℝⁿ} ϕ(x) ≠ −∞. Further, let ϕ(x) = ϕ(−x) for all x ∈ ℝⁿ. For a fixed s > 0, let the matrix M†M satisfy (5.3) of order 2s with δ_{2s} ∈ [0, 1/2) for M in the definition of the affine set B given by (3.6).

Then B ∩ lev_s ϕ is a singleton. Further, for any initial value x₀ ∈ ℝⁿ, the sequence {x_k}_{k∈ℕ} generated by alternating projections (4.2.1) converges to B ∩ lev_s ϕ with d_B(x_k) → 0 as k → ∞ at a linear rate with constant bounded above by √(δ_{2s}/(1 − δ_{2s})).

Proof. To establish convergence, for the iterate x_k, define the mapping

q(x, x_k) := f(x_k) + ⟨x − x_k, M†(Mx_k − p)⟩ + ½‖x − x_k‖₂²,

where f is the objective function defined in (5.5). By definition of the projector, the iterate x_{k+1} is a solution to the problem min{q(x, x_k) | x ∈ lev_s ϕ}.

Now, by definition of the alternating projections sequence,

x_{k+1} ∈ P_{lev_s ϕ} P_B(x_k) = P_{lev_s ϕ}(x_k − M†(Mx_k − p)),

where the inequality in the middle of the resulting estimate follows from the fact that M†M is an orthogonal projection, together with δ_{2s} < 1. Inequalities (5.10)–(5.12) then imply that d_B(x_k) → 0 as k → ∞ at a linear rate for 0 ≤ δ_{2s} < 1/2, with constant bounded above by √(δ_{2s}/(1 − δ_{2s})). □

Next, we consider an application of Theorem 5.2.3. Namely, we apply it to Problem (3.8). Specifically, we assume that M†M satisfies

(1 − δ)‖x‖₂² ≤ ‖M†Mx‖₂² ≤ (1 + δ)‖x‖₂² ∀x ∈ A_{2s}. (5.13)

Corollary 5.2.4 (global convergence of alternating projections in sparse affine feasibility (Hesse et al., 2014, Theorem III.15)). For a fixed s > 0, let the matrix M†M satisfy (5.13) with δ_{2s} ∈ [0, 1/2) for M in the definition of the affine set B given by (3.6). Then B ∩ A_s is a singleton. Further, for any initial value x₀ ∈ ℝⁿ, the sequence {x_k}_{k∈ℕ} generated by alternating projections (Definition 4.2.1) converges to B ∩ A_s with d_B(x_k) → 0 as k → ∞ at a linear rate with constant bounded by √(δ_{2s}/(1 − δ_{2s})).

Proof. We note that the ℓ₀-function is, by Proposition 3.2.7, a lower semicontinuous and subadditive function. The lower level set of ℓ₀ is exactly equal to A_s. This means that property (5.13) is a specific instance of (5.3). Hence, the result follows from Theorem 5.2.3. □
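The alternating projections iteration of Corollary 5.2.4 is short to state in code. The sketch below is ours (helper names and problem sizes are illustrative assumptions, not from the text): the projector onto A_s keeps the s largest-magnitude entries, and the projector onto B uses the Moore-Penrose inverse (5.2). The asserted monotone decrease of d_B(x_k) holds for any closed constraint set containing the iterates; the linear rate of the corollary additionally requires (5.13).

```python
import numpy as np

def proj_sparse(x, s):
    """Projector onto A_s: keep the s entries of largest magnitude."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def proj_affine(x, M, M_pinv, p):
    """Projector onto B = {x : Mx = p}."""
    return x - M_pinv @ (M @ x - p)

rng = np.random.default_rng(2)
m, n, s = 24, 32, 2
M = rng.standard_normal((m, n)) / np.sqrt(m)
M_pinv = np.linalg.pinv(M)
p = M @ proj_sparse(rng.standard_normal(n), s)   # feasible data

d_B = lambda x: np.linalg.norm(M_pinv @ (M @ x - p))  # distance to B

x = proj_sparse(rng.standard_normal(n), s)       # start in A_s
dists = [d_B(x)]
for _ in range(100):
    x = proj_sparse(proj_affine(x, M, M_pinv, p), s)  # x_{k+1} in P_{A_s} P_B x_k
    dists.append(d_B(x))

# d_B(x_k) is nonincreasing along the alternating projections sequence
assert all(b <= a + 1e-12 for a, b in zip(dists, dists[1:]))
assert np.count_nonzero(x) <= s
```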


Corollary 5.2.4 shows that the values of the distance function g converge to zero at a linear rate. Together with the results established in Theorem 5.1.1, this actually shows linear convergence of the iterates to the intersection.

Theorem 5.2.5 (global linear convergence of alternating projections). Under the assumptions of Corollary 5.2.4, for any initial point x₀ there exists a positive constant ρ < 1 such that, for x̄ ∈ B ∩ A_s, the relation

Proof. By Corollary 5.2.4, we know that the squared distance of the iterates to the affine subspace B decreases at a linear rate to zero. This means that, for all ε > 0 and for all x₀ ∈ ℝⁿ, there exists k̃ ∈ ℕ such that ‖x_k − P_B x_k‖ ≤ ε for all k ≥ k̃. If we write A_s as a finite union of s-dimensional subspaces (see Equation (3.12)), i.e.,

A_s = ⋃_{J∈J} A_J,

we can define the set

D := {d(B, A_J)}_{J∈J},

where d(B, A_J) is the distance between B and A_J (Definition 2.3.5). By Corollary 5.2.4, the intersection A_s ∩ B is equal to {x̄}. This means that, for some J ∈ J, the distance d(B, A_J) vanishes.

For an arbitrary initial point x₀, choose k̃ ∈ ℕ such that the iterate x_k̃ satisfies the estimate above. From this, we conclude that all iterates are elements of linear subspaces A_J with x̄ ∈ A_J. In other words, from this index onward, we have an alternating projections sequence between affine subspaces with a unique intersection x̄. The linear convergence of the sequence {x_k}_{k≥k̃} to x̄ follows from (Bauschke and Borwein, 1993). □

After giving restrictions on the matrix M†M, we give a result using restricted isometry of the matrix M with respect to lower level sets itself. Following (Beck and Teboulle, 2011), where the authors consider the problem

minimize ½‖Mx − p‖₂² subject to x ∈ lev_s ϕ, (5.15)

we present a sufficient condition for global linear convergence of the alternating projections algorithm for sparse affine feasibility. Though our presentation is modeled after (Beck and Teboulle, 2011), this work is predated by the nearly identical approach developed in (Blumensath and Davies, 2009) and (Blumensath and Davies, 2010) for ϕ = ℓ₀. We also note that, in light of Theorem 5.2.3, as well as in (Beck and Teboulle, 2011), the arguments presented do not use any structure that is particular to ℝⁿ. Hence, the results can be extended to the problem of finding the intersection of the set of matrices with rank at most s and an affine subspace in the Euclidean space of matrices.

Key to the analysis of (Blumensath and Davies, 2009), (Blumensath and Davies, 2010), and (Beck and Teboulle, 2011) are the following well-known restrictions on the matrix M.

Definition 5.2.6. The mapping M : ℝⁿ → ℝᵐ satisfies the restricted isometry property of order s if there exists 0 ≤ δ ≤ 1 such that

(1 − δ)‖x‖₂² ≤ ‖Mx‖₂² ≤ (1 + δ)‖x‖₂² ∀x ∈ lev_s ϕ. (5.16)

The infimum δ_s of all such δ is the restricted isometry constant.

The mapping M : ℝⁿ → ℝᵐ satisfies the scaled/asymmetric restricted isometry property (SRIP) of order (s, α) for α > 1 if there exist ν_s, µ_s > 0 with 1 ≤ µ_s/ν_s < α such that

ν_s‖x‖₂² ≤ ‖Mx‖₂² ≤ µ_s‖x‖₂² ∀x ∈ lev_s ϕ. (5.17)

The restricted isometry property (5.16) was introduced in (Candès and Tao, 2005) for the function ϕ = ℓ₀, while the asymmetric version (5.17) first appeared in (Blumensath and Davies, 2009, Theorem 4). Clearly, (5.16) implies (5.17): if a matrix M satisfies (5.16) of order s with restricted isometry constant δ_s, then it also satisfies (5.17) of order (s, β) for β > (1 + δ_s)/(1 − δ_s).
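For ϕ = ℓ₀ and small dimensions, the restricted isometry constant in (5.16) can be computed exactly by enumerating all supports of size s; a sketch outside the formal development (the function name and problem sizes are ours):

```python
import numpy as np
from itertools import combinations

def rip_constant(M, s):
    """Restricted isometry constant of M for phi = l0, by support enumeration."""
    n = M.shape[1]
    delta = 0.0
    for J in combinations(range(n), s):
        # singular values of the submatrix with columns in J
        sv = np.linalg.svd(M[:, list(J)], compute_uv=False)
        delta = max(delta, abs(sv[0] ** 2 - 1), abs(1 - sv[-1] ** 2))
    return delta

rng = np.random.default_rng(3)
m, n, s = 20, 10, 2
M = rng.standard_normal((m, n)) / np.sqrt(m)
d = rip_constant(M, s)

# (5.16) then holds with this delta for every s-sparse x
x = np.zeros(n)
x[[1, 4]] = rng.standard_normal(2)
nx2, nMx2 = np.sum(x ** 2), np.sum((M @ x) ** 2)
assert (1 - d) * nx2 - 1e-9 <= nMx2 <= (1 + d) * nx2 + 1e-9
```

This brute-force enumeration is only feasible for tiny n; computing δ_s in general is combinatorially hard, which is why conditions like (5.16) are usually established probabilistically for random matrices.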

To motivate the projected gradient algorithm given below, note that any solution to (3.8) is also a solution to

find x̄ ∈ S := argmin_{x ∈ lev_s ϕ} ½‖Mx − p‖₂². (5.18)

Conversely, if lev_s ϕ ∩ B ≠ ∅ and x̄ ∈ S, then x̄ solves (3.8).

The condition (5.13) can be reformulated in terms of the scaled/asymmetric restricted isometry property (5.17), strong regularity of the range of Mᵀ, and the complement of each of the subspaces comprising A_{2s}. We remind the reader that

A_J := span{e_i | i ∈ J} for J ∈ J_{2s} := {J ∈ 2^{{1,2,...,n}} | J has 2s elements}.

Proposition 5.2.7 (SRIP and strong regularity (Hesse et al., 2014, Proposition III.14)).

Let M ∈ ℝ^{m×n} with m ≤ n be of full rank. Then M satisfies (5.13) with δ_{2s} ∈ [0, (α−1)/α) for some fixed s > 0 and fixed α > 1 if and only if M†M satisfies the scaled/asymmetric restricted isometry property (5.17) of order (2s, α) with µ_{2s} = 1 and ν_{2s} = 1 − δ_{2s}. Moreover, for M satisfying (5.13) with δ_{2s} ∈ [0, (α−1)/α) for some fixed s > 0 and fixed α > 1, the collection (A_J, range(Mᵀ)) is strongly regular (Definition 2.4.7) for all J ∈ J_{2s}, that is,

(∀J ∈ J_{2s}) A_J ∩ ker(M) = {0}. (5.19)

Proof. The first statement follows directly from the definition of the scaled/asymmetric restricted isometry property.

For the second statement, note that if M satisfies the inequality (5.13) with δ_{2s} ∈ [0, (α−1)/α) for some fixed s > 0 and fixed α > 1, then the only element in A_{2s} satisfying M†Mx = 0 is x = 0. Recall that M†M is the projector onto the space orthogonal to the nullspace of M, that is, the projector onto the range of Mᵀ. Thus,

A_{2s} ∩ [range(Mᵀ)]^⊥ = {0}. (5.20)

Here, we have used the fact that the projection of a point x onto a subspace Ω is zero if and only if x ∈ Ω^⊥. Now, using the representation for A_{2s} given by (3.12), we have that (5.20) is equivalent to

A_J ∩ ker(M) = {0} for all J ∈ J_{2s}. (5.21)

But, by (2.34), this is equivalent to the strong regularity of (A_J, range(Mᵀ)) for all J ∈ J_{2s}. □
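Condition (5.19) is also easy to test numerically for a given matrix: A_J ∩ ker(M) = {0} holds precisely when a basis of A_J stacked next to a basis of ker(M) has full column rank. A sketch under illustrative sizes (ours, not from the text; for a Gaussian matrix the condition holds with probability 1):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
m, n, s = 6, 8, 2            # check (5.19) at order 2s = 4
M = rng.standard_normal((m, n))

# columns of K form a basis of ker(M), which has dimension n - m
_, _, Vt = np.linalg.svd(M)
K = Vt[m:].T

# A_J cap ker(M) = {0}  iff  [E | K] has full column rank, where the
# columns e_i, i in J, of E span the coordinate subspace A_J
ranks = [
    np.linalg.matrix_rank(np.hstack([np.eye(n)[:, list(J)], K]))
    for J in combinations(range(n), 2 * s)
]
assert all(r == 2 * s + (n - m) for r in ranks)
```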

Next, the projected gradient algorithm is defined. The goal is to reformulate the method of alternating projections in terms of the projected gradient in order to apply known results on the projected gradient to alternating projections.

Definition 5.2.8 (projected gradients). Given a closed set A ⊂ ℝⁿ, a continuously differentiable function f : ℝⁿ → ℝ, and a positive real number τ, the mapping

T_PG(x; τ) = P_A(x − (1/τ)∇f(x)) (5.22)

is called the projected gradient operator. The projected gradients algorithm is the fixed point iteration

x_{k+1} ∈ T_PG(x_k; τ_k) = P_A(x_k − (1/τ_k)∇f(x_k)), k ∈ ℕ,

for x₀ given arbitrarily and a sequence of positive real numbers (τ_k)_{k∈ℕ}.

In the context of linear least squares with a sparsity constraint, the projected gradient algorithm is equivalent to what is also known as the iterative hard thresholding algorithm (see, for instance, (Blumensath and Davies, 2009), (Blumensath and Davies, 2010), and (Kyrillidis and Cevher, 2014)), where the constraint A = A_s and the projector given by (3.16) amount to a thresholding operation keeping the largest elements of the iterate.
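A compact sketch of the iteration from Definition 5.2.8 in exactly this setting, A = A_s and f(x) = ½‖Mx − p‖₂² (helper names and sizes are ours): the conservative step size τ = ‖M‖₂² guarantees monotone decrease of f by the standard majorization argument, independently of the sharper SRIP-based step-size range in the result cited next.

```python
import numpy as np

def proj_sparse(x, s):
    """P_{A_s}: hard thresholding, keep the s largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def projected_gradient(x0, grad_f, proj_A, tau, iters):
    """Iteration x_{k+1} in P_A(x_k - (1/tau) grad f(x_k)), cf. (5.22)."""
    xs = [x0]
    for _ in range(iters):
        xs.append(proj_A(xs[-1] - grad_f(xs[-1]) / tau))
    return xs

rng = np.random.default_rng(5)
m, n, s = 12, 32, 3
M = rng.standard_normal((m, n)) / np.sqrt(m)
p = M @ proj_sparse(rng.standard_normal(n), s)

f = lambda x: 0.5 * np.sum((M @ x - p) ** 2)
grad_f = lambda x: M.T @ (M @ x - p)
tau = np.linalg.norm(M, 2) ** 2        # >= Lipschitz constant of grad f

xs = projected_gradient(np.zeros(n), grad_f,
                        lambda z: proj_sparse(z, s), tau, 100)
vals = [f(x) for x in xs]
# with tau >= ||M||^2, f decreases monotonically along the iterates
assert all(b <= a + 1e-9 for a, b in zip(vals, vals[1:]))
```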

With these definitions, we cite a result on convergence of the projected gradient algorithm applied to (5.18).


Theorem 5.2.9 (global convergence of projected gradients/iterative hard thresholding (Blumensath and Davies, 2010, Theorem 4) and (Beck and Teboulle, 2011, Theorem 3 and Corollary 1)). Let ϕ : ℝⁿ → ℝ be a lower semicontinuous and subadditive function satisfying inf_{x∈ℝⁿ} ϕ(x) ≠ −∞. Further, let ϕ(x) = ϕ(−x) for all x ∈ ℝⁿ. Let M satisfy (5.17) of order (2s, 2). Further, for any given initial point x₀, let the sequence {x_k}_{k∈ℕ} be generated by the projected gradient algorithm with A = lev_s ϕ, f(x) = ½‖Mx − p‖₂², and the constant step size τ ∈ [µ_{2s}, 2ν_{2s}). Then the iterates converge to the unique global solution of (5.18), and f(x_k) → 0 linearly as k → ∞ with rate ρ = τ/ν_{2s} − 1 < 1, that is,

f(x_{k+1}) ≤ ρ f(x_k) for all k ∈ ℕ. (5.23)

Next, we specialize this theorem to alternating projections.

Corollary 5.2.10 (alternating projections in terms of projected gradients (Hesse et al., 2014, Corollary III.13)). Let ϕ : ℝⁿ → ℝ be a lower semicontinuous and subadditive function satisfying inf_{x∈ℝⁿ} ϕ(x) ≠ −∞. Further, let ϕ(x) = ϕ(−x) for all x ∈ ℝⁿ. Let the matrix M satisfy (5.17) of order (2s, 2) with µ_{2s} = 1 and MMᵀ = Id. Then lev_s ϕ ∩ B = {x̄}, i.e., the intersection is a singleton, and for any initial point x₀, the alternating projections sequence {x_k}_{k∈ℕ} generated by (4.19) applied to (3.8) converges to lev_s ϕ ∩ B. The values of f(x_k) = ½‖Mx_k − p‖₂² converge to zero with linear rate ρ = 1/ν_{2s} − 1 < 1.

Proof. For f(x) = ½‖Mx − p‖₂², we have ∇f(x) = Mᵀ(Mx − p). The projected gradients iteration with constant step length τ = 1 then takes the form

x_{k+1} ∈ P_{lev_s ϕ}(x_k − ∇f(x_k)) = P_{lev_s ϕ}(x_k − Mᵀ(Mx_k − p)). (5.24)

The projection onto the subspace B is given by (see (3.16))

P_B x = (Id − Mᵀ(MMᵀ)⁻¹M)x + Mᵀ(MMᵀ)⁻¹p. (5.25)

Since MMᵀ = Id, this simplifies to x_k − Mᵀ(Mx_k − p) = P_B x_k. Hence,

x_{k+1} ∈ P_{lev_s ϕ}(x_k − ∇f(x_k)) = P_{lev_s ϕ} P_B x_k. (5.26)

This shows that projected gradients (Definition 5.2.8) with unit step length applied to (5.18) with A = lev_s ϕ and f(x) = ½‖Mx − p‖₂² is equivalent to the method of alternating projections (Definition 4.2.1) applied to (3.8).

To show convergence to a unique solution, we apply Theorem 5.2.9, for which we must show that the step length τ = 1 lies in the nonempty interval [µ_{2s}, 2ν_{2s}). By assumption, M satisfies (5.17) of order (2s, 2) with µ_{2s} = 1. Hence, 1/2 < ν_{2s} ≤ 1, and τ = 1 lies in the nonempty interval [1, 2ν_{2s}). The assumptions of Theorem 5.2.9 are thus satisfied with τ = 1, whence global convergence to the unique solution of (5.18), and consequently (3.8), immediately follows. □
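The identity P_B x = x − Mᵀ(Mx − p) under MMᵀ = Id, and with it the equivalence (5.26), can be checked numerically; a sketch in which M is given orthonormal rows via a QR factorization (an illustrative construction and helper names of ours, not from the text):

```python
import numpy as np

def proj_sparse(x, s):
    """P_{A_s}: keep the s largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(6)
m, n, s = 5, 12, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
M = Q.T                                   # orthonormal rows: M M^T = Id
p = rng.standard_normal(m)
x = rng.standard_normal(n)

# P_B by the general formula (5.25) ...
W = M.T @ np.linalg.inv(M @ M.T)
PBx = x - W @ (M @ x - p)
# ... and the gradient step with tau = 1, cf. (5.24)
grad_step = x - M.T @ (M @ x - p)

assert np.allclose(M @ M.T, np.eye(m))
assert np.allclose(PBx, grad_step)        # P_B x = x - M^T(Mx - p)
assert np.allclose(proj_sparse(PBx, s), proj_sparse(grad_step, s))  # (5.26)
```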

Remark 5.2.11. Similarly to Corollary 5.2.4, the result in Corollary 5.2.10 can be adapted to sparse affine feasibility, since the ℓ₀-function is lower semicontinuous, subadditive, and satisfies ℓ₀(x) = ℓ₀(−x) for all x ∈ ℝⁿ.