

6.2. Alternating minimization framework

As discussed in Chapter 4, the factorization approach for Low-Rank Approximation restricts the search space of possible approximations to those solutions that have an inherent upper bound on the rank. Ishteva et al. [48] investigate this approach in the context of structured low-rank approximation and propose the Structured Low-Rank Approximation by Factorization method (abbreviated in the following as SLRAbyF), which searches for the closest structured low-rank approximation in the $\ell_2$ sense. As observed among others by Chu et al. [26] and Markovsky [61], there exists no general description of the topology of structured low-rank matrices and thus no viable approach that optimizes directly on the intersection of the two spaces. Using the concepts discussed in the previous section, however, structural constraints can be enforced on any low-rank approximation $L = UY$ by introducing the structural penalty term

\[
\frac{1}{mn}\,\left\| UY - \Pi_{\mathcal{S}}(UY) \right\|_F^2, \qquad (6.9)
\]

which penalizes the residual error between a low-rank matrix and its projection onto the space of structured matrices. This residual, which is equivalent to the projection $\Pi_{\mathcal{S}^\perp}(UY)$ onto the orthogonal complement of $\mathcal{S}$, vanishes only if $UY$ fulfills the structural constraints of $\mathcal{S}$.
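As an illustration, consider the special case of a Hankel structure, for which $\Pi_{\mathcal{S}}$ amounts to replacing every antidiagonal of the matrix by its mean value. A minimal NumPy sketch of the penalty (6.9) under this assumption (all names are illustrative) could read:

\begin{verbatim}
import numpy as np

def project_hankel(M):
    """Orthogonal projection of M onto the set of Hankel matrices:
    every antidiagonal is replaced by its mean value."""
    m, n = M.shape
    P = np.empty_like(M, dtype=float)
    for d in range(m + n - 1):                     # antidiagonal index i + j = d
        rows = np.arange(max(0, d - n + 1), min(m, d + 1))
        cols = d - rows
        P[rows, cols] = M[rows, cols].mean()
    return P

def structural_penalty(U, Y):
    """Penalty (6.9): (1 / (m n)) * ||UY - Pi_S(UY)||_F^2 for S = Hankel matrices."""
    L = U @ Y
    m, n = L.shape
    R = L - project_hankel(L)
    return np.sum(R ** 2) / (m * n)
\end{verbatim}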

While the structural constraint guarantees that a structured low-rank matrix is found, a data-fitting term is still required to ensure that the found approximation is close to the original data (cf. the discussion of Cadzow's method [16] in [26]). In principle, a separable loss function that sums the entry-wise residual error between the input $X$ and the structured low-rank approximation $L$ can be employed for this purpose. However, this does not take into account the number of positions at which the entries of the data vector appear in the full structured matrix. Thus, whenever some entries of the data vector appear more often in the structured matrix than others, the data fit is biased towards these entries unless additional weights are introduced. Another point is that whenever the input $X$ is already structured (i.e. $X \in \mathcal{S}$), fitting $L$ to $X$ over the whole coordinate set is unnecessarily more expensive than minimizing


the residual error based on the difference $x - l$ of the underlying data vectors. Therefore, the robust loss function (4.8) from Chapter 4 is replaced by

\[
h_\mu\!\left( \mathcal{P}_\Omega\!\left( x - S^\dagger \operatorname{vec}(UY) \right) \right), \qquad (6.10)
\]
which measures the discrepancy between the input data vector $x$ and the least-squares fit to the entries of $UY$, which for $UY \in \mathcal{S}$ is the underlying data vector of the structured low-rank approximation. The residual is evaluated only on the index set $\Omega$ with $|\Omega| \leq N$, where $N$ is the length of the data vector $x$. Ishteva et al. [48] propose to join the two constraints with an Augmented Lagrangian Multiplier (ALM) method [6].
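For the Hankel case, the data-fitting term (6.10) can be sketched by forming the structure matrix $S \in \mathbb{R}^{mn \times N}$ explicitly (purely for illustration; an efficient implementation would exploit the structure instead) and computing the least-squares fit $S^\dagger \operatorname{vec}(UY)$. In the following sketch, h_mu stands in for the smoothed loss of Chapter 4 and omega for the observation set; both names are assumptions for illustration.

\begin{verbatim}
import numpy as np

def hankel_structure_matrix(m, n):
    """Structure matrix S in R^{mn x N}, N = m + n - 1, such that
    vec(S(x)) = S @ x builds the Hankel matrix from the data vector x."""
    N = m + n - 1
    S = np.zeros((m * n, N))
    for j in range(n):
        for i in range(m):
            S[j * m + i, i + j] = 1.0   # column-major vec: entry (i, j) <- x[i + j]
    return S

def lsq_data_vector(S, L):
    """Least-squares fit of a data vector to L, i.e. S^+ vec(L)."""
    return np.linalg.lstsq(S, L.flatten(order="F"), rcond=None)[0]

def data_fit(x, U, Y, S, omega, h_mu):
    """Smoothed loss (6.10): h_mu( P_Omega( x - S^+ vec(UY) ) )."""
    r = x - lsq_data_vector(S, U @ Y)
    return h_mu(r[omega])
\end{verbatim}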

The augmented Lagrangian function of a Robust Structured Low-Rank Approximation problem with the proposed smoothed $\ell_p$-norm loss function reads as

\[
\mathcal{L}(U, Y, \Lambda) = h_\mu\!\left( \mathcal{P}_\Omega\!\left( x - S^\dagger \operatorname{vec}(UY) \right) \right) + \left\langle \Lambda, \tfrac{1}{mn}\left( UY - \Pi_{\mathcal{S}}(UY) \right) \right\rangle + \tfrac{\rho}{2mn}\left\| UY - \Pi_{\mathcal{S}}(UY) \right\|_F^2. \qquad (6.11)
\]
The general idea of the ALM scheme is to start with a small value for the parameter $\rho$ and to alternate between the optimization problems

\[
\min_{[U] \in \operatorname{Gr}_{k,m}} f_U(U), \qquad \min_{Y \in \mathbb{R}^{k \times n}} f_Y(Y) \qquad \text{and} \qquad \min_{\Lambda \in \mathbb{R}^{m \times n}} f_\Lambda(\Lambda) \qquad (6.12)
\]
with the separate cost functions defined as
\[
f_U: \operatorname{Gr}_{k,m} \to \mathbb{R}, \quad U \mapsto \mathcal{L}(U, Y_0, \Lambda_0), \qquad (6.13)
\]
\[
f_Y: \mathbb{R}^{k \times n} \to \mathbb{R}, \quad Y \mapsto \mathcal{L}(U_0, Y, \Lambda_0) \quad \text{and} \qquad (6.14)
\]
\[
f_\Lambda: \mathbb{R}^{m \times n} \to \mathbb{R}, \quad \Lambda \mapsto \mathcal{L}(U_0, Y_0, \Lambda), \qquad (6.15)
\]
respectively, where $U_0$, $Y_0$ and $\Lambda_0$ denote intermediate estimates for $U$, $Y$ and $\Lambda$ that are held constant during the optimization of the other variables. After each iteration the parameter $\rho$ is increased until the side condition holds up to a certain accuracy. While the simpler penalty method ensures the side condition only for $\rho \to \infty$, the augmented Lagrangian multiplier allows the algorithm to be terminated much sooner in practice [6].
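Combining both terms, the augmented Lagrangian (6.11) and the partial cost functions (6.13)-(6.15) could then be evaluated as follows, reusing the hypothetical Hankel helpers from the sketches above:

\begin{verbatim}
import numpy as np

def augmented_lagrangian(U, Y, Lam, rho, x, S, omega, h_mu, project_S):
    """Augmented Lagrangian (6.11), built from the helpers sketched above."""
    L = U @ Y
    m, n = L.shape
    G = L - project_S(L)                              # constraint residual UY - Pi_S(UY)
    fit = h_mu((x - lsq_data_vector(S, L))[omega])    # data-fitting term (6.10)
    return fit + np.sum(Lam * G) / (m * n) + rho * np.sum(G ** 2) / (2 * m * n)

def partial_costs(U0, Y0, Lam0, rho, x, S, omega, h_mu, project_S):
    """Cost functions (6.13)-(6.15): each freezes the other two variables."""
    f_U = lambda U: augmented_lagrangian(U, Y0, Lam0, rho, x, S, omega, h_mu, project_S)
    f_Y = lambda Y: augmented_lagrangian(U0, Y, Lam0, rho, x, S, omega, h_mu, project_S)
    f_Lam = lambda Lam: augmented_lagrangian(U0, Y0, Lam, rho, x, S, omega, h_mu, project_S)
    return f_U, f_Y, f_Lam
\end{verbatim}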

The optimization of $f_U$ and $f_Y$ is performed in the same way as for the unstructured low-rank approximation problem. Assuming that the input is fully observed, the respective gradients can be derived as

\[
\nabla f_U(U) = \left[ -\operatorname{vec}^{-1}\!\left( (S^\dagger)^\top \nabla h_\mu\!\left( x - S^\dagger \operatorname{vec}(UY_0) \right) \right) + \tfrac{1}{mn}\left( \left( \Lambda_0 - \Pi_{\mathcal{S}}(\Lambda_0) \right) + \rho\left( UY_0 - \Pi_{\mathcal{S}}(UY_0) \right) \right) \right] Y_0^\top \qquad (6.16)
\]
and
\[
\nabla f_Y(Y) = U_0^\top \left[ -\operatorname{vec}^{-1}\!\left( (S^\dagger)^\top \nabla h_\mu\!\left( x - S^\dagger \operatorname{vec}(U_0 Y) \right) \right) + \tfrac{1}{mn}\left( \left( \Lambda_0 - \Pi_{\mathcal{S}}(\Lambda_0) \right) + \rho\left( U_0 Y - \Pi_{\mathcal{S}}(U_0 Y) \right) \right) \right] \qquad (6.17)
\]
with the full derivation given in Appendix A.2. As in the unstructured case, missing observations appear as zero entries in the gradient of the loss function.
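As an illustration of how these gradients translate into computations, the following sketch evaluates (6.17) for fully observed data; grad_h_mu is a hypothetical callable for the gradient of the smoothed loss, and the remaining helpers are those sketched above. The gradient (6.16) follows analogously by right-multiplying the bracketed term with $Y_0^\top$ instead of left-multiplying with $U_0^\top$.

\begin{verbatim}
import numpy as np

def grad_f_Y(U0, Y, Lam0, rho, x, S, grad_h_mu, project_S):
    """Gradient (6.17) for fully observed data; grad_h_mu is a stand-in
    for the gradient of the smoothed loss at a given residual."""
    L = U0 @ Y
    m, n = L.shape
    S_pinv = np.linalg.pinv(S)                        # pseudoinverse of the structure matrix
    r = x - S_pinv @ L.flatten(order="F")             # residual of the least-squares data fit
    # -vec^{-1}( (S^+)^T grad h_mu(r) ), reshaped column-major to match vec
    term1 = -np.reshape(S_pinv.T @ grad_h_mu(r), (m, n), order="F")
    # (1 / mn) * ( (Lam0 - Pi_S(Lam0)) + rho * (U0 Y - Pi_S(U0 Y)) )
    term2 = ((Lam0 - project_S(Lam0)) + rho * (L - project_S(L))) / (m * n)
    return U0.T @ (term1 + term2)
\end{verbatim}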

Algorithm 6.1 outlines the proposed Grassmannian Robust SLRA approach, abbreviated as GRSLRA. The algorithm considers a partial observation $\mathcal{P}_\Omega(x)$ of the data vector, with $\Omega$ denoting the observation set. Besides the data vector, the algorithm requires a description of the structure $\mathcal{S}$. $U$ is initialized randomly, and $Y$ is initialized with all zeros. The weighting factor $\rho$ is initialized sufficiently small (e.g. $\rho = 1$), so that the data-fitting term dominates at the beginning of the optimization and the approximation stays close to the input data with respect to the used distance measure. As proposed by Ishteva et al. [48], the optimization consists of an inner loop and an outer loop. In the inner loop, a low-rank approximation $L = UY$ is found by alternately optimizing over $U$ and $Y$ until the process converges to an intermediate solution. Subsequently, the Lagrangian multiplier is updated with a single update step, $\rho$ is increased, and the process is repeated until $\rho$ is large enough to guarantee that the structural side condition holds up to a certain accuracy.

The data vector is then obtained via the projection onto the structure S.

Apart from the added structural constraint, the inner low-rank approximation problem differs from the algorithm for the unstructured case in three main aspects. Firstly, the cost function is always evaluated over $\Omega$, as subsampling in the line search is neither applicable nor required due to the different nature of the residual error. Secondly, empirical results show that the parameter $\mu$ can be held constant during the approximation, as $\rho$ is altered whenever the inner loop converges to an intermediate solution.


Algorithm 6.1 Alternating minimization scheme for Grassmannian Robust SLRA
Input: $\mathcal{P}_\Omega(x)$, structural constraints of $\mathcal{S}$
  Choose $c_\rho > 1$
  Initialize $U_0$, $Y_0$, $\rho = \rho_{\mathrm{start}}$
  while $\rho \leq \rho_{\mathrm{end}}$ do
    while $\delta > \delta_{\min}$ do
      $U \leftarrow \arg\min_{[U] \in \operatorname{Gr}_{k,m}} f_U(U)$   (6.13)
      $Y \leftarrow \arg\min_{Y \in \mathbb{R}^{k \times n}} f_Y(Y)$   (6.14)
    end while
    $\Lambda \leftarrow \Lambda + \frac{\rho}{mn}\,\left( UY - \Pi_{\mathcal{S}}(UY) \right)$
    $\rho \leftarrow c_\rho\, \rho$
  end while
Output: $\hat{l} = S^\dagger \operatorname{vec}(UY)$
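A compact Python sketch of the overall scheme is given below. It is not the original implementation: it reuses the hypothetical helpers from the sketches above, relies on a subspace_angle helper as sketched after (6.18) below, and replaces the Grassmannian optimization of $f_U$ by a generic unconstrained solver followed by re-orthonormalization, which only approximates the manifold update.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def grslra(x, S, omega, m, n, k, h_mu, project_S,
           rho_start=1.0, rho_end=1e6, c_rho=2.0, delta_min=1e-6):
    """Sketch of Algorithm 6.1 with off-the-shelf optimizers."""
    rng = np.random.default_rng(0)
    U = np.linalg.qr(rng.standard_normal((m, k)))[0]      # random orthonormal basis U
    Y = np.zeros((k, n))
    Lam, rho = np.zeros((m, n)), rho_start
    while rho <= rho_end:
        delta = np.inf
        while delta > delta_min:                          # inner low-rank approximation loop
            U_old = U
            f_U = lambda u: augmented_lagrangian(u.reshape(m, k), Y, Lam, rho,
                                                 x, S, omega, h_mu, project_S)
            U = minimize(f_U, U.ravel(), method="L-BFGS-B").x.reshape(m, k)
            U = np.linalg.qr(U)[0]                        # restore orthonormal columns
            f_Y = lambda y: augmented_lagrangian(U, y.reshape(k, n), Lam, rho,
                                                 x, S, omega, h_mu, project_S)
            Y = minimize(f_Y, Y.ravel(), method="L-BFGS-B").x.reshape(k, n)
            delta = subspace_angle(U_old, U)              # convergence measure (6.18)
        L = U @ Y
        Lam = Lam + rho / (m * n) * (L - project_S(L))    # single multiplier update step
        rho = c_rho * rho
    L = U @ Y
    return np.linalg.lstsq(S, L.flatten(order="F"), rcond=None)[0]   # l_hat = S^+ vec(UY)
\end{verbatim}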

Thirdly, the criterion for convergence of the alternating minimization needs to be modified, as the second term in the Lagrangian function (6.11) may lead to a non-monotone decrease of the cost function.

Ishteva et al. [48] resolve this issue by observing the progress in the column space of the first factor instead, which is somewhat questionable as the SLRAbyF method does not consider orthogonal columns. For the GRSLRA method, on the other hand, the subspace angle

\[
\delta^{(i+1)} := e_{\mathrm{sub}}\!\left( U^{(i)}, U^{(i+1)} \right) \qquad (6.18)
\]

following the definition (4.18) is a meaningful measure due to the imposed orthogonality constraints. Following the recommendation of Ishteva et al. [48], $c_\rho$ is adaptively chosen from the interval $(1.5, 100)$ according to the iteration count of the inner loop, so that the overall number of iterations depends on the convergence speed of the inner loop. Typically, convergence is slow in the beginning (i.e. when the data-fitting term dominates) and fast once $\rho$ becomes large. Thus, finding a good initial estimate for $U$ and $Y$ is crucial in order to speed up the algorithm. Certain structures allow previous estimates to be reused for initialization, as will be outlined in the following.
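One common realization of such a subspace distance, used here as an illustrative stand-in for definition (4.18) and assumed in the sketches above, is the largest principal angle between the column spaces of consecutive iterates:

\begin{verbatim}
import numpy as np

def subspace_angle(U_old, U_new):
    """Largest principal angle between span(U_old) and span(U_new);
    assumes both matrices have orthonormal columns."""
    s = np.linalg.svd(U_old.T @ U_new, compute_uv=False)
    return np.arccos(np.clip(s.min(), -1.0, 1.0))
\end{verbatim}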