

3.3 Other Multivariate Methods

3.3.1 Multivariate Prony-type Methods

Now, proceeding similarly to the section on one-dimensional methods, we can use these properties to describe the different approaches to the multivariate frequency estimation problem. We start with Prony's method. Instead of one shift $T$, we now have one shift for each dimension.

Definition 3.33. For $k = 1, \dots, d$, an $f \in S_M^d$ and a finite set $G \subset \mathbb{Z}^d$ with $\Gamma_M^d \subset G$, we define the linear map $T_k$ as the unique extension of
$$s_G^f(j) \mapsto s_G^f(j + e_k)$$
to $\operatorname{Sig}(f, G)$. $T_k$ is called the $k$-shift operator. The extension of $T_k$ to $\mathbb{C}^G$ by zero on the orthogonal complement of the signal space is again denoted by $T_k$.

Again, it is straightforward to check that the eigenvalues of $T_k$ encode the frequencies of $f$. The additional difficulty is that the eigenvalues of $T_k$ only give the projections of the frequencies of $f$ onto the subspace $e_k \cdot \mathbb{C}^d$. To match them, we use that $T_1, \dots, T_d$ commute and hence have a common basis of eigenvectors. These eigenvectors induce a matching: one eigenvector corresponds to $d$ eigenvalues $z_j = e^{2\pi i y_j}$ of $T_j$ and gives rise to the frequency vector $(y_1, \dots, y_d)$.

Lemma 3.34. For $f \in S_M^d$, a set $G \subset \mathbb{Z}^d$ with $\Gamma_M^d \subset G$ and the $k$-shift operator $T_k$ defined above, the following statements hold true:

(1) For any left inverse $L$ of $V_G(Y_f)$ we have that $T_k = V_G(Y_f) D_{Y_f}(e_k) L$; in particular, $T_k$ is well-defined.

(2) $T_1, \dots, T_d$ commute.

(3) $T_k$ has eigenvalues $e^{2\pi i y}$ with $y \in e_k \cdot Y_f$. For one such $y$, let $Y = \{\tilde{y} \in Y_f : y = e_k \cdot \tilde{y}\}$. Then the geometric (and algebraic) multiplicity of $e^{2\pi i y}$ is given by $|Y|$ and the eigenvectors are given by $v_G(\tilde{y})$, $\tilde{y} \in Y$.

Proof. The first claim is clear due to Lemma 3.32 (1) and the fact that $D_{Y_f}(e_k) D_{Y_f}(j) = D_{Y_f}(e_k + j)$; the second claim is obvious. Finally, for any $\tilde{y} \in Y$ we have that
$$T_k v_G(\tilde{y}) = V_G(Y_f) D_{Y_f}(e_k) L v_G(\tilde{y}) = V_G(Y_f) D_{Y_f}(e_k) e_{\tilde{y}} = e^{2\pi i \tilde{y} \cdot e_k} V_G(Y_f) e_{\tilde{y}} = e^{2\pi i y} v_G(\tilde{y}).$$
This gives $M$ linearly independent eigenvectors, which is the dimension of $\operatorname{Sig}(f, G)$.

Now all we have to do is find a basis of $\operatorname{Sig}(f, G)$, represent $T_j$, $j = 1, \dots, d$, in this basis and calculate a joint eigenbasis of the eigenspaces of the $T_j$. Then, as described above, we have estimated $Y_f$. We are thus left with a little bit of linear algebra.

But before we describe the necessary linear algebra, we count the minimal number of sampling points we need. First of all, we choose $G = \Gamma_M^d$ (which is the minimal choice). By Lemma 3.32, $s_G^f(k)$, $k \in \Gamma_M^d$, form a spanning set of $\operatorname{Sig}(f, G)$. Therefore, $T_j$ is uniquely determined by $T_j s_{\Gamma_M^d}^f(k)$, $k \in \Gamma_M^d$. But we also need to know $T_j s_{\Gamma_M^d}^f(k) = s_{\Gamma_M^d}^f(k + e_j)$. All combined, we need to know $f$ on $\Gamma_M^d + (\Gamma_M^d + e_j)$ to determine $T_j$, and forming the union over all $j = 1, \dots, d$ gives precisely the sampling set described by Sauer [85].

Definition 3.35. We define the corona of a set $G \subset \mathbb{Z}^d$ by
$$\lceil G \rceil = G \cup \bigcup_{j=1}^{d} (G + e_j).$$
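For concreteness, the corona is easy to compute; the following is a minimal sketch, with points stored as integer tuples (the set G below is a small hypothetical example, not one of the sampling sets of this chapter):

```python
def corona(G, d):
    """Corona of a finite set G of points in Z^d: the union of G
    with its translates by the unit vectors e_1, ..., e_d."""
    shifted = {tuple(g[i] + (i == j) for i in range(d)) for j in range(d) for g in G}
    return set(G) | shifted

# hypothetical example in d = 2
G = {(0, 0), (1, 0), (0, 1)}
print(sorted(corona(G, d=2)))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
```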

We immediately see that we need samples of $f$ on $\Gamma_M^d + \lceil \Gamma_M^d \rceil$ to estimate $T_j$, $j = 1, \dots, d$. We start by estimating $\operatorname{Sig}(f, G)$. Assume that we know $N \in \mathbb{N}_{>0}$, an upper bound of $M$, the unknown order of $f$. We then collect a spanning set of the signal space and perform a singular value decomposition. Unfortunately, we have to abandon the convenient notation that fixes no enumeration of $G$ and $\lceil \Gamma_N^d \rceil$, as the SVD always fixes an enumeration. Let
$$\left( s_G^f(n) : n \in \Gamma_N^d \right) =: H_{G, \Gamma_N^d}^f = U \Sigma W^H.$$

We ignore the slight notational inaccuracy that the left-hand side is a matrix in $\mathbb{C}^{G \times \Gamma_N^d}$, the right-hand side in $\mathbb{C}^{|G| \times |\Gamma_N^d|}$. Estimating the rank of $H_{G, \Gamma_N^d}^f$ by thresholding the singular values at a $\operatorname{tol} > 0$, we obtain $M$. The first $M$ columns of $U$, denoted by $u_1, \dots, u_M \in \mathbb{C}^{|G|}$, then form an orthonormal basis of $\operatorname{Sig}(f, G)$. Note that this estimate can also be applied when we only have noisy measurements.
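In code, this rank estimate takes only a few lines; a minimal NumPy sketch, where H stands for $H_{G, \Gamma_N^d}^f$ under some fixed enumeration of rows and columns and tol is assumed to be given:

```python
import numpy as np

def estimate_signal_space(H, tol):
    """Estimate the order M and an orthonormal basis of Sig(f, G)
    by thresholding the singular values of H at tol."""
    U, sigma, WH = np.linalg.svd(H, full_matrices=False)
    M = int(np.sum(sigma > tol))   # estimated order: singular values above tol
    return M, U[:, :M]             # columns u_1, ..., u_M span the estimated signal space
```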

But how to choose tol? As in the univariate case, $H_{G_1, G_2}^f$ can be factorized into Vandermonde matrices and a diagonal matrix, which then gives rise to estimates of $\sigma_M$.

Proposition 3.36. Let $f \in S_M^d$ with frequencies $Y_f$, coefficients $c \in \mathbb{C}^{Y_f}$ and order $M$ be given. Further, let $G_1, G_2 \subset \mathbb{Z}^d$ be two finite sets. Then
$$H_{G_1, G_2}^f = V_{G_1}(Y_f) \operatorname{diag}\left( c_y : y \in Y_f \right) V_{G_2}(Y_f)^T.$$
Further, if $d = 2$, $G_j = [-N_j, N_j]^2 \cap \mathbb{Z}^2$, $j = 1, 2$, and $f \in S^2(q)$ with $q \geq K_j / (N_j + 1)$, where $K_1, K_2, N_1, N_2 \in \mathbb{N}_{>0}$, the smallest non-zero singular value of $H_{G_1, G_2}^f$ can be estimated by
$$\sigma_{\min}^2 \geq c_{\min}^2 \, \sigma_{\min}^2(V_{G_1}(Y_f)) \, \sigma_{\min}^2(V_{G_2}(Y_f)) \gtrsim (K_1 K_2 N_1 N_2)^{-2},$$
where $c_{\min}$ is a lower bound on the modulus of the coefficients of $f$. The precise constants are given in Proposition 2.29.

Proof. The factorization can be derived exactly as in the univariate case, using Lemma 3.32 and (3.22) (which is true for arbitrary finite sets):
$$\left( s_{G_1}^f(n) : n \in G_2 \right) = \left( V_{G_1}(Y_f) D_{Y_f}(n) c : n \in G_2 \right) = V_{G_1}(Y_f) \operatorname{diag}\left( c_y : y \in Y_f \right) V_{G_2}(Y_f)^T.$$
The lower bound for $\sigma_{\min}^2$ follows directly from Proposition 2.29.
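The factorization is easy to verify numerically. The sketch below assumes the conventions $V_G(Y) = \left( e^{2\pi i y \cdot n} \right)_{n \in G, y \in Y}$ and $\left( H_{G_1, G_2}^f \right)_{n, k} = f(n + k)$; the frequencies and coefficients are arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(0)
d, M = 2, 3
Y = rng.random((M, d))                                   # test frequencies in [0, 1)^d
c = rng.standard_normal(M) + 1j * rng.standard_normal(M)

G1 = [(k1, k2) for k1 in range(-2, 3) for k2 in range(-2, 3)]
G2 = [(k1, k2) for k1 in range(-1, 2) for k2 in range(-1, 2)]

def f(x):
    """Exponential sum with frequencies Y and coefficients c."""
    return np.sum(c * np.exp(2j * np.pi * (Y @ np.asarray(x))))

V = lambda G: np.exp(2j * np.pi * (np.asarray(G) @ Y.T))  # V_G(Y), size |G| x M

H = np.array([[f(np.add(n, k)) for k in G2] for n in G1])
print(np.allclose(H, V(G1) @ np.diag(c) @ V(G2).T))       # True
```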

Remarks. 1. The factorization is well-known in the literature, see for example [85].

2. It is possible to obtain lower bounds for higher dimensions as well, if one relies on Montgomery's construction, see [18], Corollary 22. For fixed $q$, this results in the following estimate:
$$\sigma_{\min}^2 \gtrsim_q c_{\min}^2 \left( (2N_1)^d - (2N_1)^{d-1} + O(N_1^{d-2}) \right) \left( (2N_2)^d - (2N_2)^{d-1} + O(N_2^{d-2}) \right).$$

3. Unfortunately, such estimates are unknown for sampling sets of the form $G_1 = \Gamma_N^d$, $G_2 = \lceil \Gamma_N^d \rceil$.

4. The discussion after Theorem 3.7 carries over to the multivariate case with only slight adjustments. Indeed, if we are given $\tilde{f}(n) = f(n) + \varepsilon_n$ and we only know that $|\varepsilon_n| \leq \eta$, we have to choose tol larger than
$$\|E\|_2 \leq \eta \sqrt{|G_1| |G_2|},$$
where $E = \left( \varepsilon_{n+k} : n \in G_1, k \in G_2 \right)$ is the matrix containing the noise, to recover $\operatorname{ord} f = M$ from $H_{G_1, G_2}^f$. However, this is only guaranteed to work if $\sigma_M(H_{G_1, G_2}^f) \geq 2 \|E\|_2$. As in the univariate case, more sophisticated estimates using specific noise models and random matrix theory are currently unknown.
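As a worked illustration of this threshold choice (the noise level and grid sizes below are hypothetical values, not taken from the text):

```python
import numpy as np

eta = 1e-3                                # hypothetical pointwise noise bound |eps_n| <= eta
card_G1, card_G2 = 25, 9                  # hypothetical |G_1|, |G_2|
tol = eta * np.sqrt(card_G1 * card_G2)    # = 0.015; any tol >= ||E||_2 suffices
# recovery of ord f = M is guaranteed only if sigma_M(H) >= 2 ||E||_2,
# which is ensured, e.g., whenever sigma_M(H) >= 2 * tol
print(tol)
```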

Next we consider the reduced singular value decomposition of $H_{G, \Gamma_N^d}^f$, which for readability is again denoted by $U \Sigma W^H$, but now $\Sigma \in \mathbb{C}^{M \times M}$ is a diagonal matrix with positive, decreasing diagonal entries, and $U \in \mathbb{C}^{|G| \times M}$ and $W \in \mathbb{C}^{|\Gamma_N^d| \times M}$ have orthonormal columns.

Now we wish to obtain a matrix representation of $T_j$ in the basis $u_1, \dots, u_M$. This results in
$$M_j := U^H T_j U = U^H T_j H_{G, \Gamma_N^d}^f W \Sigma^{-1} = U^H H_{G, \Gamma_N^d + e_j}^f W \Sigma^{-1} \in \mathbb{C}^{M \times M}. \qquad (3.23)$$
We summarize the algorithm.

Algorithm 3.37 (Multivariate Prony's Method). Input: $f(k)$, $k \in G + \lceil \Gamma_N^d \rceil$, of an unknown $f \in S_M^d$; $N \geq M$ (i.e., an upper bound of the order of $f$); $G \supset \Gamma_M^d$; and $\operatorname{tol} \geq 0$.

• Calculate an SVD of $H_{G, \Gamma_N^d}^f$ and let $M$ be the number of singular values larger than tol. Save the reduced SVD $H_{G, \Gamma_N^d}^f = U \Sigma W^H$.

• Form the matrices $M_1, \dots, M_d$ as in (3.23).

• Calculate a basis of joint eigenvectors $v_1, \dots, v_M$ of the $M_j$ and denote the eigenvalue of $M_j$ and $v_k$ by $e^{2\pi i y_j^k}$.

Output: The frequency vectors $(y_1^k, \dots, y_d^k)$, $k = 1, \dots, M$.
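The first two steps are plain linear algebra. A minimal NumPy sketch, under the assumption that H is $H_{G, \Gamma_N^d}^f$ and H_shift[j] is $H_{G, \Gamma_N^d + e_{j+1}}^f$, both assembled from the samples with one consistent enumeration of rows and columns (the joint eigenvector computation of the last step is discussed below):

```python
import numpy as np

def prony_matrices(H, H_shift, tol):
    """Steps 1 and 2 of Algorithm 3.37: reduced SVD of H and the
    matrices M_1, ..., M_d from (3.23)."""
    U, sigma, WH = np.linalg.svd(H, full_matrices=False)
    M = int(np.sum(sigma > tol))              # number of singular values above tol
    U = U[:, :M]                              # orthonormal basis of Sig(f, G)
    W_Sinv = WH[:M, :].conj().T / sigma[:M]   # W Sigma^{-1}
    # M_j = U^H H^f_{G, Gamma_N + e_j} W Sigma^{-1}, cf. (3.23)
    return [U.conj().T @ Hj @ W_Sinv for Hj in H_shift]
```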

How to calculate a basis of eigenvectors is well known, see for example [33]. To obtain a joint eigenbasis, we usually do not need to calculate an eigenvalue decomposition of all $M_j$. Indeed, as the eigenspaces of one $M_j$ are invariant under all other $M_k$, we simply start with an eigenspace decomposition of $M_1$. All one-dimensional eigenspaces are then also eigenspaces of all other matrices.

For all higher-dimensional eigenspaces, we proceed as follows: let $E$ be such an eigenspace of $M_1$. We then perform an eigenspace decomposition of $M_2|_E : E \to E$. For all eigenspaces of $M_2$ in $E$ with dimension larger than one, we continue by decomposing $M_3$ on each of them, et cetera. See [63] and [86], where this idea is made precise.

An alternative approach is to simply form a linear combination $M$ of all matrices $M_j$. Clearly, such a linear combination still has the same eigenvector basis. The eigenvalues of $M$ are then the corresponding linear combinations of the eigenvalues of the $M_j$. For all linear combinations outside a finite union of hyperplanes, the eigenvalues of $M$ have only one-dimensional eigenspaces (an easy consequence of Lemma 3.34) and we are done. This strategy has been pursued in [81, 95]. Later, in our numerical experiments, we will use it as well; a sketch is given below.
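A sketch of this linear-combination strategy, continuing the function above (the random weights are an implementation choice; a generic combination yields simple eigenvalues):

```python
import numpy as np

def joint_frequencies(Ms, rng=None):
    """Recover the frequency vectors from commuting matrices M_1, ..., M_d
    via a generic random linear combination (last step of Algorithm 3.37)."""
    if rng is None:
        rng = np.random.default_rng(0)
    t = rng.standard_normal(len(Ms))
    _, V = np.linalg.eig(sum(tj * Mj for tj, Mj in zip(t, Ms)))  # joint eigenvectors
    Vinv = np.linalg.inv(V)
    # the eigenvalue of M_j belonging to v_k is the k-th diagonal entry of V^{-1} M_j V
    Z = np.stack([np.diag(Vinv @ Mj @ V) for Mj in Ms])          # entries e^{2 pi i y_j^k}
    return (np.angle(Z).T / (2 * np.pi)) % 1.0                   # row k: (y_1^k, ..., y_d^k)
```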

Remarks. 1. It is easily possible to use a larger sampling set, for example $G + \lceil G_2 \rceil$. As long as $\Gamma_M^d \subset G \cap G_2$, the method will work.

2. If one uses noiseless samples and chooses $\operatorname{tol} = 0$, the preceding discussion shows that the algorithm will recover $Y_f$.

3. Concerning the computational complexity, calculating the SVD is the most costly step, namely $O(|G| |\Gamma_N^d|^2)$. Using the minimal set $G = \Gamma_M^d$, this results in $O_d(N^3)$, up to logarithmic terms.

4. In the case $G = [-N_1, N_1]^d \cap \mathbb{Z}^d$ and $G_2 = [-N_2, N_2]^d \cap \mathbb{Z}^d$, Algorithm 3.37 is actually the same as the algorithm proposed in [40]. However, not only is the number of samples increased to $O_d(N^d)$, the computational complexity also increases drastically, to $O_d(N^{3d})$.

This variation of Prony's method is closely related to the method introduced by Sauer in [84]. Indeed, we claim that the matrices $M_j$ defined in (3.23) are similar to the transposed multiplication tables given in [84], Theorem 5. We now give a short reasoning for this claim.

To this end, we note that each $T_j$ can be seen as a difference equation, as it gives rise to an equation of the form
$$\sum_{n \in G} t_{m,n}^{(j)} f(n + k) = f(m + k + e_j) \quad \text{for all } m \in G \text{ and all } k \in \mathbb{Z}^d,$$
with $t_{m,n}^{(j)} \in \mathbb{C}$. In the one-dimensional case, we were able to identify these coefficients with a polynomial whose roots equal the frequencies. To see whether this is still possible, we consider
$$P_{m,j}(z) = z^{m + e_j} - \sum_{n \in G} t_{m,n}^{(j)} z^n \in \Pi_{G \cup \{m + e_j\}}.$$
Now we see that
$$0 = f(m + k + e_j) - \sum_{n \in G} t_{m,n}^{(j)} f(n + k) = \sum_{y \in Y_f} c_y e^{2\pi i y \cdot k} \left( e^{2\pi i y \cdot (m + e_j)} - \sum_{n \in G} t_{m,n}^{(j)} e^{2\pi i y \cdot n} \right) = \sum_{y \in Y_f} c_y e^{2\pi i y \cdot k} P_{m,j}\left( e^{2\pi i y} \right).$$

As this equation holds for all $k \in \mathbb{Z}^d$, we conclude that $P_{m,j}\left( e^{2\pi i y} \right) = 0$ for all $y \in Y_f$. Further, note that for all $m + e_j \notin G$, clearly $P_{m,j} \neq 0$, and all these $P_{m,j}$ are linearly independent.

Denote the vanishing ideal of $Y_f$ in the polynomial ring $\Pi$ in $d$ variables by
$$I_{Y_f} = \left\{ p \in \Pi : p(y) = 0 \text{ for all } y \in Y_f \right\}.$$
Further, we denote by $[p]$ the equivalence class of $p \in \Pi$ modulo $I_{Y_f}$. We just proved that $P_{m,j} \in I_{Y_f}$, i.e., $[P_{m,j}] = 0$. If we identify $\mathbb{C}^G$ with $\Pi_G$ by
$$c \in \mathbb{C}^G \; \leftrightarrow \; \sum_{k \in G} c_k z^k \in \Pi_G,$$
we can define
$$\tilde{T}_j : \Pi_G \to \Pi / I_{Y_f}, \quad p \mapsto [T_j^T p].$$

Next we claim that $I_{Y_f} \cap \Pi_G$ is in the kernel of $\tilde{T}_j$. To prove this, we choose an arbitrary polynomial $p = (p_n)_{n \in G} \in I_{Y_f}$ and calculate (using $[P_{m,j}] = 0$)
$$[T_j^T p] = \left[ \sum_{n \in G} \left( \sum_{m \in G} t_{m,n}^{(j)} p_m \right) z^n \right] = \left[ \sum_{m \in G} p_m \left( P_{m,j}(z) + \sum_{n \in G} t_{m,n}^{(j)} z^n \right) \right] = \left[ \sum_{m \in G} z_j p_m z^m \right] = [z_j p(z)] = 0.$$

For notational convenience, we continue to use $\tilde{T}_j$ for the mapping $\Pi_G / (I_{Y_f} \cap \Pi_G) \to \Pi / I_{Y_f}$ induced by $\tilde{T}_j$.

Furthermore, by interpolation on $Y_f$, one can construct a mapping
$$\pi : \Pi \to \Pi_G / (I_{Y_f} \cap \Pi_G).$$
Note that while interpolating on $Y_f$ is possible, the interpolant is not uniquely defined (see the discussion after Lemma 3.23). However, it is unique modulo $I_{Y_f}$. The mapping $\pi$ contains $I_{Y_f}$ in its kernel, as it is constructed by interpolation.

Hence, we obtain a mapping (which is a linear mapping between vector spaces)
$$\tilde{T}_j \circ \pi : \Pi / I_{Y_f} \to \Pi / I_{Y_f}.$$
We claim that this mapping is actually equal to $[p] \mapsto [z_j p]$, the multiplication by the $j$-th variable. It suffices to check this on $[z^m]$, $m \in \Gamma_M^d$, which is a spanning set. This, however, can be verified by a quick calculation:

$$\tilde{T}_j \circ \pi([z^m]) = \tilde{T}_j([z^m]) = \left[ \sum_{n \in G} t_{m,n}^{(j)} z^n \right] = \left[ \sum_{n \in G} t_{m,n}^{(j)} z^n + P_{m,j} \right] = [z^{m + e_j}] = [z_j z^m],$$

where we used that $[P_{m,j}] = 0$. It is interesting to note that in the one-dimensional case, the shift operator $T$ can be represented by the companion matrix (which represents $[p] \mapsto [zp]$ in $\Pi / I_{Y_f}$). Analogously, we just showed that $T_j^T$ represents multiplication by the $j$-th variable. A matrix representation of $\tilde{T}_j \circ \pi$ can therefore be interpreted as a higher-dimensional analog of the companion matrix. Finally, $M_j$ can be seen as a matrix representing $T_j$ in a suitable orthonormal basis. We summarize:

Proposition 3.38. The matrices $M_j^T$, $j = 1, \dots, d$, as given in (3.23), represent the linear mappings
$$\Pi / I_{Y_f} \to \Pi / I_{Y_f}, \quad [p] \mapsto [z_j p].$$
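To connect with the univariate picture once more: the companion matrix of the monic polynomial with roots $e^{2\pi i y}$, $y \in Y_f$, represents $[p] \mapsto [zp]$, and its eigenvalues recover the nodes. A quick numerical check of this familiar fact (the test frequencies are arbitrary):

```python
import numpy as np

y = np.array([0.1, 0.35, 0.8])                 # arbitrary test frequencies
nodes = np.exp(2j * np.pi * y)
p = np.poly(nodes)                             # monic polynomial with roots = nodes
# companion matrix: ones on the subdiagonal, last column -(p_n, ..., p_1)
C = np.diag(np.ones(len(y) - 1, dtype=complex), -1)
C[:, -1] = -p[:0:-1]
print(np.allclose(np.sort_complex(np.linalg.eigvals(C)),
                  np.sort_complex(nodes)))     # True: eigenvalues are the nodes
```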

While this proposition shows that Algorithm 3.37 and the method proposed by Sauer are closely related, there are some important differences. Most importantly, Algorithm 3.37 starts with an estimate of the signal space, using as many samples as possible. Projecting onto the estimated signal space then (hopefully) clears most of the noise from the data. This gives a rather stable algorithm. On the other hand, the method introduced by Sauer uses a nested set of sampling points $G + A_0 \subset G + A_1 \subset \cdots \subset G + \lceil \Gamma_M^d \rceil$ and terminates at step $k$ if $G + A_k$ suffices to recover $f$. It is reasonable to assume that this leads to a method more prone to noise, as not all samples are always used. However, as many $f \in S_M^d$ can be estimated with fewer samples (in particular if the bound $N$ of the order is crude), only using as many samples as necessary has computational advantages.

Another, slightly different perspective is given in [40] by Harmouch, Khalil and Mourrain. Proposition 3.38 is closely related to Proposition 4.1 in [40], which covers only the case $G = [-N_1, N_1]^d \cap \mathbb{Z}^d$ and $G_2 = [-N_2, N_2]^d \cap \mathbb{Z}^d$, though a more general problem. They translate the problem into a polynomial setting (along the lines of the sketch given above) and then deduce Algorithm 3.37 for this case.

Finally, we remark that the first result rephrasing the multivariate Prony problem as finding the joint zeros of a finite number of polynomials was given by Kunis et al. in [53]. However, no method to actually compute these zeros is given there. In our version, we circumvent this formulation entirely.