

We have $XH = X$ and thus, due to the cyclic invariance of the trace, it holds
\[
\begin{aligned}
\operatorname*{arg\,min}_{\substack{P \in \mathbb{R}^{d \times D} \\ PP^T = \mathrm{Id}_d}} \operatorname{tr}\bigl(HX^T X H - HX^T P^T P X H\bigr)
&= \operatorname*{arg\,min}_{\substack{P \in \mathbb{R}^{d \times D} \\ PP^T = \mathrm{Id}_d}} \operatorname{tr}\bigl(X^T X - X^T P^T P X\bigr) \\
&= \operatorname*{arg\,min}_{\substack{P \in \mathbb{R}^{d \times D} \\ PP^T = \mathrm{Id}_d}} -\operatorname{tr}\bigl(X^T P^T P X\bigr) \\
&= \operatorname*{arg\,min}_{\substack{P \in \mathbb{R}^{d \times D} \\ PP^T = \mathrm{Id}_d}} -\operatorname{tr}\bigl(P X X^T P^T\bigr).
\end{aligned}
\]

In summary, the linear dimensionality reduction method MDS is given as
\[
\operatorname*{arg\,min}_{P \in U_{\mathrm{MDS}}} g_{\mathrm{MDS}}(P),
\]
with cost functional $g_{\mathrm{MDS}}(P) = \operatorname{tr}\bigl(HX^T X H - (PXH)^T PXH\bigr)$ and admissible set $U_{\mathrm{MDS}} = \{P \in \mathbb{R}^{d \times D} : PP^T = \mathrm{Id}_d\}$. By Theorem 2.11, the solution is given by the matrix containing row-wise the eigenvectors to the $d$ largest eigenvalues of $XHX^T$.

Remark 2.15 (Uniqueness). In analogy to PCA, the solution of the MDS problem is not unique. Moreover, all other properties of PCA also hold for MDS.
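As an illustration, the MDS solution can be computed numerically via an eigendecomposition of $XHX^T$. The following is a minimal NumPy sketch; the function name `mds_linear` and the layout of $X$ as a $D \times n$ array with one data point per column are our assumptions, not notation from the text.

```python
import numpy as np

def mds_linear(X, d):
    """Minimal sketch of linear MDS (cf. Theorem 2.11).

    Assumed layout: X is a D x n array with one data point per column;
    d is the target dimension.  Rows of P are the eigenvectors belonging
    to the d largest eigenvalues of X H X^T."""
    D_dim, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix H
    C = X @ H @ X.T                          # D x D, symmetric
    w, V = np.linalg.eigh(C)                 # eigenvalues in ascending order
    P = V[:, ::-1][:, :d].T                  # top-d eigenvectors, row-wise
    Y = P @ X @ H                            # centered low-dimensional data
    return P, Y
```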

2.2.3 Isomap

The idea of Isomap is to use the metric induced by the Riemannian metric (called the geodesic metric) on the underlying manifold instead of the Euclidean distance. As the manifold $\mathcal{M}$ is unknown, the computation of the geodesic distance is not possible, but we can use the neighborhood structure of the data in order to approximate it. This neighborhood structure (see Step 1 on page 56) of the data induces a graph $\Gamma = (X, E)$ on the data set, whose vertices $x_i \in X$ are the data points and whose edges $e_{ij} = (x_i, x_j) \in E$ connect points which are in the same neighborhood, i.e., $e_{ij} \in E$ if $x_j$ is a neighbor of $x_i$ or $x_i$ of $x_j$.

Now, we define the graph distance $d_\Gamma$ of two points $x_i, x_j \in X$ as follows: for a path $\gamma = (x_0, x_1, \ldots, x_m)$ connecting $x_i = x_0$ and $x_j = x_m$ we define its length as
\[
d_\gamma(x_i, x_j) = \sum_{i=0}^{m-1} \|x_i - x_{i+1}\|_2.
\]
Then, the graph distance of two points is defined as $d_\Gamma(x_i, x_j) = \min_{\gamma \in \Phi} d_\gamma(x_i, x_j)$, where $\Phi$ is the set of all paths connecting $x_i$ and $x_j$.
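The minimization over all paths is a shortest-path problem on $\Gamma$ and can, for instance, be solved with Dijkstra's algorithm. The following is a minimal sketch; all function and variable names are our own, and `neighbors[i]` is assumed to encode the neighborhood structure $N(i)$ introduced in Step 1 below.

```python
import heapq
import numpy as np

def graph_distances(points, neighbors):
    """Sketch: all-pairs graph distance d_Gamma via Dijkstra's algorithm.

    `points` is an (n, D) array, `neighbors[i]` the index set N(i);
    edges are symmetrized as in the text (e_ij is an edge if x_j is a
    neighbor of x_i or x_i of x_j)."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            w = float(np.linalg.norm(points[i] - points[j]))
            adj[i][j] = w
            adj[j][i] = w
    dist = np.full((n, n), np.inf)
    for s in range(n):                       # one Dijkstra run per source
        dist[s, s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[s, u]:
                continue                     # stale heap entry
            for v, w in adj[u].items():
                if d + w < dist[s, v]:
                    dist[s, v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return dist                              # inf for unconnected pairs
```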

Remark 2.16. If the data points are dense enough on $\mathcal{M}$, the graph distance approximates the geodesic distance well (for a proof see Section 8.5 in [125]).

In analogy to the previous section we can define a dissimilarity matrix from the graph distance,
\[
D_\Gamma^X = \bigl((d_\Gamma^X)^2_{ij}\bigr)_{i,j=1,\ldots,n} = \bigl(d_\Gamma(x_i, x_j)^2\bigr)_{i,j=1,\ldots,n}.
\]

The idea is now to use this dissimilarity matrix as an input for MDS. This procedure is called Isomap and was introduced by Tenenbaum and co-workers in 2000 (see [114]). The name Isomap refers to isometric mapping, since the dimensionality reduction is realized by an isometric mapping $P \colon \mathcal{M} \to \mathbb{R}^d$. Here, a mapping is called isometric if it preserves the pairwise distances of the data set.

Due to its construction Isomap strongly relies on MDS and is sometimes called the non-linear version of MDS.

The Isomap problem is given by
\[
\min_Y \sum_{i,j=1}^{n} \Bigl( (d_\Gamma^X)^2_{ij} - (d^Y_{ij})^2 \Bigr), \tag{2.15}
\]
compare (2.7). Note that we do not assume that $Y = PX$ for a linear $P$, so that Isomap is a non-linear dimensionality reduction method.

In the following we will specify the constraint set of the minimization problem (2.15).

Note that $D_\Gamma^X$ is not the squared Euclidean distance matrix and thus, Theorem 2.11 is not directly applicable and we need some further considerations.

First, we observe that for $N$ sufficiently large (at most $n$) there exists a configuration of points $Z \in \mathbb{R}^{N \times n}$ with $D^Z = D_\Gamma^X$ as long as the so-called Isomap kernel $-\frac{1}{2} H D_\Gamma^X H$ is positive semidefinite. This can be seen as follows. A necessary condition for $D_\Gamma^X$ to be a squared Euclidean distance matrix of a configuration in $\mathbb{R}^N$ is by equation (2.12)
\[
-\frac{1}{2} H D_\Gamma^X H = (ZH)^T ZH.
\]
From this it can be directly seen that the left-hand side needs to be positive semidefinite. On the other hand, if it is positive semidefinite, an eigendecomposition yields a desired and already centered configuration (compare [125]), i.e., $ZH = Z$. Moreover, we observe that the dimension $N$ can be chosen as the rank of the Isomap kernel $-\frac{1}{2} H D_\Gamma^X H$.
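To make this observation concrete, the following NumPy sketch (our own naming, under the assumption that the kernel has already been assembled) recovers such a centered configuration $Z$ from a positive semidefinite Isomap kernel:

```python
import numpy as np

def configuration_from_kernel(K, tol=1e-10):
    """Sketch: recover a centered configuration Z with (ZH)^T ZH = K from
    a positive semidefinite Isomap kernel K = -1/2 H D_Gamma^X H,
    choosing N = rank(K)."""
    w, W = np.linalg.eigh(K)                 # ascending eigenvalues
    w, W = w[::-1], W[:, ::-1]               # reorder descending
    N = int(np.sum(w > tol))                 # numerical rank of the kernel
    lam = np.sqrt(w[:N])                     # lambda_1 >= ... >= lambda_N > 0
    Z = lam[:, None] * W[:, :N].T            # Z = Lambda W^T, an N x n array
    return Z                                 # already centered: ZH = Z
```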

In the following we assume that $-\frac{1}{2} H D_\Gamma^X H$ is indeed positive semidefinite (if it is not, compare Remark 2.18) and that $Z \in \mathbb{R}^{N \times n}$ is a corresponding configuration. Now, we apply the metric MDS from Section 2.2.2 to this data set $Z$ in order to compute a low-dimensional representation $Y = PZ$ of $X$. The Isomap problem then reads
\[
\min_{\substack{P \in \mathbb{R}^{d \times N} \\ PP^T = \mathrm{Id}_d}} \sum_{i,j=1}^{n} \Bigl( (d_\Gamma^X)^2_{ij} - (d^{PZ}_{ij})^2 \Bigr), \tag{2.16}
\]
and with Theorem 2.11 a solution is given by the singular value decomposition of $ZH = V \Sigma W^T$ as $P = V_d^T$. This yields the low-dimensional representation $Y = V_d^T Z$.
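Given such a configuration $Z$, this SVD-based solution can be sketched in NumPy as follows; the names `V, s, Wt` mirror the factors $V$, $\Sigma$, $W^T$ of the text and are otherwise our own.

```python
import numpy as np

def isomap_via_svd(Z, d):
    """Sketch: solve (2.16) through an SVD of ZH.  Z is N x n; returns
    P = V_d^T and the low-dimensional representation Y = V_d^T Z."""
    N, n = Z.shape
    H = np.eye(n) - np.ones((n, n)) / n
    V, s, Wt = np.linalg.svd(Z @ H)          # ZH = V Sigma W^T
    P = V[:, :d].T                           # P = V_d^T, a d x N matrix
    return P, P @ Z                          # Y = V_d^T Z
```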

From the practical point of view it is not necessary to compute $Z$, because $Y$ can be computed directly from the Isomap kernel. Consider the singular value decomposition of $ZH = V \Sigma W^T$, which yields the eigendecomposition $-\frac{1}{2} H D_\Gamma^X H = W \Sigma^T \Sigma W^T = W \Lambda^2 W^T$. Then, it follows
\[
Y = V_d^T Z = V_d^T Z H = V_d^T V \Sigma W^T = \mathrm{Id}_{d \times N} \Sigma W^T = \mathrm{Id}_{d \times n} \Lambda W^T.
\]
Thus, $Y$ can be obtained directly from the eigendecomposition of $-\frac{1}{2} H D_\Gamma^X H$.
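In code, this shortcut amounts to a single symmetric eigendecomposition. A minimal sketch (our naming, with a clipping guard against tiny negative eigenvalues caused by round-off):

```python
import numpy as np

def isomap_embedding(K, d):
    """Sketch: Y = Id_{d x n} Lambda W^T directly from the eigendecomposition
    K = W Lambda^2 W^T of the Isomap kernel."""
    w, W = np.linalg.eigh(K)
    idx = np.argsort(w)[::-1]                # eigenvalues ordered by size
    w, W = w[idx], W[:, idx]
    lam_d = np.sqrt(np.clip(w[:d], 0.0, None))   # guard against round-off
    return lam_d[:, None] * W[:, :d].T       # Y, a d x n array
```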

Remark 2.17. The low-dimensional representation of $X$ is a centered data set even though $X$ was not centered.

Remark 2.18. If the points are not distributed densely enough on $\mathcal{M}$, the Isomap kernel might not be positive semidefinite and thus, $Y$ cannot be computed as described above. To overcome this problem we can use the constant shift technique as in [125], which consists in adding a positive constant $\delta > 0$ to the graph distance $d_\Gamma(x_i, x_j)$ for $i \neq j$. It is possible to choose $\delta$ in such a way that the resulting Isomap kernel is positive semidefinite (see Chapter 8 in [125] for details).
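The proper choice of $\delta$ is discussed in [125]; purely as an illustration, a crude numerical version of the constant shift technique could simply increase $\delta$ until the shifted kernel becomes numerically positive semidefinite:

```python
import numpy as np

def shift_until_psd(D_gamma, step=0.1, max_iter=1000):
    """Illustration only: add a constant delta > 0 to all off-diagonal graph
    distances until the resulting Isomap kernel is numerically positive
    semidefinite.  [125] discusses how to choose delta properly."""
    n = D_gamma.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    off = 1.0 - np.eye(n)                    # 1 off the diagonal, 0 on it
    delta = 0.0
    for _ in range(max_iter):
        K = -0.5 * H @ ((D_gamma + delta * off) ** 2) @ H
        if np.linalg.eigvalsh(K).min() >= -1e-10:
            return delta, K
        delta += step
    raise RuntimeError("no suitable delta found within max_iter steps")
```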

Remark 2.19. Of course, other metrics besides the graph distance can be used to construct the dissimilarity matrix of $X$ approximating the geodesic metric. This will lead to other methods.

In summary, Isomap is based on the same cost functional as MDS; only the input distances are computed differently. Thus, we have $g_{\mathrm{Isomap}}(P) = \sum_{i,j=1}^{n} \bigl((d_\Gamma^X)^2_{ij} - (d^{PZ}_{ij})^2\bigr)$ and the admissible set $U_{\mathrm{Isomap}} = \{P \in \mathbb{R}^{d \times N} : PP^T = \mathrm{Id}_d\}$. In contrast to MDS and PCA, when formulating Isomap as an optimization problem we need an intermediate step in which we compute the configuration $Z$.

To end this section we describe the Isomap algorithm.

Algorithm

As mentioned above, to solve the Isomap problem the graph metric $d_\Gamma$ on the data set $X$ is computed to approximate the geodesic metric of the underlying manifold, before a linear dimensionality reduction method (here MDS or PCA) is applied in order to find a low-dimensional representation preserving the graph metric. Since this computation involves more steps than solving the PCA or MDS problem, where a simple singular value decomposition is sufficient to compute the minimizer, we will outline the Isomap algorithm in the following steps (compare [125]).

Step 1. Definition of neighboring points. To fix the neighborhood structure of the data set $X$, we need to determine the neighboring points of a data point $x_i$. To do so we can use either its $k$-nearest neighbors or all points in an $\varepsilon$-neighborhood. Let us denote the set of neighboring points of $x_i$ by $N(i)$. Note that for $k$-nearest neighbors, in general $x_i \in N(j)$ does not imply $x_j \in N(i)$.
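A brute-force sketch of Step 1 for $k$-nearest neighbors (function names are ours; `points` is assumed to be an $(n, D)$ array with one data point per row):

```python
import numpy as np

def knn_neighbors(points, k):
    """Sketch of Step 1: N(i) as the indices of the k nearest neighbors
    of x_i (brute force; `points` is an (n, D) array, one point per row)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbor
    return [list(np.argsort(row)[:k]) for row in dists]
```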

Step 2. Computation of graph distance. As already explained, the neighborhood structure of $X$ induces a graph $\Gamma = (X, E)$ on $X$, called the adjacency graph of $X$. The graph metric on $X$ is computed as the pairwise graph distance $d_\Gamma(x_i, x_j)$ for each pair of points $(x_i, x_j)$. If there are unconnected points, we set their distance to infinity. Define the dissimilarity matrix $D_\Gamma^X$ by the graph distances.
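Step 2 is an all-pairs shortest-path problem on the adjacency graph. One possible realization, sketched here, reuses `knn_neighbors` from Step 1 and SciPy's `shortest_path`, which returns infinity for unconnected pairs:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def graph_distance_matrix(points, neighbors):
    """Sketch of Step 2: weighted adjacency matrix of Gamma, then all-pairs
    shortest paths.  Unconnected pairs get distance infinity."""
    n = len(points)
    A = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            w = np.linalg.norm(points[i] - points[j])
            A[i, j] = A[j, i] = w            # symmetrize the edge set
    # in the dense csgraph format, zero entries denote missing edges
    return shortest_path(A, method="D", directed=False)
```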

Step 3. Construction of the Isomap kernel. Compute the Isomap kernel $G_c = -\frac{1}{2} H D_\Gamma^X H$. If it is not positive semidefinite, change it according to Remark 2.18.
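A sketch of Step 3, including a numerical check for positive semidefiniteness (names are ours; `dist` is the matrix of graph distances from Step 2):

```python
import numpy as np

def isomap_kernel(dist):
    """Sketch of Step 3: G_c = -1/2 H D_Gamma^X H, where `dist` holds the
    graph distances d_Gamma(x_i, x_j) and D_Gamma^X their squares."""
    n = dist.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Gc = -0.5 * H @ (dist ** 2) @ H
    psd = np.linalg.eigvalsh(Gc).min() >= -1e-10   # numerical PSD check
    return Gc, psd
```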

Step 4. Eigendecomposition of the kernel. Since $G_c$ is positive semidefinite it has an eigendecomposition $W \Lambda^2 W^T$, where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_d, \ldots, \lambda_n)$ with $\lambda_i \geq 0$ ordered by size. The low-dimensional data set $Y$ is then given by $Y = \mathrm{Id}_{d \times n} \Lambda W^T$.
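Finally, combining the sketches above gives an illustrative end-to-end version of the algorithm; Step 4 itself is carried out by the `isomap_embedding` helper sketched earlier in this section.

```python
def isomap(points, d, k):
    """Illustrative end-to-end sketch combining Steps 1-4 via the helper
    functions sketched above.  Assumes the adjacency graph is connected,
    so that no graph distance is infinite."""
    nbrs = knn_neighbors(points, k)              # Step 1: neighborhoods N(i)
    dist = graph_distance_matrix(points, nbrs)   # Step 2: graph metric
    Gc, psd = isomap_kernel(dist)                # Step 3: Isomap kernel
    if not psd:                                  # Remark 2.18: constant shift
        _, Gc = shift_until_psd(dist)
    return isomap_embedding(Gc, d)               # Step 4: Y = Id Lambda W^T
```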