

We have $XH = X$ and thus, due to the cyclic invariance of the trace, it holds
\[
\begin{aligned}
\operatorname*{arg\,min}_{\substack{P \in \mathbb{R}^{d \times D} \\ PP^T = \mathrm{Id}_d}} \operatorname{tr}\bigl(HX^T X H - HX^T P^T P X H\bigr)
&= \operatorname*{arg\,min}_{\substack{P \in \mathbb{R}^{d \times D} \\ PP^T = \mathrm{Id}_d}} \operatorname{tr}\bigl(X^T X - X^T P^T P X\bigr) \\
&= \operatorname*{arg\,min}_{\substack{P \in \mathbb{R}^{d \times D} \\ PP^T = \mathrm{Id}_d}} -\operatorname{tr}\bigl(X^T P^T P X\bigr) \\
&= \operatorname*{arg\,min}_{\substack{P \in \mathbb{R}^{d \times D} \\ PP^T = \mathrm{Id}_d}} -\operatorname{tr}\bigl(P X X^T P^T\bigr).
\end{aligned}
\]

In summary, the linear dimensionality reduction method MDS is given as
\[
\operatorname*{arg\,min}_{P \in U_{\mathrm{MDS}}} g_{\mathrm{MDS}}(P),
\]
with cost functional $g_{\mathrm{MDS}}(P) = \operatorname{tr}\bigl(HX^T X H - (PXH)^T PXH\bigr)$ and admissible set $U_{\mathrm{MDS}} = \{P \in \mathbb{R}^{d \times D} : PP^T = \mathrm{Id}_d\}$. By Theorem 2.11, the solution is given by the matrix containing row-wise the eigenvectors to the $d$ largest eigenvalues of $XHX^T$.

Remark 2.15 (Uniqueness). In analogy to PCA, the solution of the MDS problem is not unique. Moreover, all other properties of PCA also hold for MDS.
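As an illustration, the MDS solution can be computed numerically via an eigendecomposition of $XHX^T$. The following is a minimal NumPy sketch; the function name `mds_linear` and the layout of $X$ as a $D \times n$ array with one data point per column are our assumptions, not notation from the text.

```python
import numpy as np

def mds_linear(X, d):
    """Minimal sketch of linear MDS (cf. Theorem 2.11).

    Assumed layout: X is a D x n array with one data point per column;
    d is the target dimension.  Rows of P are the eigenvectors belonging
    to the d largest eigenvalues of X H X^T."""
    D_dim, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix H
    C = X @ H @ X.T                          # D x D, symmetric
    w, V = np.linalg.eigh(C)                 # eigenvalues in ascending order
    P = V[:, ::-1][:, :d].T                  # top-d eigenvectors, row-wise
    Y = P @ X @ H                            # centered low-dimensional data
    return P, Y
```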

2.2.3 Isomap

The idea of Isomap is to use the metric induced by the Riemannian metric (called the geodesic metric) on the underlying manifold instead of the Euclidean distance. As the manifold $\mathcal{M}$ is unknown, the computation of the geodesic distance is not possible, but we can use the neighborhood structure of the data in order to approximate it. This neighborhood structure (see Step 1 on page 56) of the data induces a graph $\Gamma = (X, E)$ on the data set, whose vertices $x_i \in X$ are the data points and whose edges $e_{ij} = (x_i, x_j) \in E$ connect points which are in the same neighborhood, i.e., $e_{ij} \in E$ if $x_j$ is a neighbor of $x_i$ or $x_i$ of $x_j$.

Now, we define the graph distance $d_\Gamma$ of two points $x_i, x_j \in X$ as follows: for a path $\gamma = (x_0, x_1, \ldots, x_m)$ connecting $x_i = x_0$ and $x_j = x_m$ we define its length as
\[
d_\gamma(x_i, x_j) = \sum_{i=0}^{m-1} \|x_i - x_{i+1}\|_2.
\]
Then, the graph distance of two points is defined as $d_\Gamma(x_i, x_j) = \min_{\gamma \in \Phi} d_\gamma(x_i, x_j)$, where $\Phi$ is the set of all paths connecting $x_i$ and $x_j$.
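The minimization over all paths is a shortest-path problem on $\Gamma$ and can, for instance, be solved with Dijkstra's algorithm. The following is a minimal sketch; all function and variable names are our own, and `neighbors[i]` is assumed to encode the neighborhood structure $N(i)$ introduced in Step 1 below.

```python
import heapq
import numpy as np

def graph_distances(points, neighbors):
    """Sketch: all-pairs graph distance d_Gamma via Dijkstra's algorithm.

    `points` is an (n, D) array, `neighbors[i]` the index set N(i);
    edges are symmetrized as in the text (e_ij is an edge if x_j is a
    neighbor of x_i or x_i of x_j)."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            w = float(np.linalg.norm(points[i] - points[j]))
            adj[i][j] = w
            adj[j][i] = w
    dist = np.full((n, n), np.inf)
    for s in range(n):                       # one Dijkstra run per source
        dist[s, s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[s, u]:
                continue                     # stale heap entry
            for v, w in adj[u].items():
                if d + w < dist[s, v]:
                    dist[s, v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return dist                              # inf for unconnected pairs
```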

Remark 2.16. If the data points are dense enough on $\mathcal{M}$, the graph distance approximates the geodesic distance well (for a proof see Section 8.5 in [125]).

In analogy to the previous section we can define a dissimilarity matrix from the graph distance,
\[
D_\Gamma^X = \bigl((d_\Gamma^X)^2_{ij}\bigr)_{i,j=1,\ldots,n} = \bigl(d_\Gamma(x_i, x_j)^2\bigr)_{i,j=1,\ldots,n}.
\]

The idea is now to use this dissimilarity matrix as an input for MDS. This procedure is called Isomap and was introduced by Tenenbaum and co-workers in 2000 (see [114]). The name Isomap refers to isometric mapping, since the dimensionality reduction is realized by an isometric mapping $P \colon \mathcal{M} \to \mathbb{R}^d$. Here, a mapping is called isometric if it preserves the pairwise distances of the data set.

Due to its construction Isomap strongly relies on MDS and is sometimes called the non-linear version of MDS.

The Isomap problem is given by
\[
\min_Y \sum_{i,j=1}^{n} \Bigl( (d_\Gamma^X)^2_{ij} - (d^Y_{ij})^2 \Bigr), \tag{2.15}
\]
compare (2.7). Note that we do not assume that $Y = PX$ for a linear $P$, so that Isomap is a non-linear dimensionality reduction method.

In the following we will specify the constraint set of the minimization problem (2.15).

Note that $D_\Gamma^X$ is not the squared Euclidean distance matrix and thus, Theorem 2.11 is not directly applicable and we need some further considerations.

First, we observe that for $N$ sufficiently large (at most $n$) there exists a configuration of points $Z \in \mathbb{R}^{N \times n}$ with $D^Z = D_\Gamma^X$ as long as the so-called Isomap kernel $-\frac{1}{2} H D_\Gamma^X H$ is positive semidefinite. This can be seen as follows. A necessary condition for $D_\Gamma^X$ to be a squared Euclidean distance matrix of a configuration in $\mathbb{R}^N$ is by equation (2.12)
\[
-\frac{1}{2} H D_\Gamma^X H = (ZH)^T ZH.
\]
From this it can be directly seen that the left-hand side needs to be positive semidefinite. On the other hand, if it is positive semidefinite, an eigendecomposition yields a desired and already centered configuration (compare [125]), i.e., $ZH = Z$. Moreover, we observe that the dimension $N$ can be chosen as the rank of the Isomap kernel $-\frac{1}{2} H D_\Gamma^X H$.
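To make this observation concrete, the following NumPy sketch (our own naming, under the assumption that the kernel has already been assembled) recovers such a centered configuration $Z$ from a positive semidefinite Isomap kernel:

```python
import numpy as np

def configuration_from_kernel(K, tol=1e-10):
    """Sketch: recover a centered configuration Z with (ZH)^T ZH = K from
    a positive semidefinite Isomap kernel K = -1/2 H D_Gamma^X H,
    choosing N = rank(K)."""
    w, W = np.linalg.eigh(K)                 # ascending eigenvalues
    w, W = w[::-1], W[:, ::-1]               # reorder descending
    N = int(np.sum(w > tol))                 # numerical rank of the kernel
    lam = np.sqrt(w[:N])                     # lambda_1 >= ... >= lambda_N > 0
    Z = lam[:, None] * W[:, :N].T            # Z = Lambda W^T, an N x n array
    return Z                                 # already centered: ZH = Z
```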

In the following we assume that $-\frac{1}{2} H D_\Gamma^X H$ is indeed positive semidefinite (if it is not, compare Remark 2.18) and that $Z \in \mathbb{R}^{N \times n}$ is a corresponding configuration. Now, we apply the metric MDS from Section 2.2.2 to this data set $Z$ in order to compute a low-dimensional representation $Y = PZ$ of $X$. The Isomap problem then reads
\[
\min_{\substack{P \in \mathbb{R}^{d \times N} \\ PP^T = \mathrm{Id}_d}} \sum_{i,j=1}^{n} \Bigl( (d_\Gamma^X)^2_{ij} - (d^{PZ}_{ij})^2 \Bigr), \tag{2.16}
\]
and with Theorem 2.11 a solution is given by the singular value decomposition of $ZH = V \Sigma W^T$ as $P = V_d^T$. This yields the low-dimensional representation $Y = V_d^T Z$.
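Given such a configuration $Z$, this SVD-based solution can be sketched in NumPy as follows; the names `V, s, Wt` mirror the factors $V$, $\Sigma$, $W^T$ of the text and are otherwise our own.

```python
import numpy as np

def isomap_via_svd(Z, d):
    """Sketch: solve (2.16) through an SVD of ZH.  Z is N x n; returns
    P = V_d^T and the low-dimensional representation Y = V_d^T Z."""
    N, n = Z.shape
    H = np.eye(n) - np.ones((n, n)) / n
    V, s, Wt = np.linalg.svd(Z @ H)          # ZH = V Sigma W^T
    P = V[:, :d].T                           # P = V_d^T, a d x N matrix
    return P, P @ Z                          # Y = V_d^T Z
```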

From the practical point of view it is not necessary to compute $Z$, because $Y$ can be computed directly from the Isomap kernel. Consider the singular value decomposition of $ZH = V \Sigma W^T$, which yields the eigendecomposition $-\frac{1}{2} H D_\Gamma^X H = W \Sigma^T \Sigma W^T = W \Lambda^2 W^T$. Then, it follows
\[
Y = V_d^T Z = V_d^T Z H = V_d^T V \Sigma W^T = \mathrm{Id}_{d \times N} \Sigma W^T = \mathrm{Id}_{d \times n} \Lambda W^T.
\]
Thus, $Y$ can be obtained directly from the eigendecomposition of $-\frac{1}{2} H D_\Gamma^X H$.
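In code, this shortcut amounts to a single symmetric eigendecomposition. A minimal sketch (our naming, with a clipping guard against tiny negative eigenvalues caused by round-off):

```python
import numpy as np

def isomap_embedding(K, d):
    """Sketch: Y = Id_{d x n} Lambda W^T directly from the eigendecomposition
    K = W Lambda^2 W^T of the Isomap kernel."""
    w, W = np.linalg.eigh(K)
    idx = np.argsort(w)[::-1]                # eigenvalues ordered by size
    w, W = w[idx], W[:, idx]
    lam_d = np.sqrt(np.clip(w[:d], 0.0, None))   # guard against round-off
    return lam_d[:, None] * W[:, :d].T       # Y, a d x n array
```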

Remark 2.17. The low-dimensional representation of $X$ is a centered data set even though $X$ was not centered.

Remark 2.18. If the points are not distributed densely enough on $\mathcal{M}$, the Isomap kernel might not be positive semidefinite and thus, $Y$ cannot be computed as described above. To overcome this problem we can use the constant shift technique as in [125], which consists in adding a positive constant $\delta > 0$ to the graph distance $d_\Gamma(x_i, x_j)$ for $i \neq j$. It is possible to choose $\delta$ in such a way that the resulting Isomap kernel is positive semidefinite (see Chapter 8 in [125] for details).
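The proper choice of $\delta$ is discussed in [125]; purely as an illustration, a crude numerical version of the constant shift technique could simply increase $\delta$ until the shifted kernel becomes numerically positive semidefinite:

```python
import numpy as np

def shift_until_psd(D_gamma, step=0.1, max_iter=1000):
    """Illustration only: add a constant delta > 0 to all off-diagonal graph
    distances until the resulting Isomap kernel is numerically positive
    semidefinite.  [125] discusses how to choose delta properly."""
    n = D_gamma.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    off = 1.0 - np.eye(n)                    # 1 off the diagonal, 0 on it
    delta = 0.0
    for _ in range(max_iter):
        K = -0.5 * H @ ((D_gamma + delta * off) ** 2) @ H
        if np.linalg.eigvalsh(K).min() >= -1e-10:
            return delta, K
        delta += step
    raise RuntimeError("no suitable delta found within max_iter steps")
```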

Remark 2.19. Of course, other metrics besides the graph distance can be used to construct the dissimilarity matrix of $X$ approximating the geodesic metric. This will lead to other methods.

In summary, Isomap is based on the same cost functional as MDS; only the input distances are computed differently. Thus, we have $g_{\mathrm{Isomap}}(P) = \sum_{i,j=1}^{n} \bigl((d_\Gamma^X)^2_{ij} - (d^{PZ}_{ij})^2\bigr)$ and the admissible set $U_{\mathrm{Isomap}} = \{P \in \mathbb{R}^{d \times N} : PP^T = \mathrm{Id}_d\}$. In contrast to MDS and PCA, when formulating Isomap as an optimization problem we need an intermediate step in which we compute the configuration $Z$.

To end this section we describe the Isomap algorithm.

Algorithm

As mentioned above, to solve the Isomap problem the graph metric $d_\Gamma$ on the data set $X$ is computed to approximate the geodesic metric of the underlying manifold, before a linear dimensionality reduction method (here MDS or PCA) is applied in order to find a low-dimensional representation preserving the graph metric. Since this computation involves more steps than solving the PCA or MDS problem, where a simple singular value decomposition is sufficient to compute the minimizer, we will outline the Isomap algorithm in the following steps (compare [125]).

Step 1. Definition of neighboring points. To fix the neighborhood structure of the data set $X$, we need to determine the neighboring points of a data point $x_i$. To do so we can use either its $k$-nearest neighbors or all points in an $\varepsilon$-neighborhood. Let us denote the set of neighboring points of $x_i$ by $N(i)$. Note that for $k$-nearest neighbors, in general $x_i \in N(j)$ does not imply $x_j \in N(i)$.
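A brute-force sketch of Step 1 for $k$-nearest neighbors (function names are ours; `points` is assumed to be an $(n, D)$ array with one data point per row):

```python
import numpy as np

def knn_neighbors(points, k):
    """Sketch of Step 1: N(i) as the indices of the k nearest neighbors
    of x_i (brute force; `points` is an (n, D) array, one point per row)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbor
    return [list(np.argsort(row)[:k]) for row in dists]
```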

Step 2. Computation of graph distance. As already explained, the neighborhood structure of $X$ induces a graph $\Gamma = (X, E)$ on $X$, called the adjacency graph of $X$. The graph metric on $X$ is computed as the pairwise graph distance $d_\Gamma(x_i, x_j)$ for each pair of points $(x_i, x_j)$. If there are unconnected points, we set their distance to infinity. Define the dissimilarity matrix $D_\Gamma^X$ by the graph distances.
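Step 2 is an all-pairs shortest-path problem on the adjacency graph. One possible realization, sketched here, reuses `knn_neighbors` from Step 1 and SciPy's `shortest_path`, which returns infinity for unconnected pairs:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def graph_distance_matrix(points, neighbors):
    """Sketch of Step 2: weighted adjacency matrix of Gamma, then all-pairs
    shortest paths.  Unconnected pairs get distance infinity."""
    n = len(points)
    A = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            w = np.linalg.norm(points[i] - points[j])
            A[i, j] = A[j, i] = w            # symmetrize the edge set
    # in the dense csgraph format, zero entries denote missing edges
    return shortest_path(A, method="D", directed=False)
```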

Step 3. Construction of the Isomap kernel. Compute the Isomap kernel $G_c = -\frac{1}{2} H D_\Gamma^X H$. If it is not positive semidefinite, change it according to Remark 2.18.
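A sketch of Step 3, including a numerical check for positive semidefiniteness (names are ours; `dist` is the matrix of graph distances from Step 2):

```python
import numpy as np

def isomap_kernel(dist):
    """Sketch of Step 3: G_c = -1/2 H D_Gamma^X H, where `dist` holds the
    graph distances d_Gamma(x_i, x_j) and D_Gamma^X their squares."""
    n = dist.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Gc = -0.5 * H @ (dist ** 2) @ H
    psd = np.linalg.eigvalsh(Gc).min() >= -1e-10   # numerical PSD check
    return Gc, psd
```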

Step 4. Eigendecomposition of the kernel. Since $G_c$ is positive semidefinite it has an eigendecomposition $W \Lambda^2 W^T$, where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_d, \ldots, \lambda_n)$ with $\lambda_i \geq 0$ ordered by size. The low-dimensional data set $Y$ is then given by $Y = \mathrm{Id}_{d \times n} \Lambda W^T$.
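Finally, combining the sketches above gives an illustrative end-to-end version of the algorithm; Step 4 itself is carried out by the `isomap_embedding` helper sketched earlier in this section.

```python
def isomap(points, d, k):
    """Illustrative end-to-end sketch combining Steps 1-4 via the helper
    functions sketched above.  Assumes the adjacency graph is connected,
    so that no graph distance is infinite."""
    nbrs = knn_neighbors(points, k)              # Step 1: neighborhoods N(i)
    dist = graph_distance_matrix(points, nbrs)   # Step 2: graph metric
    Gc, psd = isomap_kernel(dist)                # Step 3: Isomap kernel
    if not psd:                                  # Remark 2.18: constant shift
        _, Gc = shift_until_psd(dist)
    return isomap_embedding(Gc, d)               # Step 4: Y = Id Lambda W^T
```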