
[Figure 5.11: one panel per image class (CauchyDensity, ClassicImages, Shapes), comparing the Refined Cost, the Clustered Optimal Cost, and the Original Optimal Cost for the Grid and Random methods.]

Figure 5.11: A subset of the same data as in Figure 5.10, limited to $n = 256$ for better visibility.

The comparison between the random clustering and the grid-based coarsening is, unsurprisingly, in favor of the coarsening, as aggregating nearby pixels seems to also work well in the context of cost-based clustering, and the resulting cost values of the propagated solution are closest to the original optimal costs. In the Cauchy Density instance, the results for $n = 16$ are even better than the results of the clustering with random representatives for $n = 64$. This is further evidence that the standard aggregation of pixels works well in a multiscale scheme.

However, the results for the random clustering are not far off. In Figure 5.11 we see that the approximation quality of the grid-based coarsening is slightly better in the Cauchy Density and Classic Images instances, but this difference is negligible in the Shapes class. It is not surprising that the grid-based coarsening is the superior clustering method in grid-based instances, but the random clustering, a vastly more general method, replicates the performance reasonably well without relying on geometric information and shows promise to deliver a similar performance on instances without a metric. This opens up multiscale approaches to general optimal transport problems.

5.5 A General View on Cost-Based Clustering

As mentioned before, clustering methods usually consider only one set $X$ with some distance function $d$, whereas we look at a cost function $c \colon X \times Y \to \mathbb{R}_+$ between two sets $X$ and $Y$. This is a generalization, since one can choose $X = Y$ and $c = d$, and a very flexible framework that can cater to different objectives. The gap objective functions we define in Section 5.2 are examples of different weightings of the entries of the gap matrix, but any other weighting can be used as an objective function for the cost-based clustering problem.

Moreover, the gap matrix can be replaced by a different notion of cluster quality. Instead of the difference between the maximum and minimum entry of a submatrix, one could use, for example, the mean absolute deviation of the values in the submatrix.
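For concreteness, the following minimal NumPy sketch shows how such submatrix criteria could be evaluated; the function names and the random example matrix are illustrative assumptions, not part of the algorithms referenced in this chapter.

```python
import numpy as np

def gap(C, A, B):
    # Gap of the submatrix of the cost matrix C induced by the clusters
    # A of X (rows) and B of Y (columns): max entry minus min entry.
    sub = C[np.ix_(A, B)]
    return sub.max() - sub.min()

def mean_abs_dev(C, A, B):
    # Alternative quality criterion: mean absolute deviation of the
    # submatrix entries from their mean.
    sub = C[np.ix_(A, B)]
    return np.abs(sub - sub.mean()).mean()

# Tiny example with a random cost matrix between |X| = 6 and |Y| = 4 points.
rng = np.random.default_rng(0)
C = rng.random((6, 4))
print(gap(C, [0, 2, 5], [1, 3]), mean_abs_dev(C, [0, 2, 5], [1, 3]))
```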

Generalizations of the Clustering Algorithms The agglomerative clustering method is a general principle and can be applied to a wide range of clustering objectives. Algorithms 2 and 3 can be used as stated for other objective functions, provided they translate well into linkage criteria. In order to compute the fusion matrix $F$, we need to be able to compute the increase in the objective function caused by the fusion of any two clusters. Whether all entries of the fusion matrix change after agglomerating two clusters, and to what extent, depends on the objective function. If many or all entries change, it might be necessary or helpful to use the slower Algorithm 3 instead of Algorithm 2.
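The sketch below illustrates this principle under simplifying assumptions: the total objective is a sum of per-cluster scores (here, a maximum-gap score against all of $Y$), and all affected fusion entries are recomputed in every iteration, in the spirit of the slower variant. It is not a transcription of Algorithms 2 and 3.

```python
import numpy as np

def agglomerate(C, k, objective):
    # Greedy agglomerative clustering of the rows of the cost matrix C into
    # k clusters. The total objective is a sum of per-cluster scores, and
    # each step merges the pair of clusters whose fusion increases it least.
    clusters = [[i] for i in range(C.shape[0])]
    while len(clusters) > k:
        best, best_inc = None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Entry F[a, b] of the fusion matrix: increase of the
                # objective caused by merging clusters a and b.
                inc = (objective(clusters[a] + clusters[b])
                       - objective(clusters[a]) - objective(clusters[b]))
                if inc < best_inc:
                    best, best_inc = (a, b), inc
        a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
    return clusters

# Maximum-gap score of a row cluster A against all of Y (illustrative).
def max_gap(C):
    return lambda A: float(C[A].max() - C[A].min())

rng = np.random.default_rng(1)
C = rng.random((8, 5))
print(agglomerate(C, 3, max_gap(C)))
```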

The clustering method with random representatives can be generalized to other weightings of the gap matrix even more easily. Since Algorithm 4 constructs clusterings with small gap matrix entries in general, rather than optimizing a concrete objective function, it can be applied directly. It can also be used for other cluster quality criteria, as long as small cost differences within the same cluster are preferred, since this difference is the most important factor in the assignment decision. Otherwise, one has to adjust the decision step with respect to the quality criterion.
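As an illustration, here is a hedged sketch of the random-representative idea, assuming that each point is assigned to the representative whose cost row it matches most closely in the sup-norm; the actual decision step of Algorithm 4 may differ.

```python
import numpy as np

def random_representative_clustering(C, k, rng):
    # Cluster the rows of the cost matrix C (the points of X) around k
    # randomly chosen representatives. Each point joins the representative
    # whose row of costs against Y is closest in the sup-norm, which keeps
    # cost differences within a cluster, and hence the gap entries, small.
    # (Illustrative assignment rule; Algorithm 4 may decide differently.)
    reps = rng.choice(C.shape[0], size=k, replace=False)
    diffs = np.abs(C[:, None, :] - C[reps][None, :, :]).max(axis=2)
    return reps, diffs.argmin(axis=1)

rng = np.random.default_rng(2)
C = rng.random((200, 20))
reps, assignment = random_representative_clustering(C, 20, rng)
```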

Examples of Geometric Clusterings Although cost-based clustering methods are not necessarily designed for Euclidean spaces, they can still be applied there and yield interesting results that other clustering methods would not find. The following examples are based on the Euclidean distance in $\mathbb{R}^D$ for various $D$ and show that the clusters of $X$ are not limited to points that are spatially close, but can include features like spherical shapes or separated cluster parts.



Figure 5.12: A geometric example for cost-based clustering in three dimensions.

The set $Y$ consists of twenty points evenly spread on the line segment $[-2,2] \times \{0\} \times \{0\} \subseteq \mathbb{R}^3$, while $X$ consists of 200 samples from a standard multivariate normal distribution in $\mathbb{R}^3$. $X$ is clustered into twenty subsets by the clustering algorithm with random representatives using $c(x_i, y_j) = \|x_i - y_j\|_2^2$. The figure shows a projection onto the $(x_2, x_3)$-plane. The projection of $Y$ is $\{(0,0)\}$, shown by a cross at the origin, and for $X$ we only show clusters that contain points with both a positive and a negative $x_1$-component, as colored circles.

Consider the extreme case where $Y = \{y\}$ is a singleton. Then $X$ is clustered based on the distances of its points to $y$, which means that the clusters roughly form concentric spheres around $y$. Similarly, if $Y$ is (roughly) supported on an affine subspace, $X$ is divided into slices that are orthogonal to this subspace, and each slice has the same spherical features as in the singleton case.
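A tiny numerical illustration of the singleton case just described, with quantile binning of the distances standing in for the actual clustering algorithms:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 3))
y = np.zeros(3)                      # Y = {y} is a singleton
# The cost matrix has a single column: the squared distances to y.
costs = ((X - y) ** 2).sum(axis=1)
# Any clustering with small within-cluster cost differences then bins these
# distances into intervals, i.e. concentric spherical shells around y.
edges = np.quantile(costs, np.linspace(0, 1, 6)[1:-1])
shells = np.digitize(costs, edges)   # shell index 0..4 for each point
```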

This can be observed in Figure 5.12. Although it looks like the mentioned example in $\mathbb{R}^2$ where $Y$ is a singleton, this example is actually three-dimensional, and the figure shows the projection onto the second and third component. The set $X$ contains samples from a standard multivariate normal distribution. In the clustering, it is divided into multiple layers of clusters in the $x_1$-direction. Figure 5.12 shows one layer around the $(x_2, x_3)$-plane, that is, all clusters containing both points with a positive $x_1$-value and points with a negative $x_1$-value. The points in this layer are roughly divided with respect to their distance to the origin, since the entries in the cost matrix for the points in $X$ depend only on their $x_1$-coordinate and their distance to the origin in the $(x_2, x_3)$-plane.
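The setup of Figure 5.12 is easy to reproduce; the following sketch follows the caption and reuses the sup-norm assignment rule assumed earlier, which may differ from the exact decision step of Algorithm 4.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3))                   # 200 Gaussian samples in R^3
Y = np.stack([np.linspace(-2, 2, 20),
              np.zeros(20), np.zeros(20)], axis=1)  # segment [-2,2] x {0} x {0}
# Squared Euclidean costs c(x_i, y_j) = ||x_i - y_j||_2^2.
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
# Sup-norm assignment to 20 random representatives (illustrative rule).
reps = rng.choice(200, size=20, replace=False)
assignment = np.abs(C[:, None, :] - C[reps][None, :, :]).max(axis=2).argmin(axis=1)
# Clusters straddling the (x2, x3)-plane: both positive and negative x1.
straddling = [k for k in range(20)
              if (X[assignment == k, 0] > 0).any()
              and (X[assignment == k, 0] < 0).any()]
```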

In the example shown in Figure 5.13, $Y$ is supported on a line in $\mathbb{R}^2$ and the set $X$ consists of samples from a bivariate normal distribution, roughly distributed on a second line orthogonal to the first (top left panel). After a rotation of the set $X$ around the origin, we observe different cluster shapes, as the relation between $X$ and $Y$ changes (other panels). In the orthogonal case, we see a division of $X$ into slices near the origin, whereas clusters further away are divided into two parts each: one above and one below $Y$, in a somewhat symmetrical fashion. For these points, similarity of rows in the cost matrix depends mainly on the distance to the line. This effect decreases at different angles. At 60 degrees (top right) there are still non-connected clusters, whereas for 30 degrees and parallel lines (bottom panels) the clusters are mostly slice-structured.
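The rotation experiment behind Figure 5.13 follows the same pattern; in the sketch below, the spread of the normal samples and the assignment rule are again illustrative assumptions rather than the exact setup of the figure.

```python
import numpy as np

def rotate(P, theta):
    # Rotate the 2-D point set P around the origin by the angle theta.
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return P @ R.T

rng = np.random.default_rng(5)
Y = np.stack([np.linspace(-2, 2, 20), np.zeros(20)], axis=1)   # line in R^2
X = np.stack([0.2 * rng.standard_normal(200),                  # narrow Gaussian
              2.0 * rng.standard_normal(200)], axis=1)         # along the x2-axis
for angle in (90, 60, 30, 0):   # angle between the two lines, as in the panels
    Xr = rotate(X, np.radians(angle - 90))
    C = ((Xr[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    reps = rng.choice(200, size=20, replace=False)
    assignment = np.abs(C[:, None, :] - C[reps][None, :, :]).max(axis=2).argmin(axis=1)
```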