Classical multidimensional scaling (CMDS) is a technique that embeds a set of objects in a Euclidean space given their pairwise Euclidean distances. The main part of CMDS involves double centering a squared distance matrix and using a truncated eigendecomposition to recover the point coordinates. In this paper, motivated by a study in Euclidean distance geometry, we explore a dual basis approach to CMDS. We give an explicit formula for the dual basis vectors and fully characterize the spectrum of an essential matrix in the dual basis framework. We make connections to a related problem in metric nearness.
more »
« less
How can classical multidimensional scaling go wrong?
Given a matrix D describing the pairwise dissimilarities of a data set, a common task is to embed the data points into Euclidean space. The classical multidimensional scaling (cMDS) algorithm is a widespread method to do this. However, theoretical analysis of the robustness of the algorithm and an in-depth analysis of its performance on non-Euclidean metrics is lacking. In this paper, we derive a formula, based on the eigenvalues of a matrix obtained from D, for the Frobenius norm of the difference between D and the metric Dcmds returned by cMDS. This error analysis leads us to the conclusion that when the derived matrix has a significant number of negative eigenvalues, then ∥D−Dcmds∥F, after initially decreasing, willeventually increase as we increase the dimension. Hence, counterintuitively, the quality of the embedding degrades as we increase the dimension. We empirically verify that the Frobenius norm increases as we increase the dimension for a variety of non-Euclidean metrics. We also show on several benchmark datasets that this degradation in the embedding results in the classification accuracy of both simple (e.g., 1-nearest neighbor) and complex (e.g., multi-layer neural nets) classifiers decreasing as we increase the embedding dimension.Finally, our analysis leads us to a new efficiently computable algorithm that returns a matrix Dl that is at least as close to the original distances as Dt (the Euclidean metric closest in ℓ2 distance). While Dl is not metric, when given as input to cMDS instead of D, it empirically results in solutions whose distance to D does not increase when we increase the dimension and the classification accuracy degrades less than the cMDS solution.
more »
« less
- Award ID(s):
- 1750780
- PAR ID:
- 10344286
- Date Published:
- Journal Name:
- Advances in neural information processing systems
- ISSN:
- 1049-5258
- Page Range / eLocation ID:
- 12304--12315
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Banerjee, Arindam; Zhou, Zhi-Hua (Ed.)To characterize the location (mean, median) of a set of graphs, one needs a notion of centrality that is adapted to metric spaces, since graph sets are not Euclidean spaces. A standard approach is to consider the Fr ́echet mean. In this work, we equip a set of graph with the pseudometric defined by the l2 norm between the eigenvalues of their respective adjacency matrix. Unlike the edit distance, this pseudometric reveals structural changes at multiple scales, and is well adapted to studying various statistical problems for graph-valued data. We describe an algorithm to compute an approximation to the sample Fr ́echet mean of a set of undirected unweighted graphs with a fixed size using this pseudometric.more » « less
-
This article presents a new method to compute a self-intersection free high-dimensional Euclidean embedding (SIFHDE^2) for surfaces and volumes equipped with an arbitrary Riemannian metric. It is already known that given a high-dimensional (high-d) embedding, one can easily compute an anisotropic Voronoi diagram by back-mapping it to 3D space. We show here how to solve the inverse problem, i.e., given an input metric, compute a smooth intersection-free high-d embedding of the input such that the pullback metric of the embedding matches the input metric. Our numerical solution mechanism matches the deformation gradient of the 3D -> higher-d mapping with the given Riemannian metric. We demonstrate the applicability of our method, by using it to construct anisotropic Restricted Voronoi Diagram (RVD) and anisotropic meshing, that are otherwise extremely difficult to compute. In SIFHDE^2 -space constructed by our algorithm, difficult 3D anisotropic computations are replaced with simple Euclidean computations, resulting in an isotropic RVD and its dual mesh on this high-d embedding. Results are compared with the state-of-the-art in anisotropic surface and volume meshings using several examples and evaluation metrics.more » « less
-
Abstract We study a generalization of the classical multidimensional scaling procedure (cMDS) which is applicable in the setting of metric measure spaces. Metric measure spaces can be seen as natural ‘continuous limits’ of finite data sets. Given a metric measure space $${\mathcal{X}} = (X,d_{X},\mu _{X})$$, the generalized cMDS procedure involves studying an operator which may have infinite rank, a possibility which leads to studying its traceability. We establish that several continuous exemplar metric measure spaces such as spheres and tori (both with their respective geodesic metrics) induce traceable cMDS operators, a fact which allows us to obtain the complete characterization of the metrics induced by their resulting cMDS embeddings. To complement this, we also exhibit a metric measure space whose associated cMDS operator is not traceable. Finally, we establish the stability of the generalized cMDS method with respect to the Gromov–Wasserstein distance.more » « less
-
We introduce Non-Euclidean-MDS (Neuc-MDS), an extension of classical Multidimensional Scaling (MDS) that accommodates non-Euclidean and non-metric inputs. The main idea is to generalize the standard inner product to symmetric bilinear forms to utilize the negative eigenvalues of dissimilarity Gram matrices. Neuc-MDS efficiently optimizes the choice of (both positive and negative) eigenvalues of the dissimilarity Gram matrix to reduce STRESS, the sum of squared pairwise error. We provide an in-depth error analysis and proofs of the optimality in minimizing lower bounds of STRESS. We demonstrate Neuc-MDS’s ability to address limitations of classical MDS raised by prior research, and test it on various synthetic and real-world datasets in comparison with both linear and non-linear dimension reduction methods.more » « less
An official website of the United States government

