Given a matrix D describing the pairwise dissimilarities of a data set, a common task is to embed the data points into Euclidean space. The classical multidimensional scaling (cMDS) algorithm is a widespread method to do this. However, theoretical analysis of the robustness of the algorithm and an in-depth analysis of its performance on non-Euclidean metrics are lacking. In this paper, we derive a formula, based on the eigenvalues of a matrix obtained from D, for the Frobenius norm of the difference between D and the metric Dcmds returned by cMDS. This error analysis leads us to the conclusion that when the derived matrix has a significant number of negative eigenvalues, ∥D−Dcmds∥F, after initially decreasing, will eventually increase as we increase the dimension. Hence, counterintuitively, the quality of the embedding degrades as we increase the dimension. We empirically verify that the Frobenius norm increases as we increase the dimension for a variety of non-Euclidean metrics. We also show on several benchmark datasets that this degradation in the embedding causes the classification accuracy of both simple (e.g., 1-nearest neighbor) and complex (e.g., multi-layer neural nets) classifiers to decrease as we increase the embedding dimension. Finally, our analysis leads us to a new, efficiently computable algorithm that returns a matrix Dl that is at least as close to the original distances as Dt (the Euclidean metric closest in ℓ2 distance). While Dl is not a metric, when given as input to cMDS instead of D, it empirically results in solutions whose distance to D does not increase when we increase the dimension, and the classification accuracy degrades less than with the cMDS solution.
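The error curve discussed in this abstract can be traced numerically. The following is a minimal sketch, not the authors' code. It assumes that D holds unsquared dissimilarities, that cMDS squares them entrywise before double centering, and that negative eigenvalues are clipped to zero. The function names `pairwise_dist` and `cmds_error_by_dimension` are hypothetical.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix of the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def cmds_error_by_dimension(D, max_dim):
    """Frobenius error ||D - D_cmds(r)||_F for r = 1, ..., max_dim.

    Assumption: D contains unsquared dissimilarities.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered matrix
    evals, evecs = np.linalg.eigh((B + B.T) / 2)
    order = np.argsort(evals)[::-1]            # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]

    errors = []
    for r in range(1, max_dim + 1):
        lam = np.clip(evals[:r], 0.0, None)    # negative eigenvalues contribute nothing
        X = evecs[:, :r] * np.sqrt(lam)        # r-dimensional embedding
        errors.append(np.linalg.norm(D - pairwise_dist(X), ord="fro"))
    return errors

# Sanity check on Euclidean input: three dimensions suffice for 3-D points.
rng = np.random.default_rng(0)
points = rng.standard_normal((8, 3))
errors = cmds_error_by_dimension(pairwise_dist(points), max_dim=3)
```

For Euclidean input such as this one the error can only shrink as the dimension grows. The abstract's claim concerns non-Euclidean inputs with many negative eigenvalues, which can be explored by passing such a matrix as `D` and inspecting the returned list.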
A dual basis approach to multidimensional scaling
Classical multidimensional scaling (CMDS) is a technique that embeds a set of objects in a Euclidean space given their pairwise Euclidean distances. The main part of CMDS involves double centering a squared distance matrix and using a truncated eigendecomposition to recover the point coordinates. In this paper, motivated by a study in Euclidean distance geometry, we explore a dual basis approach to CMDS. We give an explicit formula for the dual basis vectors and fully characterize the spectrum of an essential matrix in the dual basis framework. We make connections to a related problem in metric nearness.
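The two steps named in this abstract, double centering and a truncated eigendecomposition, can be sketched as follows. This shows only the standard CMDS procedure, not the dual basis approach of the paper. The function name `classical_mds` and the example data are hypothetical.

```python
import numpy as np

def classical_mds(D_sq, r):
    """Recover r-dimensional coordinates from an n x n squared distance matrix."""
    n = D_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ D_sq @ J                # double centering gives a Gram matrix
    B = (B + B.T) / 2                      # symmetrize against round-off
    evals, evecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:r]      # keep the r largest
    lam = np.clip(evals[idx], 0.0, None)   # guard against tiny negative values
    return evecs[:, idx] * np.sqrt(lam)    # rows are the recovered points

# Example: squared distances of six random planar points.
rng = np.random.default_rng(0)
P = rng.standard_normal((6, 2))
D_sq = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)

X = classical_mds(D_sq, r=2)
D_rec = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
```

The recovered coordinates `X` match `P` only up to a rigid motion (translation, rotation, reflection), so the comparison is made on the distance matrices rather than on the points themselves.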
- Award ID(s):
- 2208392
- PAR ID:
- 10532111
- Publisher / Repository:
- Linear Algebra and its Applications
- Date Published:
- Journal Name:
- Linear Algebra and its Applications
- Volume:
- 682
- Issue:
- C
- ISSN:
- 0024-3795
- Page Range / eLocation ID:
- 86 to 95
- Subject(s) / Keyword(s):
- Multidimensional scaling Distance geometry Dual basis Matrix nearness
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
The Euclidean distance geometry (EDG) problem is a crucial machine learning task that appears in many applications. Utilizing the pairwise Euclidean distance information of a given point set, EDG reconstructs the configuration of the point system. When only partial distance information is available, matrix completion techniques can be incorporated to fill in the missing pairwise distances. In this paper, we propose a novel dual basis Riemannian gradient descent algorithm, coined RieEDG, for the EDG completion problem. The numerical experiments verify the effectiveness of the proposed algorithm. In particular, we show that RieEDG can precisely reconstruct various datasets consisting of 2- and 3-dimensional points by accessing a small fraction of pairwise distance information.
-
We study a generalization of the classical multidimensional scaling procedure (cMDS) which is applicable in the setting of metric measure spaces. Metric measure spaces can be seen as natural ‘continuous limits’ of finite data sets. Given a metric measure space $\mathcal{X} = (X, d_X, \mu_X)$, the generalized cMDS procedure involves studying an operator which may have infinite rank, a possibility which leads to studying its traceability. We establish that several continuous exemplar metric measure spaces such as spheres and tori (both with their respective geodesic metrics) induce traceable cMDS operators, a fact which allows us to obtain the complete characterization of the metrics induced by their resulting cMDS embeddings. To complement this, we also exhibit a metric measure space whose associated cMDS operator is not traceable. Finally, we establish the stability of the generalized cMDS method with respect to the Gromov–Wasserstein distance.
-
We study the problem of determining the configuration of n points by using their distances to m nodes, referred to as anchor nodes. One sampling scheme is Nyström sampling, which assumes known distances between the anchors and between the anchors and the n points, while the distances among the n points are unknown. For this scheme, a simple adaptation of the Nyström method, which is often used for kernel approximation, is a viable technique to estimate the configuration of the anchors and the n points. In this manuscript, we propose a modified version of Nyström sampling, where the distances from every node to one central node are known, but all other distances are incomplete. In this setting, the standard Nyström approach is not applicable, necessitating an alternative technique to estimate the configuration of the anchors and the n points. We show that this problem can be framed as the recovery of a low-rank submatrix of a Gram matrix. Using synthetic and real data, we demonstrate that the proposed approach can exactly recover configurations of points given sufficient distance samples. This underscores that, in contrast to methods that rely on global sampling of distance matrices, the task of estimating the configuration of points can be done efficiently via structured sampling with well-chosen reliable anchors. Finally, our main analysis is grounded in a specific centering of the points. With this in mind, we extend previous work in Euclidean distance geometry by providing a general dual basis approach for points centered anywhere.

