Riemannian Optimization for Euclidean Distance Geometry
The Euclidean distance geometry (EDG) problem is a crucial machine learning task that appears in many applications. Given the pairwise Euclidean distances of a point set, EDG reconstructs the configuration of the points. When only partial distance information is available, matrix completion techniques can be incorporated to fill in the missing pairwise distances. In this paper, we propose a novel dual basis Riemannian gradient descent algorithm, coined RieEDG, for the EDG completion problem. Numerical experiments verify the effectiveness of the proposed algorithm. In particular, we show that RieEDG can precisely reconstruct various datasets consisting of 2- and 3-dimensional points from only a small fraction of the pairwise distances.
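The abstract does not spell out RieEDG itself, so the sketch below only illustrates the standard setup that makes EDG a matrix completion problem. Assuming $$n$$ points in $$\mathbb{R}^r$$ stored as the rows of $$P$$, the squared distance matrix is determined by the Gram matrix $$X = PP^\top$$ through $$D = \mathrm{diag}(X)\mathbf{1}^\top + \mathbf{1}\,\mathrm{diag}(X)^\top - 2X$$, so $$\mathrm{rank}(D) \le r + 2$$. The helper names `squared_distance_matrix` and `sample_mask`, and the uniform sampling model, are hypothetical illustrations, not the paper's code.

```python
import numpy as np

def squared_distance_matrix(P: np.ndarray) -> np.ndarray:
    """D[i, j] = ||p_i - p_j||^2 for the rows p_i of P, built from the Gram matrix."""
    X = P @ P.T                      # Gram matrix X = P P^T, rank <= r
    g = np.diag(X)                   # squared norms ||p_i||^2
    return g[:, None] + g[None, :] - 2.0 * X

def sample_mask(n: int, fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical uniform model: observe each pair {i, j}, i != j, with probability `fraction`."""
    upper = np.triu(rng.random((n, n)) < fraction, k=1)
    return upper | upper.T           # symmetric, empty diagonal

rng = np.random.default_rng(0)
n, r = 200, 3                        # 200 points in R^3
P = rng.standard_normal((n, r))
D = squared_distance_matrix(P)
mask = sample_mask(n, 0.05, rng)

rank_D = int(np.linalg.matrix_rank(D))              # at most r + 2 = 5
observed_fraction = mask.sum() / (n * (n - 1))      # close to 0.05
D_observed = np.where(mask, D, np.nan)              # the partial data a completion method sees
```

The low rank of $$D$$ (and of $$X$$) is what lets a completion method fill in the `NaN` entries of `D_observed` from a small fraction of observed pairs.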
- Award ID(s): 2304489
- PAR ID: 10536193
- Publisher / Repository: 15th Annual Workshop on Optimization for Machine Learning
- Location: New Orleans, LA
- Sponsoring Org: National Science Foundation
More Like this
- The problem of finding suitable point embeddings or geometric configurations given only the Euclidean distances between pairs of points arises both as a core task and as a sub-problem in a variety of machine learning applications. In this paper, we aim to solve this problem given a minimal number of distance samples. To this end, we leverage continuous and non-convex rank minimization formulations of the problem and establish a local convergence guarantee for a variant of iteratively reweighted least squares (IRLS), which applies if a minimal random set of observed distances is provided. As a technical tool, we establish a restricted isometry property (RIP) restricted to a tangent space of the manifold of symmetric rank-$$r$$ matrices given random Euclidean distance measurements, which might be of independent interest for the analysis of other non-convex approaches. Furthermore, we assess the data efficiency, scalability, and generalizability of different reconstruction algorithms through numerical experiments with simulated as well as real-world data, demonstrating the proposed algorithm's ability to identify the underlying geometry from fewer distance samples than the state of the art. The Matlab code can be found at https://github.com/ipsita-ghosh-1/EDG-IRLS.
- Classical multidimensional scaling (CMDS) is a technique that embeds a set of objects in a Euclidean space given their pairwise Euclidean distances. The main part of CMDS involves double centering a squared distance matrix and using a truncated eigendecomposition to recover the point coordinates. In this paper, motivated by a study in Euclidean distance geometry, we explore a dual basis approach to CMDS. We give an explicit formula for the dual basis vectors and fully characterize the spectrum of an essential matrix in the dual basis framework. We make connections to a related problem in metric nearness.
- We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $$\ell_\infty$$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $$\ell_2$$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
- The 2-Wasserstein distance (or RMS distance) is a useful measure of similarity between probability distributions with exciting applications in machine learning. For discrete distributions, the problem of computing this distance can be expressed in terms of finding a minimum-cost perfect matching on a complete bipartite graph given by two multisets of points $$A, B \subset \mathbb{R}^2$$, with $$|A| = |B| = n$$, where the ground distance between any two points is the squared Euclidean distance between them. Although there is a near-linear time relative $$\varepsilon$$-approximation algorithm for the case where the ground distance is Euclidean (Sharathkumar and Agarwal, JACM 2020), all existing relative $$\varepsilon$$-approximation algorithms for the RMS distance take $$\Omega(n^{3/2})$$ time. This is primarily because, unlike Euclidean distance, squared Euclidean distance is not a metric. In this paper, for the RMS distance, we present a new $$\varepsilon$$-approximation algorithm that runs in $$O(n^{5/4}\,\mathrm{poly}\{\log n, 1/\varepsilon\})$$ time. Our algorithm is inspired by a recent approach for finding a minimum-cost perfect matching in bipartite planar graphs (Asathulla et al., TALG 2020). Their algorithm depends heavily on the existence of sublinear-sized vertex separators as well as shortest path data structures that require planarity. Surprisingly, we are able to design a similar algorithm for a complete geometric graph that is far from planar and does not have any vertex separators. Central components of our algorithm include a quadtree-based distance that approximates the squared Euclidean distance and a data structure that supports both Hungarian search and augmentation in sublinear time.
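The IRLS abstract in the list above refers to "random Euclidean distance measurements" of a symmetric low-rank matrix. A minimal sketch of what that means, assuming the Gram-matrix formulation $$X = PP^\top$$: each observed squared distance is a linear measurement $$d_{ij} = \langle X, (e_i - e_j)(e_i - e_j)^\top \rangle = X_{ii} + X_{jj} - 2X_{ij}$$. The function names and the uniform pair-sampling model are hypothetical; this illustrates only the measurement operator, not the IRLS algorithm or the RIP analysis.

```python
import numpy as np

def edg_measurement(X: np.ndarray, i: int, j: int) -> float:
    """<X, w w^T> with w = e_i - e_j, which equals X_ii + X_jj - 2 X_ij."""
    w = np.zeros(X.shape[0])
    w[i], w[j] = 1.0, -1.0
    return float(np.sum(X * np.outer(w, w)))

def sample_pairs(n: int, m: int, rng: np.random.Generator):
    """Hypothetical sampling model: m pairs (i, j) drawn uniformly with i != j."""
    i = rng.integers(0, n, size=m)
    j = (i + rng.integers(1, n, size=m)) % n    # offset in [1, n-1], so j != i
    return i, j

def measurement_operator(X: np.ndarray, i: np.ndarray, j: np.ndarray) -> np.ndarray:
    """Vectorized linear map A(X): one squared distance per sampled pair."""
    d = np.diag(X)
    return d[i] + d[j] - 2.0 * X[i, j]

rng = np.random.default_rng(1)
n, r, m = 50, 2, 400
P = rng.standard_normal((n, r))
X = P @ P.T                                     # symmetric rank-r Gram matrix
i, j = sample_pairs(n, m, rng)
y = measurement_operator(X, i, j)               # observed squared distances

# Translating every point leaves all measurements unchanged, so X is only
# determined up to this ambiguity (commonly removed by centering the points).
P_shifted = P + np.array([3.0, -1.0])
y_shifted = measurement_operator(P_shifted @ P_shifted.T, i, j)
```

Because the map $$X \mapsto y$$ is linear, recovering $$X$$ from $$y$$ is a low-rank matrix recovery problem, which is the setting in which an RIP-type property is stated.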
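The CMDS abstract in the list above summarizes the classical pipeline: double center the squared distance matrix, then keep a truncated eigendecomposition. A minimal sketch of that classical formulation, assuming `D` holds squared (not plain) Euclidean distances and `r` is the target dimension; it is not the dual basis approach that paper studies, and the helper names are illustrative.

```python
import numpy as np

def classical_mds(D: np.ndarray, r: int) -> np.ndarray:
    """Embed n objects in R^r from their n x n squared Euclidean distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D @ J                     # double centering: Gram matrix of the centered points
    evals, evecs = np.linalg.eigh(B)         # eigh returns eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:r]        # indices of the r largest eigenvalues
    lam = np.clip(evals[idx], 0.0, None)     # guard against tiny negative round-off
    return evecs[:, idx] * np.sqrt(lam)      # n x r coordinates

def squared_distances(P: np.ndarray) -> np.ndarray:
    diff = P[:, None, :] - P[None, :, :]
    return np.sum(diff ** 2, axis=-1)

rng = np.random.default_rng(2)
P = rng.standard_normal((30, 2))
D = squared_distances(P)
Q = classical_mds(D, r=2)    # matches P up to translation, rotation, and reflection
D_hat = squared_distances(Q)
```

The coordinates are recovered only up to a rigid motion, so the natural check is that `Q` reproduces the input distances, not that it equals `P`.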
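The EM abstract in the list above concerns a symmetric mixture of linear regressions under the pairwise comparison design. A minimal sketch, assuming the common model $$y = s\,\langle x, \theta^\ast \rangle + \sigma \varepsilon$$ with a hidden sign $$s \in \{\pm 1\}$$, Gaussian noise of known level $$\sigma$$, and rows $$x = e_i - e_j$$. The iteration below is the standard EM update for this symmetric model (posterior sign weights $$\tanh(y\langle x, \theta\rangle/\sigma^2)$$ followed by least squares); it is not the paper's code, and all names and sizes are hypothetical. Since constant shifts and a global sign flip are unidentifiable, the error is measured after centering and sign alignment, starting from a local initialization as in the abstract.

```python
import numpy as np

def pairwise_design(n: int, m: int, rng: np.random.Generator) -> np.ndarray:
    """m random comparison rows x = e_i - e_j with i != j."""
    i = rng.integers(0, n, size=m)
    j = (i + rng.integers(1, n, size=m)) % n
    X = np.zeros((m, n))
    X[np.arange(m), i] = 1.0
    X[np.arange(m), j] = -1.0
    return X

def em_symmetric_mlr(X, y, theta0, sigma, iters=50):
    """EM for y = s * <x, theta> + sigma * noise with hidden sign s in {+1, -1}."""
    theta = theta0.copy()
    for _ in range(iters):
        # E-step: posterior mean of the hidden sign, E[s | x, y, theta].
        w = np.tanh(y * (X @ theta) / sigma**2)
        # M-step: least squares on the sign-corrected responses w * y. Every row
        # sums to zero, so lstsq's minimum-norm solution is the centered one.
        theta, *_ = np.linalg.lstsq(X, w * y, rcond=None)
    return theta

def sign_aligned_error(theta, theta_star):
    return min(np.linalg.norm(theta - theta_star), np.linalg.norm(theta + theta_star))

rng = np.random.default_rng(0)
n, m, sigma = 10, 3000, 0.1
theta_star = rng.standard_normal(n)
theta_star -= theta_star.mean()               # identifiable only up to a constant shift
X = pairwise_design(n, m, rng)
s = rng.choice([-1.0, 1.0], size=m)           # hidden mixture labels
y = s * (X @ theta_star) + sigma * rng.standard_normal(m)

theta0 = theta_star + 0.1 * rng.standard_normal(n)   # local initialization
theta0 -= theta0.mean()
err0 = sign_aligned_error(theta0, theta_star)
theta_hat = em_symmetric_mlr(X, y, theta0, sigma)
err = sign_aligned_error(theta_hat, theta_star)
```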
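For the RMS distance abstract in the list above, a tiny exact reference makes the matching formulation concrete. A minimal sketch, assuming uniform weights $$1/n$$ on each point so that the RMS distance is $$\sqrt{\tfrac{1}{n}\min_\sigma \sum_k \lVert a_k - b_{\sigma(k)} \rVert^2}$$ over perfect matchings $$\sigma$$. It enumerates all $$n!$$ matchings, so it is only a brute-force check for very small $$n$$ and has nothing to do with the paper's $$O(n^{5/4}\,\mathrm{poly}\{\log n, 1/\varepsilon\})$$ algorithm; the function names are hypothetical.

```python
import itertools
import math

Point = tuple[float, float]

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean ground distance (not a metric: no triangle inequality)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def rms_distance_bruteforce(A: list[Point], B: list[Point]) -> float:
    """Exact RMS (2-Wasserstein) distance between equal-size uniform point multisets.

    Enumerates all n! perfect matchings, so it is only usable for very small n.
    """
    n = len(A)
    if n == 0 or n != len(B):
        raise ValueError("A and B must be non-empty and of equal size")
    best = min(
        sum(sq_dist(A[k], B[perm[k]]) for k in range(n))
        for perm in itertools.permutations(range(n))
    )
    return math.sqrt(best / n)

# Translating a point set by t and shuffling it gives RMS distance ||t||.
A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
t = (0.5, -1.5)
B = [(x + t[0], y + t[1]) for (x, y) in reversed(A)]
rms = rms_distance_bruteforce(A, B)   # sqrt(0.5**2 + 1.5**2) = sqrt(2.5)
```

The translation case is a convenient check: under squared Euclidean cost the cross terms cancel, so the optimal matching pairs each point with its own translate and the cost is $$n\lVert t \rVert^2$$.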

