skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Largest Angle Path Distance for Multi-Manifold Clustering
We propose a novel, angle-based path metric for the multi-manifold clustering problem. This metric, which we call the largest-angle path distance (LAPD), is computed as a bottleneck path distance in a graph constructed on d-simplices of data points. When data is sampled from a collection of d-dimensional manifolds which may intersect, the method can cluster the manifolds with high accuracy and automatically detect how many manifolds are present. By leveraging fast approximation schemes for bottleneck distance, this method exhibits quasi-linear computational complexity in the number of data points. In addition to being highly scalable, the method outperforms existing algorithms in numerous numerical experiments on intersecting manifolds, and exhibits robustness with respect to noise and curvature in the data.  more » « less
Award ID(s):
2309570 2136198
PAR ID:
10509378
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
International Conference on Sampling Theory and Applications
ISSN:
2694-0108
ISBN:
979-8-3503-2885-1
Page Range / eLocation ID:
1 to 7
Format(s):
Medium: X Size: 1.6MB Other: pdf
Size(s):
1.6MB
Location:
New Haven, CT, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Mulzer, Wolfgang; Phillips, Jeff M (Ed.)
    Metric spaces (X, d) are ubiquitous objects in mathematics and computer science that allow for capturing pairwise distance relationships d(x, y) between points x, y ∈ X. Because of this, it is natural to ask what useful generalizations there are of metric spaces for capturing "k-wise distance relationships" d(x_1, …, x_k) among points x_1, …, x_k ∈ X for k > 2. To that end, Gähler (Math. Nachr., 1963) (and perhaps others even earlier) defined k-metric spaces, which generalize metric spaces, and most notably generalize the triangle inequality d(x₁, x₂) ≤ d(x₁, y) + d(y, x₂) to the "simplex inequality" d(x_1, …, x_k) ≤ ∑_{i=1}^k d(x_1, …, x_{i-1}, y, x_{i+1}, …, x_k). (The definition holds for any fixed k ≥ 2, and a 2-metric space is just a (standard) metric space.) In this work, we introduce strong k-metric spaces, k-metric spaces that satisfy a topological condition stronger than the simplex inequality, which makes them "behave nicely." We also introduce coboundary k-metrics, which generalize 𝓁_p metrics (and in fact all finite metric spaces induced by norms) and minimum bounding chain k-metrics, which generalize shortest path metrics (and capture all strong k-metrics). Using these definitions, we prove analogs of a number of fundamental results about embedding finite metric spaces including Fréchet embedding (isometric embedding into 𝓁_∞) and isometric embedding of all tree metrics into 𝓁₁. We also study relationships between families of (strong) k-metrics, and show that natural quantities, like simplex volume, are strong k-metrics. 
    more » « less
  2. A bstract We formulate a series of conjectures relating the geometry of conformal manifolds to the spectrum of local operators in conformal field theories in d > 2 spacetime dimensions. We focus on conformal manifolds with limiting points at infinite distance with respect to the Zamolodchikov metric. Our central conjecture is that all theories at infinite distance possess an emergent higher-spin symmetry, generated by an infinite tower of currents whose anomalous dimensions vanish exponentially in the distance. Stated geometrically, the diameter of a non-compact conformal manifold must diverge logarithmically in the higher-spin gap. In the holographic context our conjectures are related to the Distance Conjecture in the swampland program. Interpreted gravitationally, they imply that approaching infinite distance in moduli space at fixed AdS radius, a tower of higher-spin fields becomes massless at an exponential rate that is bounded from below in Planck units. We discuss further implications for conformal manifolds of superconformal field theories in three and four dimensions. 
    more » « less
  3. Motivated by applications to classification problems on metric data, we study Weighted Metric Clustering problem: given a metric d over n points and a k x k symmetric matrix A with non-negative entries, the goal is to find a k-partition of these points into clusters C1,...,Ck, while minimizing the sum of A[i,j] * d(u,v) over all pairs of clusters Ci and Cj and all pairs of points u from Ci and v from Cj. Specific choices of A lead to Weighted Metric Clustering capturing well-studied graph partitioning problems in metric spaces, such as Min-Uncut, Min-k-Sum, Min-k-Cut, and more.Our main result is that Weighted Metric Clustering admits a polynomial-time approximation scheme (PTAS). Our algorithm handles all the above problems using the Sherali-Adams linear programming relaxation. This subsumes several prior works, unifies many of the techniques for various metric clustering objectives, and yields a PTAS for several new problems, including metric clustering on manifolds and a new family of hierarchical clustering objectives. Our experiments on the hierarchical clustering objective show that it better captures the ground-truth structural information compared to the popular Dasgupta's objective. 
    more » « less
  4. Abstract Dynamical formulations of optimal transport (OT) frame the task of comparing distributions as a variational problem which searches for a path between distributions minimizing a kinetic energy functional. In applications, it is frequently natural to require paths of distributions to satisfy additional conditions. Inspired by this, we introduce a model for dynamical OT which incorporates constraints on the space of admissible paths into the framework of unbalanced OT, where the source and target measures are allowed to have a different total mass. Our main results establish, for several general families of constraints, the existence of solutions to the variational problem which defines this path constrained unbalanced OT framework. These results are primarily concerned with distributions defined on an Euclidean space, but we extend them to distributions defined over parallelizable Riemannian manifolds as well. We also consider metric properties of our framework, showing that, for certain types of constraints, our model defines a metric on the relevant space of distributions. This metric is shown to arise as a geodesic distance of a Riemannian metric, obtained through an analogue of Otto’s submersion in the classical OT setting. 
    more » « less
  5. The information bottleneck (IB) approach to clustering takes a joint distribution [Formula: see text] and maps the data [Formula: see text] to cluster labels [Formula: see text], which retain maximal information about [Formula: see text] (Tishby, Pereira, & Bialek, 1999 ). This objective results in an algorithm that clusters data points based on the similarity of their conditional distributions [Formula: see text]. This is in contrast to classic geometric clustering algorithms such as [Formula: see text]-means and gaussian mixture models (GMMs), which take a set of observed data points [Formula: see text] and cluster them based on their geometric (typically Euclidean) distance from one another. Here, we show how to use the deterministic information bottleneck (DIB) (Strouse & Schwab, 2017 ), a variant of IB, to perform geometric clustering by choosing cluster labels that preserve information about data point location on a smoothed data set. We also introduce a novel intuitive method to choose the number of clusters via kinks in the information curve. We apply this approach to a variety of simple clustering problems, showing that DIB with our model selection procedure recovers the generative cluster labels. We also show that, in particular limits of our model parameters, clustering with DIB and IB is equivalent to [Formula: see text]-means and EM fitting of a GMM with hard and soft assignments, respectively. Thus, clustering with (D)IB generalizes and provides an information-theoretic perspective on these classic algorithms. 
    more » « less