Title: Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms
We analyze the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds with an associated probability measure. Fermat distances may be defined either on discrete samples from the underlying measure, in which case they are random, or in the continuum setting, where they are induced by geodesics under a density-distorted Riemannian metric. We prove that discrete, sample-based Fermat distances converge to their continuum analogues in small neighborhoods with a precise rate that depends on the intrinsic dimensionality of the data and the parameter governing the extent of density weighting in Fermat distances. This is done by leveraging novel geometric and statistical arguments in percolation theory that allow for non-uniform densities and curved domains. Our results are then used to prove that discrete graph Laplacians based on discrete, sample-driven Fermat distances converge to corresponding continuum operators. In particular, we show the discrete eigenvalues and eigenvectors converge to their continuum analogues at a dimension-dependent rate, which allows us to interpret the efficacy of discrete spectral clustering using Fermat distances in terms of the resulting continuum limit. The perspective afforded by our discrete-to-continuum Fermat distance analysis leads to new clustering algorithms for data and related insights into efficient computations associated to density-driven spectral clustering. Our theoretical analysis is supported with numerical simulations and experiments on synthetic and real image data.
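As a rough illustration of the discrete construction analyzed above, the sketch below computes sample-based Fermat distances as power-weighted shortest paths on a point cloud and feeds them into a standard spectral clustering step. It assumes the common power-weighted-path definition of sample Fermat distances; the function names and the parameters p, eps, and n_clusters are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import shortest_path
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def fermat_distances(X, p=2.0):
    """All-pairs sample Fermat distances: shortest-path costs in the complete
    graph on X with edge costs ||x_i - x_j||^p (density-driven for p > 1)."""
    C = squareform(pdist(X)) ** p            # power-weighted edge costs
    return shortest_path(C, method="D")      # Dijkstra over the dense graph

def fermat_spectral_clustering(X, n_clusters=2, p=2.0, eps=1.0):
    """Spectral clustering on a Gaussian affinity built from Fermat distances."""
    F = fermat_distances(X, p)
    W = np.exp(-(F / eps) ** 2)              # affinity from Fermat distances
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    _, V = eigh(L, subset_by_index=[0, n_clusters - 1])  # low-frequency eigenvectors
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V)
```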
Craig, Katy; Garcia Trillos, Nicolas; Slepcev, Dejan
(Modeling and Simulation in Science, Engineering and Technology)
Bellomo, N.; Carrillo, J.A.; Tadmor, E.
(Ed.)
In this work, we build a unifying framework to interpolate between density-driven and geometry-based algorithms for data clustering and, specifically, to connect the mean shift algorithm with spectral clustering at discrete and continuum levels. We seek this connection through the introduction of Fokker–Planck equations on data graphs. Besides introducing new forms of mean shift algorithms on graphs, we provide new theoretical insights on the behavior of the family of diffusion maps in the large sample limit as well as provide new connections between diffusion maps and mean shift dynamics on a fixed graph. Several numerical examples illustrate our theoretical findings and highlight the benefits of interpolating density-driven and geometry-based clustering algorithms.
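For orientation, the density-driven endpoint of this interpolation can be illustrated with plain Euclidean mean shift. The sketch below is only a generic mean shift iteration (the bandwidth and iteration count are illustrative choices) and does not reproduce the graph or Fokker–Planck constructions discussed above.

```python
import numpy as np

def mean_shift(X, bandwidth=1.0, n_iter=50):
    """Iterate each point toward a kernel-weighted local mean, i.e. toward a
    mode of the kernel density estimate; points that converge together are
    assigned to the same cluster."""
    Y = X.copy()
    for _ in range(n_iter):
        # pairwise squared distances from the current iterates to the data
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * bandwidth ** 2))      # Gaussian kernel weights
        Y = (W @ X) / W.sum(axis=1, keepdims=True)  # shift to the weighted mean
    return Y
```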
Maggioni, Mauro; Murphy, James M.
(Journal of Machine Learning Research)
This paper proposes and analyzes a novel clustering algorithm, called \emph{learning by unsupervised nonlinear diffusion (LUND)}, that combines graph-based diffusion geometry with techniques based on density and mode estimation. LUND is suitable for data generated from mixtures of distributions with densities that are both multimodal and supported near nonlinear sets. A crucial aspect of this algorithm is the use of time of a data-adapted diffusion process, and associated diffusion distances, as a scale parameter that is different from the local spatial scale parameter used in many clustering algorithms. We prove estimates for the behavior of diffusion distances with respect to this time parameter under a flexible nonparametric data model, identifying a range of times in which the mesoscopic equilibria of the underlying process are revealed, corresponding to a gap between within-cluster and between-cluster diffusion distances. These structures may be missed by the top eigenvectors of the graph Laplacian, commonly used in spectral clustering. This analysis is leveraged to prove sufficient conditions guaranteeing the accuracy of LUND. We implement LUND and confirm its theoretical properties on illustrative data sets, demonstrating its theoretical and empirical advantages over both spectral and density-based clustering.
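A highly simplified sketch of the kind of pipeline described above is given below: a diffusion map at time t supplies diffusion distances, and each point is scored by its kernel density estimate times its diffusion distance to the nearest point of higher density, so that cluster modes receive large scores. The parameter names and the exact normalizations are assumptions made for illustration, not the paper's algorithm.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def diffusion_coordinates(X, eps=1.0, t=8, k=10):
    """Diffusion map at time t from a Gaussian-kernel random walk on X."""
    D2 = squareform(pdist(X, "sqeuclidean"))
    K = np.exp(-D2 / eps)                      # Gaussian affinities
    P = K / K.sum(axis=1, keepdims=True)       # Markov transition matrix
    vals, vecs = np.linalg.eig(P)
    idx = np.argsort(-vals.real)[:k]           # top k eigenpairs
    return vecs[:, idx].real * (vals[idx].real ** t)  # eigenvalues^t scale the coordinates

def lund_mode_scores(X, eps=1.0, t=8):
    """Score each point by density * diffusion distance to nearer higher-density point."""
    emb = diffusion_coordinates(X, eps, t)
    K = np.exp(-squareform(pdist(X, "sqeuclidean")) / eps)
    density = K.sum(axis=1)                    # kernel density estimate
    Ddiff = squareform(pdist(emb))             # diffusion distances at time t
    rho = np.array([Ddiff[i, density > density[i]].min()
                    if (density > density[i]).any() else Ddiff[i].max()
                    for i in range(len(X))])
    return density * rho                       # cluster modes have large scores
```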
Esposito, Antonio; Patacchini, Francesco S.; Schlichting, André; Slepčev, Dejan
(Archive for Rational Mechanics and Analysis)
We consider dynamics driven by interaction energies on graphs. We introduce graph analogues of the continuum nonlocal-interaction equation and interpret them as gradient flows with respect to a graph Wasserstein distance. The particular Wasserstein distance we consider arises from the graph analogue of the Benamou–Brenier formulation, where the graph continuity equation uses an upwind interpolation to define the density along the edges. While this approach has both theoretical and computational advantages, the resulting distance is only a quasi-metric. We investigate this quasi-metric both on graphs and on more general structures where the set of “vertices” is an arbitrary positive measure. We call the resulting gradient flow of the nonlocal-interaction energy the nonlocal nonlocal-interaction equation (NL$^2$IE). We develop the existence theory for solutions of the NL$^2$IE as curves of maximal slope with respect to the upwind Wasserstein quasi-metric. Furthermore, we show that the solutions of the NL$^2$IE on graphs converge as the empirical measures of the set of vertices converge weakly, which establishes a valuable discrete-to-continuum convergence result.
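To make the upwind interpolation concrete, the sketch below takes one explicit Euler step of a nonlocal-interaction dynamic on a finite vertex set, moving mass along each edge from the upwind vertex. The interaction kernel K, the step size, and the fully connected unit-weight graph are assumptions made for illustration; this is not the paper's scheme.

```python
import numpy as np

def nl2ie_step(rho, X, K, dt=1e-2):
    """One explicit step of a graph nonlocal-interaction dynamic with upwind
    fluxes. rho: vertex masses; X: vertex locations; K: interaction kernel K(x, y)."""
    n = len(X)
    # interaction potential V_i = sum_j K(x_i, x_j) * rho_j
    V = np.array([sum(K(X[i], X[j]) * rho[j] for j in range(n)) for i in range(n)])
    drho = np.zeros(n)
    for i in range(n):
        for j in range(i + 1, n):
            v = V[i] - V[j]                    # velocity along the edge (i, j)
            # upwind interpolation: the transported mass is taken from the
            # vertex the flow leaves
            flux = rho[i] * max(v, 0.0) + rho[j] * min(v, 0.0)
            drho[i] -= flux
            drho[j] += flux
    return rho + dt * drho                     # total mass is conserved
```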
de Hoop, Maarten V.; Ilmavirta, Joonas; Lassas, Matti; Saksala, Teemu
(Inverse Problems)
Consider the geometric inverse problem: there is a set of delta-sources in spacetime that emit waves travelling at unit speed. If we know all the arrival times at the boundary cylinder of the spacetime, can we reconstruct the space, a Riemannian manifold with boundary? With a finite set of sources we can only hope to get an approximate reconstruction, and we indeed provide a discrete metric approximation to the manifold with explicit data-driven error bounds when the manifold is simple. This is the geometrization of a seismological inverse problem where we measure the arrival times on the surface of waves from an unknown number of unknown interior microseismic events at unknown times. The closeness of two metric spaces with a marked boundary is measured by a labeled Gromov–Hausdorff distance. If measurements are done for infinite time and spatially dense sources, our construction produces the true Riemannian manifold, and the finite-time approximations converge to it in the metric sense.
The scattering transform is a multilayered, wavelet-based transform initially introduced as a mathematical model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks’ stability and invariance properties. In subsequent years, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and manifolds, leading to the emerging field of geometric deep learning. In order to improve our understanding of the architectures used in this new field, several papers have proposed generalizations of the scattering transform for non-Euclidean data structures such as undirected graphs and compact Riemannian manifolds without boundary. Analogous to the original scattering transform, these works prove that these variants of the scattering transform have desirable stability and invariance properties and aim to improve our understanding of the neural networks used in geometric deep learning. In this paper, we introduce a general, unified model for geometric scattering on measure spaces. Our proposed framework includes previous work on compact Riemannian manifolds without boundary and undirected graphs as special cases but also applies to more general settings such as directed graphs, signed graphs, and manifolds with boundary. We propose a new criterion that identifies to which groups a useful representation should be invariant and show that this criterion is sufficient to guarantee that the scattering transform has desirable stability and invariance properties. Additionally, we consider finite measure spaces that are obtained from randomly sampling an unknown manifold. We propose two methods for constructing a data-driven graph on which the associated graph scattering transform approximates the scattering transform on the underlying manifold. Moreover, we use a diffusion-maps based approach to prove quantitative estimates on the rate of convergence of one of these approximations as the number of sample points tends to infinity. Lastly, we showcase the utility of our method on spherical images, a directed graph stochastic block model, and on high-dimensional single-cell data.
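To illustrate the graph side of this framework, the sketch below builds dyadic wavelets from a lazy random-walk (diffusion) operator on a weighted adjacency matrix W and cascades wavelet filtering with a modulus nonlinearity and a low-pass average. The specific wavelet construction, scales, and normalization are assumptions made for illustration and may differ from the constructions in the paper.

```python
import numpy as np

def diffusion_wavelets(W, scales=(1, 2, 4, 8)):
    """Dyadic diffusion wavelets Psi_0 = I - T, Psi_j = T^{2^{j-1}} - T^{2^j}
    built from a lazy random-walk operator T; returns the filters and a low-pass."""
    d = W.sum(axis=1)
    P = W / d[:, None]                          # row-stochastic random walk
    T = 0.5 * (np.eye(len(W)) + P)              # lazy diffusion operator
    filters, prev = [], np.eye(len(W))
    for s in scales:
        Ts = np.linalg.matrix_power(T, s)
        filters.append(prev - Ts)               # band-pass wavelet at this scale
        prev = Ts
    return filters, prev                        # prev (largest power) acts as low-pass

def scattering_features(W, x, depth=2):
    """Cascade of wavelet filtering and modulus, averaged by the low-pass operator."""
    filters, low = diffusion_wavelets(W)
    layers, feats = [x], [low @ x]
    for _ in range(depth):
        nxt = []
        for u in layers:
            for Psi in filters:
                v = np.abs(Psi @ u)             # wavelet transform + modulus
                nxt.append(v)
                feats.append(low @ v)           # averaged (invariant) coefficients
        layers = nxt
    return np.concatenate(feats)
```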
Garcia Trillos, Nicolas, Little, Anna, McKenzie, Daniel, and Murphy, James M. Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms. Retrieved from https://par.nsf.gov/biblio/10527983. Journal of Machine Learning Research.
Garcia Trillos, Nicolas, Little, Anna, McKenzie, Daniel, and Murphy, James M.
"Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms". Journal of Machine Learning Research. https://par.nsf.gov/biblio/10527983.
@article{osti_10527983,
title = {Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms},
url = {https://par.nsf.gov/biblio/10527983},
journal = {Journal of Machine Learning Research},
publisher = {Journal of Machine Learning Research},
author = {Garcia Trillos, Nicolas and Little, Anna and McKenzie, Daniel and Murphy, James M.},
}