skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Local Regularization of Noisy Point Clouds: Improved Global Geometric Estimates and Data Analysis
Several data analysis techniques employ similarity relationships between data points to uncover the intrinsic dimension and geometric structure of the underlying data-generating mechanism. In this paper we work under the model assumption that the data is made of random perturbations of feature vectors lying on a low-dimensional manifold. We study two questions: how to define the similarity relationships over noisy data points, and what is the resulting impact of the choice of similarity in the extraction of global geometric information from the underlying manifold. We provide concrete mathematical evidence that using a local regularization of the noisy data to define the similarity improves the ap- proximation of the hidden Euclidean distance between unperturbed points. Furthermore, graph-based objects constructed with the locally regularized similarity function satisfy bet- ter error bounds in their recovery of global geometric ones. Our theory is supported by numerical experiments that demonstrate that the gain in geometric understanding facili- tated by local regularization translates into a gain in classification accuracy in simulated and real data.  more » « less
Award ID(s):
1912818
PAR ID:
10155932
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of machine learning research
Volume:
20
ISSN:
1532-4435
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Several data analysis techniques employ similarity relationships between data points to uncover the intrinsic dimension and geometric structure of the underlying data-generating mechanism. In this paper we work under the model assumption that the data is made of random perturbations of feature vectors lying on a low-dimensional manifold. We study two questions: how to define the similarity relationships over noisy data points, and what is the resulting impact of the choice of similarity in the extraction of global geometric information from the underlying manifold. We provide concrete mathematical evidence that using a local regularization of the noisy data to define the similarity improves the approximation of the hidden Euclidean distance between unperturbed points. Furthermore, graph-based objects constructed with the locally regularized similarity function satisfy better error bounds in their recovery of global geometric ones. Our theory is supported by numerical experiments that demonstrate that the gain in geometric understanding facilitated by local regularization translates into a gain in classification accuracy in simulated and real data. 
    more » « less
  2. This article introduces an isometric manifold embedding data-driven paradigm designed to enable model-free simulations with noisy data sampled from a constitutive manifold. The proposed data-driven approach iterates between a global optimization problem that seeks admissible solutions for the balance principle and a local optimization problem that finds the closest point projection of the Euclidean space that isometrically embeds a nonlinear constitutive manifold. To de-noise the database, a geometric autoencoder is introduced such that the encoder first learns to create an approximated embedding that maps the underlying low-dimensional structure of the high-dimensional constitutive manifold onto a flattened manifold with less curvature. We then obtain the noise-free constitutive responses by projecting data onto a denoised latent space that is completely flat by assuming that the noise and the underlying constitutive signal are orthogonal to each other. Consequently, a projection from the conservative manifold onto this de-noised constitutive latent space enables us to complete the local optimization step of the data-driven paradigm. Finally, to decode the data expressed in the latent space without reintroducing noise, we impose a set of isometry constraints while training the autoencoder such that the nonlinear mapping from the latent space to the reconstructed constituent manifold is distance-preserving. Numerical examples are used to both validate the implementation and demonstrate the accuracy, robustness, and limitations of the proposed paradigm. 
    more » « less
  3. Geometric Sensitive Hashing functions, a family of Local Sensitive Hashing functions, are neural network models that learn class-specific manifold geometry in supervised learning. However, given a set of supervised learning tasks, understanding the manifold geometries that can represent each task and the kinds of relationships between the tasks based on them has received little attention. We explore a formalization of this question by considering a generative process where each task is associated with a high-dimensional manifold, which can be done in brain-like models with neuromodulatory systems. Following this formulation, we define Task-specific Geometric Sensitive Hashing and show that a randomly weighted neural network with a neuromodulation system can realize this function. 
    more » « less
  4. Lu, Jianfeng; Ward, Rachel (Ed.)
    The Euclidean scattering transform was introduced nearly a decade ago to improve the mathematical understanding of convolutional neural networks. Inspired by recent interest in geometric deep learning, which aims to generalize convolutional neural networks to manifold and graph-structured domains, we define a geometric scattering transform on manifolds. Similar to the Euclidean scattering transform, the geometric scattering transform is based on a cascade of wavelet filters and pointwise nonlinearities. It is invariant to local isometries and stable to certain types of diffeomorphisms. Empirical results demonstrate its utility on several geometric learning tasks. Our results generalize the deformation stability and local translation invariance of Euclidean scattering, and demonstrate the importance of linking the used filter structures to the underlying geometry of the data. 
    more » « less
  5. We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold. 
    more » « less