Title: Manifold Coordinates with Physical Meaning
Manifold embedding algorithms map high-dimensional data down to coordinates in a much lower-dimensional space. One of the aims of dimension reduction is to find intrinsic coordinates that describe the data manifold. The coordinates returned by the embedding algorithm are abstract, and finding their physical or domain-related meaning is not formalized and often left to domain experts. This paper studies the problem of recovering the meaning of the new low-dimensional representation in an automatic, principled fashion. We propose a method to explain embedding coordinates of a manifold as non-linear compositions of functions from a user-defined dictionary. We show that this problem can be set up as a sparse linear Group Lasso recovery problem, find sufficient recovery conditions, and demonstrate its effectiveness on data …
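The Group Lasso recovery problem mentioned in the abstract can be illustrated with a generic solver. The sketch below is not the paper's algorithm. It assumes a simplified multi-response linear model Y ≈ X B in which column k of a hypothetical design matrix `X` stands in for dictionary function g_k, each column of `Y` stands in for one embedding coordinate, and row k of `B` forms one group, so a dictionary function is either kept for all coordinates or dropped entirely. How the paper actually builds its regression problem from the embedding is not reproduced here. The solver is plain proximal gradient (ISTA) with block soft-thresholding; the function names, the penalty `lam`, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def block_soft_threshold(B, tau):
    """Proximal operator of tau * sum_k ||B[k, :]||_2 (one group per row)."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Rows with norm <= tau become exactly zero; the others shrink toward zero.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * B


def group_lasso(X, Y, lam, n_iter=2000):
    """Minimise 0.5 * ||Y - X B||_F^2 + lam * sum_k ||B[k, :]||_2 by proximal gradient.

    Hypothetical setup: column k of X plays the role of dictionary function g_k,
    column j of Y plays the role of embedding coordinate j.
    """
    B = np.zeros((X.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y)  # gradient of the least-squares term
        B = block_soft_threshold(B - step * grad, step * lam)
    return B


# Synthetic example: 8 candidate functions; only functions 1 and 5 explain the 2 coordinates.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
B_true = np.zeros((8, 2))
B_true[1] = [1.0, -2.0]
B_true[5] = [0.5, 1.5]
Y = X @ B_true + 0.01 * rng.standard_normal((200, 2))

B_hat = group_lasso(X, Y, lam=5.0)
selected = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 1e-8)
print("selected dictionary functions:", selected)
```

The row-wise grouping is the design choice that matters here: because the penalty acts on the whole row, the solver returns a small set of dictionary functions shared across all coordinates rather than a different sparse pattern per coordinate.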
Award ID(s): 2015272, 1810975
PAR ID: 10347217
Journal Name: Journal of Machine Learning Research
Volume: 23
Issue: 133
ISSN: 1533-7928
Page Range / eLocation ID: 1-57
Sponsoring Org: National Science Foundation
More like this
  1. Complex models in physics, biology, economics, and engineering are often sloppy, meaning that the model parameters are not well determined by the model predictions for collective behavior. Many parameter combinations can vary over decades without significant changes in the predictions. This review uses information geometry to explore sloppiness and its deep relation to emergent theories. We introduce the model manifold of predictions, whose coordinates are the model parameters. Its hyperribbon structure explains why only a few parameter combinations matter for the behavior. We review recent rigorous results that connect the hierarchy of hyperribbon widths to approximation theory and to the smoothness of model predictions under changes of the control variables. We discuss recent geodesic methods to find simpler models on nearby boundaries of the model manifold: emergent theories with fewer parameters that explain the behavior equally well. We discuss a Bayesian prior which optimizes the mutual information between model parameters and experimental data, naturally favoring points on the emergent boundary theories and thus simpler models. We introduce a ‘projected maximum likelihood’ prior that efficiently approximates this optimal prior, and contrast both with the poor behavior of the traditional Jeffreys prior. We discuss how renormalization-group coarse-graining in statistical mechanics introduces a flow of the model manifold, and connect stiff and sloppy directions along the model manifold with relevant and irrelevant eigendirections of the renormalization group. Finally, we discuss recently developed ‘intensive’ embedding methods, which allow one to visualize the predictions of arbitrary probabilistic models as low-dimensional projections of an isometric embedding, and illustrate our method by generating the model manifold of the Ising model.
  2. We consider the problem of embedding point cloud data sampled from an underlying manifold with an associated flow or velocity. Such data arises in many contexts where static snapshots of dynamic entities are measured, including in high-throughput biology such as single-cell transcriptomics. Existing embedding techniques either do not utilize velocity information or embed the coordinates and velocities independently, i.e., they either impose velocities on top of an existing point embedding or embed points within a prescribed vector field. Here we present FlowArtist, a neural network that embeds points while jointly learning a vector field around the points. The combination allows FlowArtist to better separate and visualize velocity-informed structures. Our results, on toy datasets and single-cell RNA velocity data, illustrate the value of utilizing coordinate and velocity information in tandem for embedding and visualizing high-dimensional data. 
  3. We consider the regression problem of estimating functions on $\mathbb{R}^D$ but supported on a $d$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^D$ with $d \ll D$. Drawing ideas from multi-resolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $\mathcal{M}$ at multiple scales and perform multiscale regression by local polynomial fitting. We propose a data-driven wavelet thresholding scheme that automatically adapts to the unknown regularity of the function, allowing for efficient estimation of functions exhibiting nonuniform regularity at different locations and scales. We analyze the generalization error of our method by proving finite-sample bounds in high probability on rich classes of priors. Our estimator attains optimal learning rates (up to logarithmic factors) as if the function were defined on a known Euclidean domain of dimension $d$, instead of an unknown manifold embedded in $\mathbb{R}^D$. The implemented algorithm has quasilinear complexity in the sample size, with constants linear in $D$ and exponential in $d$. Our work therefore establishes a new framework for regression on low-dimensional sets embedded in high dimensions, with fast implementation and strong theoretical guarantees.
  4. Many design problems involve reasoning about points in high-dimensional space. A common strategy is to first embed these high-dimensional points into a low-dimensional latent space. We propose that a good embedding should be isometric, i.e., it should preserve the geodesic distance between points on the data manifold in the latent space. However, enforcing isometry is non-trivial for common neural embedding models such as autoencoders. Moreover, while isometry is theoretically appealing, it is unclear to what extent enforcing it is necessary for a given design analysis. This paper answers these questions by constructing an isometric embedding via an isometric autoencoder, which we employ to analyze an inverse airfoil design problem. Specifically, the paper describes how to train an isometric autoencoder and demonstrates its usefulness compared to non-isometric autoencoders on the UIUC airfoil dataset. Our ablation study illustrates that enforcing isometry is necessary for accurately discovering clusters through the latent space. We also show how isometric autoencoders can uncover pathologies in typical gradient-based shape optimization solvers through an analysis of the SU2-optimized airfoil dataset, wherein we find an over-reliance of the gradient solver on the angle of attack. Overall, this paper motivates the use of isometry constraints in neural embedding models, particularly in cases where researchers or designers intend to use distance-based measures to analyze designs within the latent space. While this work focuses on airfoil design as an illustrative example, it applies to any domain where analyzing isometric design or data embeddings would be useful.
  5. This article introduces an advanced Koopman mode decomposition (KMD) technique, coined Featurized Koopman Mode Decomposition (FKMD), that uses delay embedding and a learned Mahalanobis distance to enhance analysis and prediction of high-dimensional dynamical systems. The delay embedding expands the observation space to better capture underlying manifold structures, while the Mahalanobis distance adjusts observations based on the system’s dynamics. This aids in featurizing KMD in cases where good features are not known a priori. We show that FKMD improves predictions for a high-dimensional linear oscillator, a partially observed high-dimensional Lorenz attractor, and a cell signaling problem from cancer research.
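The first related abstract (item 1 above) describes sloppiness: parameter combinations that the predictions barely constrain. One standard way to see this numerically is to look at the eigenvalues of the Fisher information matrix JᵀJ of a least-squares model, where J is the Jacobian of the predictions with respect to the parameters. The sketch below assumes a sum-of-exponentials model y(t) = Σ_k exp(−θ_k t) with hypothetical decay rates, sampling times, and unit-variance Gaussian noise. It illustrates the concept only and is not code from the review.

```python
import numpy as np


def sum_of_exponentials_jacobian(rates, t):
    """Jacobian of y(t) = sum_k exp(-rates[k] * t) with respect to the rates.

    Column k holds dy/d(rates[k]) = -t * exp(-rates[k] * t) at every time point.
    """
    return -t[:, None] * np.exp(-np.outer(t, rates))


rates = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical decay rates
t = np.linspace(0.0, 8.0, 200)           # hypothetical measurement times
J = sum_of_exponentials_jacobian(rates, t)

# For unit-variance Gaussian noise the Fisher information metric is J^T J.
fim = J.T @ J
eigs = np.sort(np.linalg.eigvalsh(fim))[::-1]  # stiffest direction first

# Each ratio says how much less a direction is constrained than the stiffest one.
print(eigs / eigs[0])
print("decades spanned:", np.log10(eigs[0] / eigs[-1]))
```

Eigenvalues that fall off roughly geometrically over several decades are the kind of spectrum usually associated with the hyperribbon structure the abstract describes: the stiff eigenvectors are the few parameter combinations that matter.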
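The third related abstract (item 3 above) constructs low-dimensional coordinates on $\mathcal{M}$ and performs regression by local polynomial fitting. A single-scale sketch of that idea is shown below. It assumes local PCA on the `k` nearest neighbours to obtain the local coordinates and a degree-1 polynomial for the fit; the multiscale construction and the wavelet thresholding from the abstract are omitted. `local_linear_predict`, `k`, and the circle data are hypothetical choices.

```python
import numpy as np


def local_linear_predict(X, y, query, k=10, d=1):
    """Predict y at `query` by a linear fit in d local PCA coordinates of its k nearest neighbours."""
    dists = np.linalg.norm(X - query, axis=1)
    idx = np.argsort(dists)[:k]
    Xn, yn = X[idx], y[idx]
    center = Xn.mean(axis=0)
    # The top-d right singular vectors of the centred neighbourhood estimate the tangent space.
    _, _, Vt = np.linalg.svd(Xn - center, full_matrices=False)
    U = (Xn - center) @ Vt[:d].T        # local d-dimensional coordinates of the neighbours
    uq = (query - center) @ Vt[:d].T    # local coordinates of the query point
    A = np.hstack([np.ones((k, 1)), U])  # design matrix for a degree-1 polynomial
    coef, *_ = np.linalg.lstsq(A, yn, rcond=None)
    return coef[0] + uq @ coef[1:]


# A 1-dimensional manifold (unit circle) embedded in R^3, with f(theta) = cos(theta).
theta = np.linspace(0.0, 2 * np.pi, 500, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
y = np.cos(theta)

query = np.array([np.cos(0.3), np.sin(0.3), 0.0])
pred = local_linear_predict(X, y, query, k=10, d=1)
print(pred, np.cos(0.3))
```

The regression only ever works in `d` coordinates, so its cost per query depends on the ambient dimension only through the neighbour search and the small SVD, which matches the spirit of the abstract's "linear in $D$" remark without reproducing its algorithm.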
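The fourth related abstract (item 4 above) asks that the embedding preserve geodesic distances. One common way to express this locally, assumed here rather than taken from the paper, is to require the decoder Jacobian J(z) to satisfy JᵀJ ≈ I, so that the metric pulled back to the latent space is the identity, and to penalize the deviation. The sketch uses central finite differences instead of automatic differentiation so it needs only NumPy. `decoder_jacobian`, `isometry_penalty`, and the two toy decoders are hypothetical.

```python
import numpy as np


def decoder_jacobian(decoder, z, eps=1e-6):
    """Central finite-difference Jacobian of decoder: R^d -> R^D at latent point z."""
    z = np.asarray(z, dtype=float)
    out_dim = np.asarray(decoder(z)).size
    J = np.empty((out_dim, z.size))
    for j in range(z.size):
        dz = np.zeros_like(z)
        dz[j] = eps
        J[:, j] = (decoder(z + dz) - decoder(z - dz)) / (2.0 * eps)
    return J


def isometry_penalty(decoder, latent_points):
    """Mean squared Frobenius deviation of the pullback metric J^T J from the identity."""
    total = 0.0
    for z in latent_points:
        J = decoder_jacobian(decoder, z)
        G = J.T @ J  # metric induced on the latent space by the decoder
        total += np.sum((G - np.eye(G.shape[0])) ** 2)
    return total / len(latent_points)


def circle(z):
    """Arc-length parametrisation of the unit circle: an isometric 1-D decoder."""
    return np.array([np.cos(z[0]), np.sin(z[0])])


def fast_circle(z):
    """Same curve traversed twice as fast: local distances are stretched by a factor of 2."""
    return np.array([np.cos(2 * z[0]), np.sin(2 * z[0])])


zs = [np.array([v]) for v in np.linspace(0.0, 3.0, 7)]
print(isometry_penalty(circle, zs))       # close to 0
print(isometry_penalty(fast_circle, zs))  # close to (4 - 1)^2 = 9
```

In a trained autoencoder this term would be added to the reconstruction loss and evaluated on latent codes of the data; the toy decoders only show that the penalty vanishes for a distance-preserving map and grows when distances are stretched.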
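The fifth related abstract (item 5 above) combines delay embedding with Koopman mode decomposition. The sketch below shows only the delay-embedding half: it stacks time-lagged copies of the observations and fits a least-squares linear propagator between consecutive embedded snapshots (a plain delay-embedded DMD), whose eigenvalues approximate Koopman eigenvalues. The learned Mahalanobis distance that distinguishes FKMD is not implemented. `delay_embed`, `koopman_eigs`, and the cosine test signal are hypothetical.

```python
import numpy as np


def delay_embed(x, delays):
    """Stack `delays` time-shifted copies of x (shape (T, n)) into rows of length n * delays."""
    T = x.shape[0]
    return np.hstack([x[i:T - delays + 1 + i] for i in range(delays)])


def koopman_eigs(x, delays):
    """Eigenvalues of the least-squares linear map between consecutive delay-embedded snapshots."""
    H = delay_embed(x, delays)
    X0, X1 = H[:-1].T, H[1:].T   # columns are snapshots at times t and t + 1
    A = X1 @ np.linalg.pinv(X0)  # best linear propagator in the least-squares sense
    return np.linalg.eigvals(A)


omega = 0.3
t = np.arange(200)
x = np.cos(omega * t)[:, None]  # a single observed scalar, shape (200, 1)

eigs = koopman_eigs(x, delays=2)
print(np.abs(eigs), np.angle(eigs))  # moduli near 1, angles near +/- omega
```

A single scalar cosine is not a linear state on its own, but with one delay it satisfies x_{t+2} = 2 cos(ω) x_{t+1} − x_t exactly, so the fitted 2×2 propagator has eigenvalues e^{±iω}. This is the sense in which delay embedding expands the observation space enough for a linear (Koopman) description.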