Abstract In reduced-order modeling, complex systems that exhibit high state-space dimensionality are described and evolved using a small number of parameters. These parameters can be obtained in a data-driven way, where a high-dimensional dataset is projected onto a lower-dimensional basis. A complex system is then restricted to states on a low-dimensional manifold where it can be efficiently modeled. While this approach brings computational benefits, the quality of the manifold topology becomes crucial when models, such as nonlinear regression, are built on top of the manifold. Here, we present a quantitative metric for characterizing manifold topologies. Our metric accounts for non-uniqueness and spatial gradients in physical quantities of interest, and can be applied to manifolds of arbitrary dimensionality. Using the metric as a cost function in optimization algorithms, we show that optimized low-dimensional projections can be found. We outline a few applications of the cost function to datasets representing argon plasma, reacting flows, and atmospheric pollutant dispersion. We demonstrate how the cost function can assess various dimensionality reduction and manifold learning techniques, as well as data preprocessing strategies, in their capacity to yield quality low-dimensional projections. We show that improved manifold topologies can facilitate building nonlinear regression models.
This content will become publicly available on December 1, 2025

Predicting multiple observations in complex systems through low-dimensional embeddings

Abstract Forecasting all components in complex systems is an open and challenging task, possibly due to high dimensionality and undesirable predictors. We bridge this gap by proposing a data-driven and model-free framework, namely, feature-and-reconstructed manifold mapping (FRMM), which is a combination of feature embedding and delay embedding. For a high-dimensional dynamical system, FRMM finds its topologically equivalent manifolds with low dimensions from feature embedding and delay embedding and then sets the low-dimensional feature manifold as a generalized predictor to achieve predictions of all components. The substantial potential of FRMM is shown for both representative models and real-world data involving the Indian monsoon, electroencephalogram (EEG) signals, the foreign exchange market, and traffic speed in Los Angeles County. FRMM overcomes the curse of dimensionality and finds a generalized predictor, and thus has potential for applications in many other real-world systems.
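
The abstract names delay embedding as one of FRMM's two ingredients but does not spell it out. As a rough illustration only, the sketch below builds standard delay-coordinate vectors from a scalar time series in plain Python. The function name `delay_embed` and the parameters `dim` (embedding dimension) and `tau` (lag) are assumptions made for this example, not names from the paper, and the feature-embedding and prediction steps of FRMM are not shown.

```python
def delay_embed(x, dim, tau):
    """Build delay-coordinate vectors from a scalar series.

    Row t is [x[t], x[t + tau], ..., x[t + (dim - 1) * tau]].
    `dim` and `tau` are hypothetical parameter names for this sketch.
    """
    n_rows = len(x) - (dim - 1) * tau
    if dim < 1 or tau < 1 or n_rows <= 0:
        raise ValueError("series too short for this dim and tau")
    return [[x[t + i * tau] for i in range(dim)] for t in range(n_rows)]


# Toy series 0, 1, ..., 9 embedded in 3 dimensions with lag 2.
series = [float(v) for v in range(10)]
embedded = delay_embed(series, dim=3, tau=2)
print(len(embedded), embedded[0], embedded[-1])
```

Each row is one point on the reconstructed manifold; in practice `dim` and `tau` would be chosen from the data rather than fixed by hand.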
- Award ID(s): 2047488
- PAR ID: 10500542
- Publisher / Repository: Springer Nature
- Date Published:
- Journal Name: Nature Communications
- Volume: 15
- Issue: 1
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Most applications of multispectral imaging are explicitly or implicitly dependent on the dimensionality and topology of the spectral mixing space. Mixing space characterization refers to the identification of salient properties of the set of pixel reflectance spectra comprising an image (or compilation of images). The underlying premise is that this set of spectra may be described as a low-dimensional manifold embedded in a high-dimensional vector space. Traditional mixing space characterization uses the linear dimensionality reduction offered by Principal Component Analysis to find projections of pixel spectra onto orthogonal linear subspaces, prioritized by variance. Here, we consider the potential for recent advances in nonlinear dimensionality reduction (specifically, manifold learning) to contribute additional useful information for multispectral mixing space characterization. We integrate linear and nonlinear methods through a novel approach called Joint Characterization (JC). JC comprises two components. First, spectral mixture analysis (SMA) linearly projects the high-dimensional reflectance vectors onto a 2D subspace comprising the primary mixing continuum of substrates, vegetation, and dark features (e.g., shadow and water). Second, manifold learning nonlinearly maps the high-dimensional reflectance vectors into a low-dimensional embedding space while preserving manifold topology. The SMA output is physically interpretable in terms of material abundances. The manifold learning output is not generally physically interpretable, but more faithfully preserves high-dimensional connectivity and clustering within the mixing space. Used together, the strengths of SMA may compensate for the limitations of manifold learning, and vice versa. Here, we illustrate JC through application to thematic compilations of 90 Sentinel-2 reflectance images selected from a diverse set of biomes and land cover categories.
Specifically, we use globally standardized Substrate, Vegetation, and Dark (S, V, D) endmembers (EMs) for SMA, and Uniform Manifold Approximation and Projection (UMAP) for manifold learning. The value of each (SVD and UMAP) model is illustrated, both separately and jointly. JC is shown to successfully characterize both continuous gradations (spectral mixing trends) and discrete clusters (land cover class distinctions) within the spectral mixing space of each land cover category. These features are not clearly identifiable from SVD fractions alone, and not physically interpretable from UMAP alone. Implications are discussed for the design of models which can reliably extract and explainably use high-dimensional spectral information in spatially mixed pixels, a principal challenge in optical remote sensing.
- 
            This article introduces an advanced Koopman mode decomposition (KMD) technique, coined Featurized Koopman Mode Decomposition (FKMD), that uses delay embedding and a learned Mahalanobis distance to enhance analysis and prediction of high-dimensional dynamical systems. The delay embedding expands the observation space to better capture underlying manifold structures, while the Mahalanobis distance adjusts observations based on the system’s dynamics. This aids in featurizing KMD in cases where good features are not known a priori. We show that FKMD improves predictions for a high-dimensional linear oscillator, a high-dimensional Lorenz attractor that is partially observed, and a cell signaling problem from cancer research.
- 
            Covariate shift is a prevalent setting for supervised learning in the wild when the training and test data are drawn from different time periods, different but related domains, or via different sampling strategies. This paper addresses a transfer learning setting, with covariate shift between source and target domains. Most existing methods for correcting covariate shift exploit density ratios of the features to reweight the source-domain data, and when the features are high-dimensional, the estimated density ratios may suffer large estimation variances, leading to poor prediction performance under covariate shift. In this work, we investigate the dependence of covariate shift correction performance on the dimensionality of the features, and propose a correction method that finds a low-dimensional representation of the features, which takes into account features relevant to the target Y, and exploits the density ratio of this representation for importance reweighting. We discuss the factors that affect the performance of our method, and demonstrate its capabilities on both pseudo-real data and real-world applications.
- 
            This article introduces an isometric manifold embedding data-driven paradigm designed to enable model-free simulations with noisy data sampled from a constitutive manifold. The proposed data-driven approach iterates between a global optimization problem that seeks admissible solutions for the balance principle and a local optimization problem that finds the closest point projection of the Euclidean space that isometrically embeds a nonlinear constitutive manifold. To de-noise the database, a geometric autoencoder is introduced such that the encoder first learns to create an approximated embedding that maps the underlying low-dimensional structure of the high-dimensional constitutive manifold onto a flattened manifold with less curvature. We then obtain the noise-free constitutive responses by projecting data onto a denoised latent space that is completely flat by assuming that the noise and the underlying constitutive signal are orthogonal to each other. Consequently, a projection from the conservative manifold onto this de-noised constitutive latent space enables us to complete the local optimization step of the data-driven paradigm. Finally, to decode the data expressed in the latent space without reintroducing noise, we impose a set of isometry constraints while training the autoencoder such that the nonlinear mapping from the latent space to the reconstructed constituent manifold is distance-preserving. Numerical examples are used to both validate the implementation and demonstrate the accuracy, robustness, and limitations of the proposed paradigm.
 An official website of the United States government
