Title: Enhancing Streamflow Prediction in Ungauged Basins Using a Nonlinear Knowledge‐Based Framework and Deep Learning
Abstract: In hydrology, a fundamental task is to enhance the predictive power of a model in ungauged basins by transferring information on physical attributes and hydroclimate dynamics from gauged basins. This study introduces an integrated nonlinear clustering framework that aims to improve predictive performance in basins where direct measurements are sparse or absent. In this framework, uniform manifold approximation and projection (UMAP) is used as a nonlinear method to extract the essential features embedded in hydro‐climatological attributes and physical properties. Then, the Growing Neural Gas (GNG) clustering model is used to find basins that potentially share similar hydro‐climatological behaviors. Besides UMAP‐GNG, the combination of Principal Component Analysis (PCA), a linear dimensionality reduction method, with common clustering methods is also assessed as a benchmark. The results reveal that combining clustering algorithms with PCA may lead to a loss of information, while the nonlinear method (UMAP) can extract more informative features. The efficacy of the proposed framework is assessed across the Contiguous United States (CONUS) by training a single Base Model using long short‐term memory (LSTM) on the centroids of all clusters and then fine‐tuning the model on the centroids of each cluster separately to create a regional model. The results indicate that using the information extracted by the UMAP‐GNG method to guide a Base Model can significantly improve the accuracy in most of the clusters and enhance the median prediction accuracy within different clusters from 0.04 to 0.37 in terms of the Kling‐Gupta efficiency (KGE) in ungauged basins.
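The abstract reports accuracy as KGE without stating which formulation was used. As a rough illustration only, the sketch below computes the original Kling‐Gupta efficiency, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation between simulated and observed flow, α is the ratio of their standard deviations, and β is the ratio of their means. The function name `kge` and the sample values are hypothetical and do not come from the study; the paper may use a modified variant.

```python
import math
from statistics import fmean, pstdev


def kge(simulated, observed):
    """Original-form Kling-Gupta efficiency (assumed variant, see lead-in)."""
    if len(simulated) != len(observed) or len(observed) < 2:
        raise ValueError("series must have equal length of at least 2")

    mu_s, mu_o = fmean(simulated), fmean(observed)
    sd_s, sd_o = pstdev(simulated), pstdev(observed)
    if sd_s == 0 or sd_o == 0 or mu_o == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")

    # Population covariance, consistent with the population standard deviations.
    cov = fmean([(s - mu_s) * (o - mu_o) for s, o in zip(simulated, observed)])

    r = cov / (sd_s * sd_o)   # correlation component
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical daily flows, for illustration only.
observed_flow = [1.0, 2.0, 3.0, 4.0]
print(kge(observed_flow, observed_flow))                   # perfect match, close to 1
print(kge([2 * q for q in observed_flow], observed_flow))  # doubled flows, close to 1 - sqrt(2)
```

A perfect simulation scores 1, and the score decreases as correlation, variability, or bias departs from the ideal value of 1, so a gain of a few tenths of KGE in an ungauged basin is a substantial change.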
Award ID(s):
1856054
PAR ID:
10576700
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
DOI PREFIX: 10.1029
Date Published:
Journal Name:
Water Resources Research
Volume:
60
Issue:
11
ISSN:
0043-1397
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Motivation: The rapid development of scRNA-seq technologies enables us to explore the transcriptome at the cell level on a large scale. Recently, various computational methods have been developed to analyze scRNA-seq data, such as clustering and visualization. However, current visualization methods, including t-SNE and UMAP, are limited in how accurately they render the geometric relationships among populations with distinct functional states. Most visualization methods are unsupervised, leaving out information from the clustering results or given labels. This leads to an inaccurate depiction of the distances between the bona fide functional states. In particular, UMAP and t-SNE are not optimal for preserving the global geometric structure. They may produce a contradiction in which clusters that are close in the embedded dimensions are in fact farther apart in the original dimensions. In addition, UMAP and t-SNE cannot track the variance of clusters: in t-SNE and UMAP embeddings, the variance of a cluster is associated not only with the true variance but is also proportional to the sample size. Results: We present supCPM, a robust supervised visualization method, which separates different clusters, preserves the global structure and tracks the cluster variance. Compared with six visualization methods using synthetic and real datasets, supCPM performs better than the other methods in preserving the global geometric structure and data variance. Overall, supCPM provides an enhanced visualization pipeline to assist the interpretation of functional transition and accurately depict population segregation. Availability and implementation: The R package and source code are available at https://zenodo.org/record/5975977#.YgqR1PXMJjM. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representation learning in many applications, the neighborhood aggregation scheme exposes additional vulnerabilities to adversaries seeking to extract node-level information about sensitive attributes. In this paper, we study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data. We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance. Our method creates a strong defense against inference attacks, while only suffering small loss in task performance. Theoretically, we analyze the effectiveness of our framework against a worst-case adversary, and characterize an inherent trade-off between maximizing predictive accuracy and minimizing information leakage. Experiments across multiple datasets from recommender systems, knowledge graphs and quantum chemistry demonstrate that the proposed approach provides a robust defense across various graph structures and tasks, while producing competitive GNN encoders for downstream tasks. 
  3. We propose a new tool for visualizing complex, and potentially large and high-dimensional, data sets called Centroid-Encoder (CE). The architecture of the Centroid-Encoder is similar to the autoencoder neural network but it has a modified target, i.e., the class centroid in the ambient space. As such, CE incorporates label information and performs a supervised data visualization. The training of CE is done in the usual way with a training set whose parameters are tuned using a validation set. The evaluation of the resulting CE visualization is performed on a sequestered test set where the generalization of the model is assessed both visually and quantitatively. We present a detailed comparative analysis of the method using a wide variety of data sets and techniques, both supervised and unsupervised, including NCA, non-linear NCA, t-distributed NCA, t-distributed MCML, supervised UMAP, supervised PCA, Colored Maximum Variance Unfolding, supervised Isomap, Parametric Embedding, supervised Neighbor Retrieval Visualizer, and Multiple Relational Embedding. An analysis of variance using PCA demonstrates that a non-linear preprocessing by the CE transformation of the data captures more variance than PCA by dimension. 
  4. Most applications of multispectral imaging are explicitly or implicitly dependent on the dimensionality and topology of the spectral mixing space. Mixing space characterization refers to the identification of salient properties of the set of pixel reflectance spectra comprising an image (or compilation of images). The underlying premise is that this set of spectra may be described as a low dimensional manifold embedded in a high dimensional vector space. Traditional mixing space characterization uses the linear dimensionality reduction offered by Principal Component Analysis to find projections of pixel spectra onto orthogonal linear subspaces, prioritized by variance. Here, we consider the potential for recent advances in nonlinear dimensionality reduction (specifically, manifold learning) to contribute additional useful information for multispectral mixing space characterization. We integrate linear and nonlinear methods through a novel approach called Joint Characterization (JC). JC is comprised of two components. First, spectral mixture analysis (SMA) linearly projects the high-dimensional reflectance vectors onto a 2D subspace comprising the primary mixing continuum of substrates, vegetation, and dark features (e.g., shadow and water). Second, manifold learning nonlinearly maps the high-dimensional reflectance vectors into a low-D embedding space while preserving manifold topology. The SMA output is physically interpretable in terms of material abundances. The manifold learning output is not generally physically interpretable, but more faithfully preserves high dimensional connectivity and clustering within the mixing space. Used together, the strengths of SMA may compensate for the limitations of manifold learning, and vice versa. Here, we illustrate JC through application to thematic compilations of 90 Sentinel-2 reflectance images selected from a diverse set of biomes and land cover categories. 
     Specifically, we use globally standardized Substrate, Vegetation, and Dark (S, V, D) endmembers (EMs) for SMA, and Uniform Manifold Approximation and Projection (UMAP) for manifold learning. The value of each (SVD and UMAP) model is illustrated, both separately and jointly. JC is shown to successfully characterize both continuous gradations (spectral mixing trends) and discrete clusters (land cover class distinctions) within the spectral mixing space of each land cover category. These features are not clearly identifiable from SVD fractions alone, and not physically interpretable from UMAP alone. Implications are discussed for the design of models which can reliably extract and explainably use high-dimensional spectral information in spatially mixed pixels, a principal challenge in optical remote sensing.
  5. NASA’s Earth Surface Mineral Dust Source Investigation (EMIT) mission seeks to use spaceborne imaging spectroscopy (hyperspectral imaging) to map the mineralogy of arid dust source regions. Here we apply recent developments in Joint Characterization (JC) and the spectral Mixture Residual (MR) to explore the information content of data from this novel mission. Specifically, for a mosaic of 20 spectrally diverse scenes, we find: (1) a generalized three-endmember (Substrate, Vegetation, Dark; SVD) spectral mixture model is capable of capturing the preponderance (99% in three dimensions) of spectral variance with low misfit (99% pixels with <3.7% RMSE); (2) manifold learning (UMAP) is capable of identifying spatially coherent, physically interpretable clustering relationships in the spectral feature space; (3) UMAP yields results that are at least as informative when applied to the MR as when applied to raw reflectance; (4) SVD fraction information usefully contextualizes UMAP clustering relationships, and vice-versa (JC); and (5) when EMIT data are convolved to spectral response functions of multispectral instruments (Sentinel-2, Landsat 8/9, Planet SuperDove), SVD fractions correlate strongly across sensors, but UMAP clustering relationships for the EMIT hyperspectral feature space are far more informative than for simulated multispectral sensors. Implications are discussed for both the utility of EMIT data in the near-term and for the potential of high signal-to-noise (SNR) spaceborne imaging spectroscopy more generally, to transform the future of optical remote sensing in the years and decades to come. 