skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, February 13 until 2:00 AM ET on Friday, February 14 due to maintenance. We apologize for the inconvenience.


Search for: All records

Award ID contains: 2239175

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Accurate and cost-effective quantification of the carbon cycle for agroecosystems at decision-relevant scales is critical to mitigating climate change and ensuring sustainable food production. However, conventional process-based or data-driven modeling approaches alone have large prediction uncertainties due to the complex biogeochemical processes to model and the lack of observations to constrain many key state and flux variables. Here we propose a Knowledge-Guided Machine Learning (KGML) framework that addresses the above challenges by integrating knowledge embedded in a process-based model, high-resolution remote sensing observations, and machine learning (ML) techniques. Using the U.S. Corn Belt as a testbed, we demonstrate that KGML can outperform conventional process-based and black-box ML models in quantifying carbon cycle dynamics. Our high-resolution approach quantitatively reveals 86% more spatial detail of soil organic carbon changes than conventional coarse-resolution approaches. Moreover, we outline a protocol for improving KGML via various paths, which can be generalized to develop hybrid models to better predict complex earth system dynamics.

     
    more » « less
    Free, publicly-accessible full text available December 1, 2025
  2. Training machine learning (ML) models for scientific problems is often challenging due to limited observation data. To overcome this challenge, prior works commonly pre-train ML models using simulated data before having them fine-tuned with small real data. Despite the promise shown in initial research across different domains, these methods cannot ensure improved performance after fine-tuning because (i) they are not designed for extracting generalizable physics-aware features during pre-training, (ii) the features learned from pre-training can be distorted by the fine-tuning process. In this paper, we propose a new learning method for extracting, preserving, and adapting physics-aware features. We build a knowledge-guided neural network (KGNN) model based on known dependencies amongst physical variables, which facilitate extracting physics-aware feature representation from simulated data. Then we fine-tune this model by alternately updating the encoder and decoder of the KGNN model to enhance the prediction while preserving the physics-aware features learned through pre-training. We further propose to adapt the model to new testing scenarios via a teacher-student learning framework based on the model uncertainty. The results demonstrate that the proposed method outperforms many baselines by a good margin, even using sparse training data or under out-of-sample testing scenarios. 
    more » « less
    Free, publicly-accessible full text available April 1, 2025
  3. Accurate prediction of water flow is of utmost importance, particularly for ensuring water supply and informing early actions for floods and droughts. Existing flow prediction methods rely on the input of weather drivers, which hinders their applicability to monitoring small headwater streams due to the limited spatial resolution of existing weather datasets. This paper introduces a new dataset with frequent imagery on streams for water monitoring tasks. Our objective is to automatically predict streamflow for each stream site using frequent images taken at a sub-hourly scale. To overcome the challenge of limited labels for certain stream sites, we employ knowledge transfer from well-observed sites to poorly-observed sites via domain adaptation. As each stream site involves highly variable time series data over long periods, we introduce a novel method STCGAN (Spatial-Temporal Cycle Generative Adversarial Network), which incorporates temporal context by conditioning on the sequence's time and learns overall trends of stream flow variation. It integrates the predictive modeling of streamflow with the cyclic generative process and enhances the prediction with data augmentation using generated synthetic samples. Our experiments demonstrate superior performance of the proposed method using data collected from the West Brook area located in western Massachusetts, US. The proposed method can be further extended to selectively combine information from multiple well-observed stream sites, leading to improved overall performance. 
    more » « less
  4. Training machine learning (ML) models for scientific problems is often challenging due to limited observation data. To overcome this challenge, prior works commonly pre-train ML models using simulated data before having them fine-tuned with small real data. Despite the promise shown in initial research across different domains, these methods cannot ensure improved performance after fine-tuning because (i) they are not designed for extracting generalizable physics-aware features during pre-training, (ii) the features learned from pre-training can be distorted by the fine-tuning process. In this paper, we propose a new learning method for extracting, preserving, and adapting physics-aware features. We build a knowledge-guided neural network (KGNN) model based on known dependencies amongst physical variables, which facilitate extracting physics-aware feature representation from simulated data. Then we fine-tune this model by alternately updating the encoder and decoder of the KGNN model to enhance the prediction while preserving the physics-aware features learned through pre-training. We further propose to adapt the model to new testing scenarios via a teacher-student learning framework based on the model uncertainty. The results demonstrate that the proposed method outperforms many baselines by a good margin, even using sparse training data or under out-of-sample testing scenarios. 
    more » « less
  5. Accurate prediction of water quality and quantity is crucial for sustainable development and human well-being. However, existing data-driven methods often suffer from spatial biases in model performance due to heterogeneous data, limited observations, and noisy sensor data. To overcome these challenges, we propose Fair-Graph, a novel graph-based recurrent neural network that leverages interrelated knowledge from multiple rivers to predict water flow and temperature within large-scale stream networks. Additionally, we introduce node-specific graph masks for information aggregation and adaptation to enhance prediction over heterogeneous river segments. To reduce performance disparities across river segments, we introduce a centralized coordination strategy that adjusts training priorities for segments. We evaluate the prediction of water temperature within the Delaware River Basin, and the prediction of streamflow using simulated data from U.S. National Water Model in the Houston River network. The results showcase improvements in predictive performance and highlight the proposed model's ability to maintain spatial fairness over different river segments.

     
    more » « less
    Free, publicly-accessible full text available March 25, 2025
  6. Fairness-awareness has emerged as an essential building block for the responsible use of artificial intelligence in real applications. In many cases, inequity in performance is due to the change in distribution over different regions. While techniques have been developed to improve the transferability of fairness, a solution to the problem is not always feasible with no samples from the new regions, which is a bottleneck for pure data-driven attempts. Fortunately, physics-based mechanistic models have been studied for many problems with major social impacts. We propose SimFair, a physics-guided fairness-aware learning framework, which bridges the data limitation by integrating physical-rule-based simulation and inverse modeling into the training design. Using temperature prediction as an example, we demonstrate the effectiveness of the proposed SimFair in fairness preservation.

     
    more » « less
    Free, publicly-accessible full text available March 25, 2025
  7. Accurate simulation of turbulent flows is of crucial importance in many branches of science and engineering. Direct numerical simulation (DNS) provides the highest fidelity means of capturing all intricate physics of turbulent transport. However, the method is computationally expensive because of the wide range of turbulence scales that must be accounted for in such simulations. Large eddy simulation (LES) provides an alternative. In such simulations, the large scales of the flow are resolved, and the effects of small scales are modelled. Reconstruction of the DNS field from the low-resolution LES is needed for a wide variety of applications. Thus the construction of super-resolution methodologies that can provide this reconstruction has become an area of active research. In this work, a new physics-guided neural network is developed for such a reconstruction. The method leverages the partial differential equation that underlies the flow dynamics in the design of spatio-temporal model architecture. A degradation-based refinement method is also developed to enforce physical constraints and to further reduce the accumulated reconstruction errors over long periods. Detailed DNS data on two turbulent flow configurations are used to assess the performance of the model.

     
    more » « less
    Free, publicly-accessible full text available February 28, 2025
  8. Accurate climate data at fine spatial resolution are essential for scientific research and the development and planning of crucial social systems, such as energy and agriculture. Among them, sea surface temperature plays a critical role as the associated El Niño–Southern Oscillation (ENSO) is considered a significant signal of the global interannual climate system. In this paper, we propose an implicit neural representation-based interpolation method with temporal information (T_INRI) to reconstruct climate data of high spatial resolution, with sea surface temperature as the research object. Traditional deep learning models for generating high-resolution climate data are only applicable to fixed-resolution enhancement scales. In contrast, the proposed T_INRI method is not limited to the enhancement scale provided during the training process and its results indicate that it can enhance low-resolution input by arbitrary scale. Additionally, we discuss the impact of temporal information on the generation of high-resolution climate data, specifically, the influence of the month from which the low-resolution sea surface temperature data are obtained. Our experimental results indicate that T_INRI is advantageous over traditional interpolation methods under different enhancement scales, and the temporal information can improve T_INRI performance for a different calendar month. We also examined the potential capability of T_INRI in recovering missing grid value. These results demonstrate that the proposed T_INRI is a promising method for generating high-resolution climate data and has significant implications for climate research and related applications.

     
    more » « less