NSF PAR Search | NSF Public Access Repository

Knowledge-guided machine learning can improve carbon cycle quantification in agroecosystems

https://doi.org/10.1038/s41467-023-43860-5

Liu, Licheng; Zhou, Wang; Guan, Kaiyu; Peng, Bin; Xu, Shaoming; Tang, Jinyun; Zhu, Qing; Till, Jessica; Jia, Xiaowei; Jiang, Chongya; et al. (December 2024, Nature Communications)

Accurate and cost-effective quantification of the carbon cycle for agroecosystems at decision-relevant scales is critical to mitigating climate change and ensuring sustainable food production. However, conventional process-based or data-driven modeling approaches alone have large prediction uncertainties due to the complex biogeochemical processes to model and the lack of observations to constrain many key state and flux variables. Here we propose a Knowledge-Guided Machine Learning (KGML) framework that addresses the above challenges by integrating knowledge embedded in a process-based model, high-resolution remote sensing observations, and machine learning (ML) techniques. Using the U.S. Corn Belt as a testbed, we demonstrate that KGML can outperform conventional process-based and black-box ML models in quantifying carbon cycle dynamics. Our high-resolution approach quantitatively reveals 86% more spatial detail of soil organic carbon changes than conventional coarse-resolution approaches. Moreover, we outline a protocol for improving KGML via various paths, which can be generalized to develop hybrid models to better predict complex earth system dynamics.
Free, publicly-accessible full text available December 1, 2025
Time series predictions in unmonitored sites: a survey of machine learning techniques in water resources

https://doi.org/10.1017/eds.2024.14

Willard, Jared D; Varadharajan, Charuleka; Jia, Xiaowei; Kumar, Vipin (January 2025, Environmental Data Science)

Prediction of dynamic environmental variables in unmonitored sites remains a long-standing challenge for water resources science. The majority of the world’s freshwater resources have inadequate monitoring of critical environmental variables needed for management. Yet, the need to have widespread predictions of hydrological variables such as river flow and water quality has become increasingly urgent due to climate and land use change over the past decades, and their associated impacts on water resources. Modern machine learning methods increasingly outperform their process-based and empirical model counterparts for hydrologic time series prediction with their ability to extract information from large, diverse data sets. We review relevant state-of-the-art applications of machine learning for streamflow, water quality, and other water resources prediction and discuss opportunities to improve the use of machine learning with emerging methods for incorporating watershed characteristics and process knowledge into classical, deep learning, and transfer learning methodologies. The analysis here suggests most prior efforts have been focused on deep learning frameworks built on many sites for predictions at daily time scales in the United States, but that comparisons between different classes of machine learning methods are few and inadequate. We identify several open questions for time series predictions in unmonitored sites that include incorporating dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques in modern machine learning frameworks.
Free, publicly-accessible full text available January 1, 2026
Physics-Guided Foundation Model for Scientific Discovery: An Application to Aquatic Science

https://doi.org/10.1609/aaai.v39i27.35078

Yu, Runlong; Qiu, Chonghao; Ladwig, Robert; Hanson, Paul; Xie, Yiqun; Jia, Xiaowei (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Physics-guided machine learning (PGML) has become a prevalent approach in studying scientific systems due to its ability to integrate scientific theories for enhancing machine learning (ML) models. However, most PGML approaches are tailored to isolated and relatively simple tasks, which limits their applicability to complex systems involving multiple interacting processes and numerous influencing features. In this paper, we propose a Physics-Guided Foundation Model (PGFM) that combines pre-trained ML models and physics-based models and leverages their complementary strengths to improve the modeling of multiple coupled processes. To effectively conduct pre-training, we construct a simulated environmental system that encompasses a wide range of influencing features and various simulated variables generated by physics-based models. The model is pre-trained in this system to adaptively select important feature interactions guided by multi-task objectives. We then fine-tune the model for each specific task using true observations, while maintaining consistency with established physical theories, such as the principles of mass and energy conservation. We demonstrate the effectiveness of this methodology in modeling water temperature and dissolved oxygen dynamics in real-world lakes. The proposed PGFM is also broadly applicable to a range of scientific fields where physics-based models are being used.
Free, publicly-accessible full text available April 11, 2026
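
The PGFM abstract above mentions fine-tuning on observations while staying consistent with mass and energy conservation. One common way to express such a constraint is a loss that adds a conservation penalty to the data misfit. The following is a minimal sketch of that general idea, not the authors' implementation: the function names, the simple storage balance (change in storage equals inflow minus outflow), and the `weight` parameter are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def data_loss(pred: Sequence[float], obs: Sequence[Optional[float]]) -> float:
    """Mean squared error over time steps that have an observation (None = missing)."""
    pairs = [(p, o) for p, o in zip(pred, obs) if o is not None]
    if not pairs:
        return 0.0  # nothing observed, so no data misfit to report
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def mass_balance_penalty(
    storage: Sequence[float], inflow: Sequence[float], outflow: Sequence[float]
) -> float:
    """Mean squared residual of storage[t+1] - storage[t] = inflow[t] - outflow[t].

    Hypothetical balance used only for illustration.
    """
    residuals = [
        (storage[t + 1] - storage[t]) - (inflow[t] - outflow[t])
        for t in range(len(storage) - 1)
    ]
    return sum(r ** 2 for r in residuals) / len(residuals)


def physics_guided_loss(
    pred: Sequence[float],
    obs: Sequence[Optional[float]],
    inflow: Sequence[float],
    outflow: Sequence[float],
    weight: float = 0.5,  # assumed trade-off between data fit and physics
) -> float:
    """Data misfit plus a weighted conservation penalty on the predictions."""
    return data_loss(pred, obs) + weight * mass_balance_penalty(pred, inflow, outflow)


# Toy series: three predicted storage values, one missing observation.
example = physics_guided_loss(
    pred=[10.0, 12.0, 11.0],
    obs=[10.0, None, 12.0],
    inflow=[3.0, 1.0],
    outflow=[1.0, 1.0],
)
```

Because the penalty depends only on predictions and fluxes, it can be evaluated at time steps without observations, which is one reason this style of constraint is attractive when observations are sparse.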
Multi-Scale Graph Learning for Anti-Sparse Downscaling

https://doi.org/10.1609/aaai.v39i27.35014

Fan, Yingda; Yu, Runlong; Barclay, Janet R; Appling, Alison P; Sun, Yiming; Xie, Yiqun; Jia, Xiaowei (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Water temperature can vary substantially even across short distances within the same sub-watershed. Accurate prediction of stream water temperature at fine spatial resolutions (i.e., fine scales, ≤ 1 km) enables precise interventions to maintain water quality and protect aquatic habitats. Although spatiotemporal models have made substantial progress in spatially coarse time series modeling, challenges persist in predicting at fine spatial scales due to the lack of data at that scale. To address the problem of insufficient fine-scale data, we propose a Multi-Scale Graph Learning (MSGL) method. This method employs a multi-task learning framework where coarse-scale graph learning, bolstered by larger datasets, simultaneously enhances fine-scale graph learning. Although existing multi-scale or multi-resolution methods integrate data from different spatial scales, they often overlook the spatial correspondences across graph structures at various scales. To address this, our MSGL introduces an additional learning task, cross-scale interpolation learning, which leverages the hydrological connectedness of stream locations across coarse- and fine-scale graphs to establish cross-scale connections, thereby enhancing overall model performance. Furthermore, we have broken free from the mindset that multi-scale learning is limited to synchronous training by proposing an Asynchronous Multi-Scale Graph Learning method (ASYNC-MSGL). Extensive experiments demonstrate the state-of-the-art performance of our method for anti-sparse downscaling of daily stream temperatures in the Delaware River Basin, USA, highlighting its potential utility for water resources monitoring and management.
Free, publicly-accessible full text available April 11, 2026
Physics-Guided Fair Graph Sampling for Water Temperature Prediction in River Networks

https://doi.org/10.1609/aaai.v39i27.35025

He, Erhu; Kutscher, Declan; Xie, Yiqun; Zwart, Jacob; Jiang, Zhe; Yao, Huaxiu; Jia, Xiaowei (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

This work introduces a novel method based on graph neural networks (GNNs) to predict stream water temperature and reduce model bias across locations with different income and education levels. Traditional physics-based models often have limited accuracy because they are necessarily approximations of reality. Recently, there has been increasing interest in using GNNs to model complex water dynamics in stream networks. Despite their promise in improving accuracy, GNNs can introduce additional model bias through the aggregation process, in which node features are updated by aggregating over neighboring nodes. The bias can be especially pronounced when nodes with similar sensitive attributes are frequently connected. We introduce a new method that leverages physical knowledge to represent node influence in GNNs and then uses this physics-based influence to refine the selection and weighting of neighbors. The objective is to facilitate equitable treatment of different sensitive groups in graph aggregation, which helps reduce spatial bias across locations, especially for those in underprivileged groups. The results on the Delaware River Basin demonstrate the effectiveness of the proposed method in preserving equitable performance across locations in different sensitive groups.
Free, publicly-accessible full text available April 11, 2026
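
The aggregation step described in the abstract above, where a node's features are updated from its neighbors, can be sketched in a few lines. This is a generic weighted-mean aggregation under stated assumptions, not the method proposed in the paper: features are single numbers, the 50/50 mix of a node's own value and its neighborhood mean is arbitrary, and the optional `weights` mapping is a hypothetical stand-in for any per-edge influence score, physics-based or otherwise.

```python
from typing import Dict, List, Optional, Tuple


def aggregate(
    features: Dict[str, float],
    neighbors: Dict[str, List[str]],
    weights: Optional[Dict[Tuple[str, str], float]] = None,
) -> Dict[str, float]:
    """One round of weighted-mean neighbor aggregation on scalar node features.

    weights[(node, nbr)] is an assumed per-edge influence; None means uniform.
    """
    updated = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            updated[node] = own  # isolated node keeps its own value
            continue
        w = [1.0 if weights is None else weights[(node, n)] for n in nbrs]
        mean_nbr = sum(wi * features[n] for wi, n in zip(w, nbrs)) / sum(w)
        # Arbitrary equal mix of the node's value and its neighborhood mean.
        updated[node] = 0.5 * own + 0.5 * mean_nbr
    return updated


features = {"a": 1.0, "b": 3.0, "c": 5.0}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}

uniform = aggregate(features, neighbors)
reweighted = aggregate(
    features,
    neighbors,
    weights={("a", "b"): 3.0, ("a", "c"): 1.0, ("b", "a"): 1.0},
)
```

Comparing `uniform["a"]` with `reweighted["a"]` shows how changing the edge weights shifts what a node inherits from its neighborhood, which is the lever a reweighting or resampling scheme acts on.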
Towards the next generation of Geospatial Artificial Intelligence

https://doi.org/10.1016/j.jag.2025.104368

Mai, Gengchen; Xie, Yiqun; Jia, Xiaowei; Lao, Ni; Rao, Jinmeng; Zhu, Qing; Liu, Zeping; Chiang, Yao-Yi; Jiao, Junfeng (February 2025, International Journal of Applied Earth Observation and Geoinformation)

Free, publicly-accessible full text available February 1, 2026
Domain-Adaptive Continual Meta-Learning for Modeling Dynamical Systems: An Application in Environmental Ecosystems

https://doi.org/10.1137/1.9781611978520.29

Sun, Yiming; Yu, Runlong; Bao, Runxue; Xie, Yiqun; Ye, Ye; Jia, Xiaowei (January 2025, Society for Industrial and Applied Mathematics)

Free, publicly-accessible full text available January 1, 2026
SolarCube: An Integrative Benchmark Dataset Harnessing Satellite and In-situ Observations for Large-scale Solar Energy Forecasting

Li, Ruohan; Xie, Yiqun; Jia, Xiaowei; Wang, Dongdong; Li, Yanhua; Zhang, Yingxue; Wang, Zhihao; Li, Zhili (December 2024, NeurIPS)

Free, publicly-accessible full text available December 16, 2025
Adaptive Process-Guided Learning: An Application in Predicting Lake DO Concentrations

https://doi.org/10.1109/ICDM59182.2024.00065

Yu, Runlong; Qiu, Chonghao; Ladwig, Robert; Hanson, Paul C; Xie, Yiqun; Li, Yanhua; Jia, Xiaowei (December 2024, IEEE)

Free, publicly-accessible full text available December 9, 2025
Knowledge Guided Machine Learning for Extracting, Preserving, and Adapting Physics-aware Features

https://doi.org/10.1137/1.9781611978032.82

He, Erhu; Xie, Yiqun; Liu, Licheng; Jin, Zhenong; Zhang, Dajun; Jia, Xiaowei (April 2024, SIAM International Conference on Data Mining (SDM) 2024)

Training machine learning (ML) models for scientific problems is often challenging due to limited observation data. To overcome this challenge, prior works commonly pre-train ML models using simulated data before fine-tuning them with small amounts of real data. Despite the promise shown in initial research across different domains, these methods cannot ensure improved performance after fine-tuning because (i) they are not designed for extracting generalizable physics-aware features during pre-training, and (ii) the features learned from pre-training can be distorted by the fine-tuning process. In this paper, we propose a new learning method for extracting, preserving, and adapting physics-aware features. We build a knowledge-guided neural network (KGNN) model based on known dependencies amongst physical variables, which facilitates extracting physics-aware feature representations from simulated data. Then we fine-tune this model by alternately updating the encoder and decoder of the KGNN model to enhance the prediction while preserving the physics-aware features learned through pre-training. We further propose to adapt the model to new testing scenarios via a teacher-student learning framework based on the model uncertainty. The results demonstrate that the proposed method outperforms many baselines by a good margin, even with sparse training data or under out-of-sample testing scenarios.