Accurate and cost-effective quantification of the carbon cycle in agroecosystems at decision-relevant scales is critical to mitigating climate change and ensuring sustainable food production. However, conventional process-based or data-driven modeling approaches used alone have large prediction uncertainties, because the biogeochemical processes involved are complex and few observations are available to constrain many key state and flux variables. Here we propose a Knowledge-Guided Machine Learning (KGML) framework that addresses these challenges by integrating the knowledge embedded in a process-based model, high-resolution remote sensing observations, and machine learning (ML) techniques. Using the U.S. Corn Belt as a testbed, we demonstrate that KGML can outperform conventional process-based and black-box ML models in quantifying carbon cycle dynamics. Our high-resolution approach reveals 86% more spatial detail in soil organic carbon (SOC) changes than conventional coarse-resolution approaches. Moreover, we outline a protocol for improving KGML along several paths, which can be generalized to develop hybrid models that better predict complex Earth system dynamics.
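As a rough illustration of what integrating process-based knowledge into ML training can look like, the sketch below adds a process-style penalty to an ordinary data-fit loss. This is a generic knowledge-guided loss, not the authors' KGML architecture. The function name `knowledge_guided_loss`, the simplified balance "SOC change = carbon input - respiration", and the penalty `weight` are all hypothetical assumptions; missing observations are encoded as NaN to mimic sparse field data.

```python
import numpy as np


def knowledge_guided_loss(pred_dsoc, obs_dsoc, carbon_input, respiration, weight=0.1):
    """Hypothetical knowledge-guided loss: data misfit plus a mass-balance penalty.

    pred_dsoc    : predicted SOC change per site/year
    obs_dsoc     : observed SOC change; NaN where no observation exists
    carbon_input : carbon added to the soil (assumed known, e.g. from a process model)
    respiration  : heterotrophic respiration loss (same units)
    weight       : hypothetical weight balancing data fit and knowledge penalty
    """
    pred = np.asarray(pred_dsoc, dtype=float)
    obs = np.asarray(obs_dsoc, dtype=float)

    # Data term: mean squared error, computed only where observations exist
    mask = ~np.isnan(obs)
    data_term = float(np.mean((pred[mask] - obs[mask]) ** 2)) if mask.any() else 0.0

    # Knowledge term: penalize departures from the assumed balance dSOC = input - respiration
    expected = np.asarray(carbon_input, dtype=float) - np.asarray(respiration, dtype=float)
    knowledge_term = float(np.mean((pred - expected) ** 2))

    return data_term + weight * knowledge_term


# Usage with hypothetical numbers: the second site has no observation
loss = knowledge_guided_loss(
    pred_dsoc=[1.0, 2.0, 3.0],
    obs_dsoc=[1.0, np.nan, 2.0],
    carbon_input=[3.0, 4.0, 5.0],
    respiration=[2.0, 2.0, 2.0],
)
print(loss)
```

The data term only acts where observations exist, while the knowledge term applies at every site, so the penalty can still steer predictions where observations are missing. That is one way a hybrid model can compensate for the observational gaps described above.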
Free, publicly-accessible full text available December 1, 2025.
Free, publicly-accessible full text available December 1, 2024.
Cover crops have long been regarded as an effective management practice for increasing soil organic carbon (SOC) and reducing nitrogen (N) leaching. However, quantifying these ecosystem services carries large uncertainties, whether it relies on observations (e.g., field measurements or remote sensing data) or on process-based modeling. In this study, we developed and implemented a model–data fusion (MDF) framework that integrates process-based modeling with remotely sensed observations to improve the quantification of cover crop benefits for SOC accrual and N retention in central Illinois. Specifically, we first constrained and validated the process-based agroecosystem model ecosys using observations of cover crop aboveground biomass derived from satellite-based spectral signals; these observations are highly consistent with field measurements. We then compared the simulated cover crop benefits for SOC accrual and N leaching reduction with and without the constraints of remotely sensed cover crop aboveground biomass. When benchmarked against remote sensing-based observations, the constrained simulations all show significant improvements over the unconstrained ones in quantifying cover crop aboveground biomass C: R² increases from 0.60 to 0.87, and the root mean square error (RMSE) and absolute bias decrease by 64% and 97%, respectively. Across all study sites, the constrained simulations of aboveground biomass C and N at termination are on average 29% and 35% lower, respectively, than the unconstrained ones. Correspondingly, the average simulated net benefits of SOC accrual and N retention are 31% and 23% lower than in the unconstrained simulations, respectively. Our results show that the MDF framework with remotely sensed biomass constraints effectively reduced the uncertainties in cover crop biomass simulations, which in turn constrained the quantification of cover crop-induced ecosystem services in increasing SOC and reducing N leaching.
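The benchmark statistics quoted above (R², RMSE, absolute bias, and percentage reductions) can be computed roughly as in the following sketch. The abstract does not state its formulas, so these definitions are assumptions: R² is taken as the coefficient of determination 1 - SS_res/SS_tot (some studies instead report the squared Pearson correlation), absolute bias as |mean(simulated - observed)|, and a reduction as (before - after) / before × 100. The arrays in the usage lines are hypothetical and are not data from the study.

```python
import numpy as np


def r_squared(obs, sim):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(obs, sim):
    """Root mean square error between simulated and observed values."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def abs_bias(obs, sim):
    """Absolute value of the mean error (simulated minus observed)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(abs(np.mean(sim - obs)))


def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Hypothetical benchmark: remotely sensed biomass C vs. two model runs
observed = [1.0, 2.0, 3.0, 4.0]
unconstrained = [2.0, 3.0, 4.0, 5.0]  # systematically biased high
constrained = [1.1, 2.0, 2.9, 4.0]

for name, sim in [("unconstrained", unconstrained), ("constrained", constrained)]:
    print(name, r_squared(observed, sim), rmse(observed, sim), abs_bias(observed, sim))

# Percent reduction in RMSE achieved by the constrained run
print(percent_reduction(rmse(observed, unconstrained), rmse(observed, constrained)))
```

Reporting the reduction relative to the unconstrained run, as the abstract does, makes results comparable across sites whose absolute biomass levels differ.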