Accurate and cost-effective quantification of the carbon cycle for agroecosystems at decision-relevant scales is critical to mitigating climate change and ensuring sustainable food production. However, conventional process-based or data-driven modeling approaches alone have large prediction uncertainties because the biogeochemical processes involved are complex and observations are lacking to constrain many key state and flux variables. Here we propose a Knowledge-Guided Machine Learning (KGML) framework that addresses these challenges by integrating knowledge embedded in a process-based model, high-resolution remote sensing observations, and machine learning (ML) techniques. Using the U.S. Corn Belt as a testbed, we demonstrate that KGML can outperform conventional process-based and black-box ML models in quantifying carbon cycle dynamics. Our high-resolution approach quantitatively reveals 86% more spatial detail of soil organic carbon changes than conventional coarse-resolution approaches. Moreover, we outline a protocol for improving KGML via various paths, which can be generalized to develop hybrid models to better predict complex earth system dynamics.
Free, publicly accessible full text available December 1, 2025.
Abstract Quantifying the temperature sensitivity of methane (CH4) production is crucial for predicting how wetland ecosystems will respond to climate warming. Typically, the temperature sensitivity (often quantified as a Q10 value) is derived from laboratory incubation studies and then used in biogeochemical models. However, studies report wide variation in incubation-inferred Q10 values, with a large portion of this variation remaining unexplained. Here we applied observations in a thawing permafrost peatland (Stordalen Mire) and a well-tested process-rich model (ecosys) to interpret incubation observations and investigate controls on inferred CH4 production temperature sensitivity. We developed a field-storage-incubation modeling approach to mimic the full incubation sequence, including field sampling at a particular time in the growing season, refrigerated storage, and laboratory incubation, followed by model evaluation. We found that CH4 production rates during incubation are regulated by substrate availability and active microbial biomass of key microbial functional groups, which are affected by soil storage duration and temperature. Seasonal variation in substrate availability and active microbial biomass of key microbial functional groups led to strong time-of-sampling impacts on CH4 production. CH4 production is higher with less perturbation post-sampling, i.e., shorter storage duration and lower storage temperature. We found a wide range of inferred Q10 values (1.2–3.5), which we attribute to incubation temperatures, incubation duration, storage duration, and sampling time. We also show that Q10 values of CH4 production are controlled by interacting biological, biochemical, and physical processes, which cause the inferred Q10 values to differ substantially from those of the component processes. Terrestrial ecosystem models that use a constant Q10 value to represent temperature responses may therefore predict biased soil carbon cycling under future climate scenarios.
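For readers unfamiliar with the metric, the following minimal sketch shows how a Q10 value is commonly inferred from rates measured at two incubation temperatures, using the standard relation Q10 = (R2/R1)^(10/(T2 - T1)). It is an illustration only and not the ecosys model's formulation; the function names `infer_q10` and `rate_at_temperature` and the numbers in the usage lines are hypothetical.

```python
def infer_q10(rate_low, rate_high, temp_low_c, temp_high_c):
    """Infer Q10 from production rates at two temperatures (degrees C).

    Uses the standard two-point definition:
        Q10 = (rate_high / rate_low) ** (10 / (temp_high_c - temp_low_c))
    """
    if rate_low <= 0 or rate_high <= 0:
        raise ValueError("rates must be positive")
    if temp_high_c == temp_low_c:
        raise ValueError("temperatures must differ")
    return (rate_high / rate_low) ** (10.0 / (temp_high_c - temp_low_c))


def rate_at_temperature(rate_ref, q10, temp_ref_c, temp_c):
    """Scale a reference rate to another temperature with a constant Q10."""
    return rate_ref * q10 ** ((temp_c - temp_ref_c) / 10.0)


# Hypothetical example: the rate doubles between 5 and 15 degrees C.
q10 = infer_q10(rate_low=1.0, rate_high=2.0, temp_low_c=5.0, temp_high_c=15.0)
projected = rate_at_temperature(rate_ref=1.0, q10=q10, temp_ref_c=5.0, temp_c=25.0)
print(q10, projected)
```

Because a constant-Q10 model extrapolates a single ratio across all temperatures, any dependence of the inferred value on storage, sampling time, or incubation conditions carries straight through to the projected rate.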
Abstract Cover crops have long been seen as an effective management practice to increase soil organic carbon (SOC) and reduce nitrogen (N) leaching. However, there are large uncertainties in quantifying these ecosystem services using either observations (e.g., field measurements, remote sensing data) or process-based modeling. In this study, we developed and implemented a model–data fusion (MDF) framework to improve the quantification of cover crop benefits in SOC accrual and N retention in central Illinois by integrating process-based modeling and remotely sensed observations. Specifically, we first constrained and validated the process-based agroecosystem model, ecosys, using observations of cover crop aboveground biomass derived from satellite-based spectral signals, which are highly consistent with field measurements. Then, we compared the simulated cover crop benefits in SOC accrual and N leaching reduction with and without the constraints of remotely sensed cover crop aboveground biomass. When benchmarked against remote sensing-based observations, the constrained simulations all show significant improvements in quantifying cover crop aboveground biomass C compared with the unconstrained ones, with R² increasing from 0.60 to 0.87, and root mean square error (RMSE) and absolute bias decreasing by 64% and 97%, respectively. Across all study sites, the constrained simulations of aboveground biomass C and N at termination are on average 29% and 35% lower than the unconstrained ones. Correspondingly, the average simulated net benefits in SOC accrual and N retention are 31% and 23% lower than in the unconstrained simulations, respectively. Our results show that the MDF framework with remotely sensed biomass constraints effectively reduced the uncertainties in cover crop biomass simulations, which further constrained the quantification of cover crop-induced ecosystem services in increasing SOC and reducing N leaching.
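The benchmark statistics quoted above (R², RMSE, absolute bias, and their percentage reductions) can be sketched as follows. This is a minimal illustration under stated assumptions: R² is taken here as the coefficient of determination, although the study may instead use the squared correlation, and absolute bias is taken as the absolute value of the mean simulated-minus-observed difference. All function names and sample values are hypothetical.

```python
import math


def r_squared(observed, simulated):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, simulated):
    """Root mean square error between simulated and observed values."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)


def absolute_bias(observed, simulated):
    """Absolute value of the mean (simulated - observed) difference (assumed definition)."""
    n = len(observed)
    return abs(sum(s - o for o, s in zip(observed, simulated)) / n)


def percent_decrease(before, after):
    """Relative reduction of a metric, in percent."""
    return (before - after) / before * 100.0


# Hypothetical biomass values, not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
unconstrained = [2.0, 3.0, 4.0, 5.0]
constrained = [1.5, 2.5, 3.5, 4.5]

print(r_squared(obs, unconstrained), r_squared(obs, constrained))
print(percent_decrease(rmse(obs, unconstrained), rmse(obs, constrained)))
print(percent_decrease(absolute_bias(obs, unconstrained), absolute_bias(obs, constrained)))
```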