

Title: KGML-ag: a modeling framework of knowledge-guided machine learning to simulate agroecosystems: a case study of estimating N2O emission using data from mesocosm experiments
Abstract. Agricultural nitrous oxide (N2O) emission accounts for a non-trivial fraction of the global greenhouse gas (GHG) budget. To date, estimating N2O fluxes from cropland remains a challenging task because the related microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plants and human activities. Existing approaches such as process-based (PB) models have well-known limitations due to insufficient representations of the processes or uncertainties in model parameters. A new method is therefore needed that leverages recent advances in machine learning (ML) while unlocking the ML "black box" to overcome its limitations, such as low interpretability, out-of-sample failure and massive data demand. In this study, we developed a first-of-its-kind knowledge-guided machine learning model for agroecosystems (KGML-ag) by incorporating biogeophysical and chemical domain knowledge from an advanced PB model, ecosys, and tested it by comparing its simulated daily N2O fluxes with observations from mesocosm experiments. The gated recurrent unit (GRU) was used as the basis of the model structure. To optimize model performance, we investigated a range of ideas, including (1) using initial values of intermediate variables (IMVs) instead of time series as model input to reduce data demand; (2) building hierarchical structures to explicitly estimate IMVs for further N2O prediction; (3) using multi-task learning to balance the simultaneous training on multiple variables; and (4) pre-training with millions of synthetic data points generated from ecosys and fine-tuning with mesocosm observations. Six other pure ML models were developed using the same mesocosm data to serve as benchmarks for the KGML-ag model. Results show that KGML-ag did an excellent job of reproducing the mesocosm N2O fluxes (overall r² = 0.81 and RMSE = 3.6 mg N m−2 d−1 from cross validation).
Importantly, KGML-ag always outperformed the PB model and the pure ML models in predicting N2O fluxes, especially for complex temporal dynamics and emission peaks. In addition, KGML-ag goes beyond the pure ML models by providing more interpretable predictions and by pinpointing the new knowledge and data needed to further empower the current KGML-ag. We believe the KGML-ag development in this study will stimulate a new body of research on interpretable ML for biogeochemistry and other related geoscience processes.
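To make the architectural ideas in this abstract concrete, here is a minimal, hypothetical sketch in plain Python: a standard GRU cell, a hierarchical readout that first estimates one intermediate variable (IMV) and then feeds it into the N2O estimate, a weighted multi-task loss, and the two reported metrics. This is not the authors' KGML-ag code. The class and function names, layer sizes, loss weights, toy inputs and the single-IMV setup are assumptions for illustration. r² is computed here as the coefficient of determination, which may differ from the paper's definition. Training (pre-training on ecosys output, then fine-tuning on mesocosm data) and the use of initial IMV values as inputs are omitted.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def init_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Single GRU cell in the Cho et al. (2014) form, written in plain Python."""
    def __init__(self, n_in, n_hidden, rng):
        # One input matrix W, recurrent matrix U and bias b per gate:
        # "z" = update gate, "r" = reset gate, "n" = candidate state.
        self.W = {g: init_matrix(rng, n_hidden, n_in) for g in "zrn"}
        self.U = {g: init_matrix(rng, n_hidden, n_hidden) for g in "zrn"}
        self.b = {g: [0.0] * n_hidden for g in "zrn"}

    def _gate(self, g, x, h, act):
        wx, uh = matvec(self.W[g], x), matvec(self.U[g], h)
        return [act(a + c + bias) for a, c, bias in zip(wx, uh, self.b[g])]

    def step(self, x, h):
        z = self._gate("z", x, h, sigmoid)
        r = self._gate("r", x, h, sigmoid)
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = self._gate("n", x, rh, math.tanh)  # candidate sees reset-scaled history
        # New state is a gated blend of the previous state and the candidate.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

class HierarchicalN2OModel:
    """Hypothetical two-stage readout: GRU state -> IMV estimate -> N2O estimate."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.cell = GRUCell(n_in, n_hidden, rng)
        self.n_hidden = n_hidden
        self.w_imv = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        # The N2O head sees the hidden state plus the IMV estimate.
        self.w_n2o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, xs):
        h = [0.0] * self.n_hidden
        imv_pred, n2o_pred = [], []
        for x in xs:
            h = self.cell.step(x, h)
            imv = sum(w * v for w, v in zip(self.w_imv, h))
            n2o = sum(w * v for w, v in zip(self.w_n2o, h + [imv]))
            imv_pred.append(imv)
            n2o_pred.append(n2o)
        return imv_pred, n2o_pred

def mse(pred, obs):
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def multitask_loss(preds, targets, weights):
    """Weighted sum of per-variable MSEs; the weights balance the tasks."""
    return sum(w * mse(p, t) for p, t, w in zip(preds, targets, weights))

def rmse(pred, obs):
    return math.sqrt(mse(pred, obs))

def r_squared(pred, obs):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    model = HierarchicalN2OModel(n_in=2, n_hidden=4, seed=0)
    drivers = [[0.1 * t, 0.5] for t in range(10)]  # toy daily inputs, not real data
    imv_obs = [0.2] * 10
    n2o_obs = [0.05 * t for t in range(10)]
    imv_pred, n2o_pred = model.forward(drivers)
    loss = multitask_loss([imv_pred, n2o_pred], [imv_obs, n2o_obs], weights=[0.3, 0.7])
    print(f"loss={loss:.4f} rmse={rmse(n2o_pred, n2o_obs):.4f} r2={r_squared(n2o_pred, n2o_obs):.3f}")
```

The loss weights [0.3, 0.7] are arbitrary here. They stand in for whatever balancing scheme a multi-task setup uses so that no single target variable dominates the gradient.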
Award ID(s): 2034385, 1922512
PAR ID: 10387735
Author(s) / Creator(s):
Date Published:
Journal Name: Geoscientific Model Development
Volume: 15
Issue: 7
ISSN: 1991-9603
Page Range / eLocation ID: 2839–2858
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract. Flux measurements of nitrogen oxides (NOx) were made over London using airborne eddy covariance from a low-flying aircraft. Seven low-altitude flights were conducted over Greater London, performing multiple overpasses across the city during eight days in July 2014. NOx fluxes across the Greater London region (GLR) exhibited high heterogeneity and strong diurnal variability, with central areas responsible for the highest emission rates (20–30 mg m−2 h−1). Other high-emission areas included the M25 orbital motorway. The complexity of London's emission characteristics makes it challenging to pinpoint single emission sources definitively using airborne measurements. Multiple sources, including road transport and residential, commercial and industrial combustion sources, are all likely to contribute to measured fluxes. Measured flux estimates were compared to scaled National Atmospheric Emissions Inventory (NAEI) estimates, accounting for monthly, daily and hourly variability. Significant differences were found between the flux-driven emissions and the NAEI estimates across Greater London, with measured values up to 2 times higher in Central London than those predicted by the inventory. To overcome the limitations of using the national inventory to contextualise measured fluxes, we used physics-guided flux data fusion to train environmental response functions (ERFs) between measured flux and environmental drivers (meteorological and surface). The aim was to generate time-of-day emission surfaces using calculated ERF relationships for the entire GLR; 98 % spatial coverage was achieved across the GLR at 400 m2 spatial resolution. All flight leg projections showed substantial heterogeneity across the domain, with high emissions emanating from Central London and major road infrastructure. The diurnal emission structure of the GLR was also investigated through ERF, with the morning rush hour distinguished from lower emissions during the early afternoon.
    Overall, the integration of airborne fluxes with an ERF-driven strategy enabled the first independent generation of surface NOx emissions, at high resolution using an eddy-covariance approach, for an entire city region.
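    The ERF step pairs measured fluxes with their environmental drivers, fits a response function, and then evaluates that function on gridded drivers to produce an emission surface. The sketch below is a minimal illustration that assumes a plain multiple linear regression solved through the normal equations; the study's actual ERF method and drivers are not reproduced, and the function names, drivers and numbers are hypothetical.

```python
def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting (small systems)."""
    n = len(c)
    M = [list(row) + [ci] for row, ci in zip(A, c)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_linear_erf(drivers, fluxes):
    """Least-squares fit of flux = b0 + b1*d1 + ... + bk*dk via the normal equations."""
    X = [[1.0] + list(row) for row in drivers]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * y for row, y in zip(X, fluxes)) for i in range(k)]
    return solve(A, c)


def project_erf(coeffs, grid_drivers):
    """Evaluate the fitted function at each grid cell's drivers (an emission surface)."""
    return [coeffs[0] + sum(b * d for b, d in zip(coeffs[1:], cell)) for cell in grid_drivers]


if __name__ == "__main__":
    # Toy data with two hypothetical drivers; not measurements from the study.
    obs_drivers = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
    obs_flux = [2 + 3 * d1 - d2 for d1, d2 in obs_drivers]
    coeffs = fit_linear_erf(obs_drivers, obs_flux)
    print(coeffs, project_erf(coeffs, [(1, 1), (2, 0)]))
```

    A linear model keeps the example short; a flexible regressor could replace `fit_linear_erf` without changing the fit-then-project structure.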
  2. Despite the need for researchers to understand terrestrial biospheric carbon fluxes to account for carbon cycle feedbacks and predict future CO2 concentrations, knowledge of these fluxes at the regional scale remains poor. This is particularly true in mountainous areas, where complex meteorology and lack of observations lead to large uncertainties in carbon fluxes. Yet mountainous regions are often where significant forest cover and biomass are found, i.e., areas that have the potential to serve as carbon sinks. As CO2 observations are carried out in mountainous areas, it is imperative that they are properly interpreted to yield information about carbon fluxes. In this paper, we present CO2 observations at three sites in the mountains of the western US, along with atmospheric simulations that attempt to extract information about biospheric carbon fluxes from the CO2 observations, with emphasis on the observed and simulated diurnal cycles of CO2. We show that atmospheric models can systematically simulate the wrong diurnal cycle and significantly misinterpret the CO2 observations because of erroneous atmospheric flows resulting from terrain that is misrepresented in the model. This problem depends on the selected vertical level in the model and is exacerbated as the spatial resolution is degraded, and our results indicate that a fine grid spacing of ∼ 4 km or less may be needed to simulate a realistic diurnal cycle of CO2 for sites on top of the steep mountains examined here in the American Rockies. In the absence of higher-resolution models, we recommend that coarse-scale models focus on assimilating afternoon CO2 observations from mountaintop sites across the continent to avoid misrepresenting nocturnal transport and influence.
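    The comparison above rests on mean diurnal cycles, and the recommendation is to assimilate afternoon mountaintop observations. A minimal sketch of those steps, assuming hourly records given as (local hour, CO2 in ppm) pairs and an afternoon window of 12:00 to 16:00 chosen here for illustration; the paper's exact window and data layout may differ, and all names and values are hypothetical.

```python
from collections import defaultdict


def mean_diurnal_cycle(records):
    """Average CO2 by hour of day; records are (hour, co2_ppm) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for hour, co2 in records:
        sums[hour] += co2
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


def afternoon_only(records, start=12, end=16):
    """Keep observations whose local hour lies in [start, end] (window is an assumption)."""
    return [(hour, co2) for hour, co2 in records if start <= hour <= end]


def diurnal_amplitude(cycle):
    """Peak-to-trough range of a mean diurnal cycle."""
    return max(cycle.values()) - min(cycle.values())


def cycle_bias(simulated, observed):
    """Hour-by-hour simulated minus observed mean CO2, for hours present in both."""
    return {h: simulated[h] - observed[h] for h in observed if h in simulated}


if __name__ == "__main__":
    obs = [(0, 410.0), (0, 412.0), (13, 400.0), (14, 401.0), (20, 415.0)]  # toy values
    cycle = mean_diurnal_cycle(obs)
    print(cycle, diurnal_amplitude(cycle), afternoon_only(obs))
```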
  3. Abstract. Methane (CH4) emissions from the boreal and arctic region are globally significant and highly sensitive to climate change. There is currently a wide range in estimates of high-latitude annual CH4 fluxes, where estimates based on land cover inventories and empirical CH4 flux data or process models (bottom-up approaches) are generally greater than those from atmospheric inversions (top-down approaches). A limitation of bottom-up approaches has been the lack of harmonization between inventories of site-level CH4 flux data and the land cover classes present in high-latitude spatial datasets. Here we present a comprehensive dataset of small-scale, surface CH4 flux data from 540 terrestrial sites (wetland and non-wetland) and 1247 aquatic sites (lakes and ponds), compiled from 189 studies. The Boreal–Arctic Wetland and Lake Methane Dataset (BAWLD-CH4) was constructed in parallel with a compatible land cover dataset, sharing the same land cover classes to enable refined bottom-up assessments. BAWLD-CH4 includes information on site-level CH4 fluxes but also on study design (measurement method, timing, and frequency) and site characteristics (vegetation, climate, hydrology, soil, and sediment types, permafrost conditions, lake size and depth, and our determination of land cover class). The different land cover classes had distinct CH4 fluxes, resulting from definitions that were either based on or co-varied with key environmental controls. Fluxes of CH4 from terrestrial ecosystems were primarily influenced by water table position, soil temperature, and vegetation composition, while CH4 fluxes from aquatic ecosystems were primarily influenced by water temperature, lake size, and lake genesis. Models could explain more of the between-site variability in CH4 fluxes for terrestrial than aquatic ecosystems, likely due to both less precise assessments of lake CH4 fluxes and fewer consistently reported lake site characteristics.
    Analysis of BAWLD-CH4 identified both land cover classes and regions within the boreal and arctic domain where future studies should be focused, alongside methodological approaches. Overall, BAWLD-CH4 provides a comprehensive dataset of CH4 emissions from high-latitude ecosystems that is useful for identifying research opportunities, for comparison against new field data, and for model parameterization or validation. BAWLD-CH4 can be downloaded from https://doi.org/10.18739/A2DN3ZX1R (Kuhn et al., 2021).
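    Shared land cover classes are what make a refined bottom-up estimate possible: site fluxes are averaged per class and multiplied by the area of that class in the spatial dataset. A minimal sketch under that simple mean-times-area assumption (real upscaling would also handle seasonality, uncertainty and unit conversion); the class names, fluxes and areas are hypothetical, not BAWLD-CH4 values. A class that appears in the land cover map but has no flux data raises an error instead of being silently dropped, which is the harmonization gap the abstract describes.

```python
def class_mean_fluxes(site_records):
    """Mean flux per land cover class from (land_cover_class, flux) pairs."""
    totals = {}
    for cls, flux in site_records:
        s, n = totals.get(cls, (0.0, 0))
        totals[cls] = (s + flux, n + 1)
    return {cls: s / n for cls, (s, n) in totals.items()}


def bottom_up_total(mean_flux, class_area):
    """Sum of mean flux x area over mapped classes; units follow the inputs
    (e.g. g CH4 m^-2 yr^-1 times m^2 gives g CH4 yr^-1)."""
    missing = set(class_area) - set(mean_flux)
    if missing:
        raise KeyError(f"no flux data for land cover classes: {sorted(missing)}")
    return sum(mean_flux[c] * area for c, area in class_area.items())


if __name__ == "__main__":
    # Hypothetical site fluxes and class areas, for illustration only.
    means = class_mean_fluxes([("bog", 10.0), ("bog", 20.0), ("lake", 4.0)])
    print(means, bottom_up_total(means, {"bog": 2.0, "lake": 5.0}))
```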
  4. Abstract. Past efforts to synthesize and quantify the magnitude and change in carbon dioxide (CO2) fluxes in terrestrial ecosystems across the rapidly warming Arctic–boreal zone (ABZ) have provided valuable information but were limited in their geographical and temporal coverage. Furthermore, these efforts have been based on data aggregated over varying time periods, often with only minimal site ancillary data, thus limiting their potential to be used in large-scale carbon budget assessments. To bridge these gaps, we developed a standardized monthly database of Arctic–boreal CO2 fluxes (ABCflux) that aggregates in situ measurements of terrestrial net ecosystem CO2 exchange and its derived partitioned component fluxes: gross primary productivity and ecosystem respiration. The data span from 1989 to 2020 with over 70 supporting variables that describe key site conditions (e.g., vegetation and disturbance type), micrometeorological and environmental measurements (e.g., air and soil temperatures), and flux measurement techniques. Here, we describe these variables, the spatial and temporal distribution of observations, the main strengths and limitations of the database, and the potential research opportunities it enables. In total, ABCflux includes 244 sites and 6309 monthly observations; 136 sites and 2217 monthly observations represent tundra, and 108 sites and 4092 observations represent the boreal biome. The database includes fluxes estimated with chamber (19 % of the monthly observations), snow diffusion (3 %) and eddy covariance (78 %) techniques. The largest number of observations were collected during the climatological summer (June–August; 32 %), and fewer observations were available for autumn (September–November; 25 %), winter (December–February; 18 %), and spring (March–May; 25 %).
ABCflux can be used in a wide array of empirical, remote sensing and modeling studies to improve understanding of the regional and temporal variability in CO2 fluxes and to better estimate the terrestrial ABZ CO2 budget. ABCflux is openly and freely available online (Virkkala et al., 2021b, https://doi.org/10.3334/ORNLDAAC/1934). 
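    ABCflux stores monthly observations, so seasonal shares such as those quoted above follow from mapping each month to a climatological season (December–February, March–May, June–August, September–November). A minimal sketch, assuming records are reduced to a hypothetical list of month numbers rather than the database's actual schema:

```python
# Month number -> climatological season code.
SEASONS = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def seasonal_shares(months):
    """Percentage of monthly observations falling in each climatological season."""
    if not months:
        raise ValueError("no observations")
    counts = {s: 0 for s in ("DJF", "MAM", "JJA", "SON")}
    for m in months:
        counts[SEASONS[m]] += 1
    return {s: 100.0 * n / len(months) for s, n in counts.items()}


if __name__ == "__main__":
    print(seasonal_shares([6, 7, 8, 12, 1, 3, 9, 10]))  # toy month list
```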
  5. Abstract. During the Program for Research on Oxidants: PHotochemistry, Emissions, and Transport (PROPHET) campaign from 21 July to 3 August 2016, field experiments on leaf-level trace gas exchange of nitric oxide (NO), nitrogen dioxide (NO2), and ozone (O3) were conducted for the first time on the native American tree species Pinus strobus (eastern white pine), Acer rubrum (red maple), Populus grandidentata (bigtooth aspen), and Quercus rubra (red oak) in a temperate hardwood forest in Michigan, USA. We measured the leaf-level trace gas exchange rates and investigated the existence of an NO2 compensation point, hypothesized based on a comparison of a previously observed average diurnal cycle of NOx (NO2 + NO) concentrations with that simulated using a multi-layer canopy exchange model. Known amounts of trace gases were introduced into a tree branch enclosure and a paired blank reference enclosure. The trace gas concentrations before and after the enclosures were measured, as well as the enclosed leaf area (single-sided) and gas flow rate, to obtain the trace gas fluxes with respect to leaf surface. There was no detectable NO uptake for any tree type. The foliar NO2 and O3 uptake largely followed a diurnal cycle, correlating with that of the leaf stomatal conductance. NO2 and O3 fluxes were driven by their concentration gradient from ambient air to the leaf internal space. The NO2 loss rate at the leaf surface, equivalently the foliar NO2 deposition velocity toward the leaf surface, ranged from 0 to 3.6 mm s−1 for bigtooth aspen and from 0 to 0.76 mm s−1 for red oak, both of which are ∼90 % of the expected values based on the stomatal conductance of water. The deposition velocities for red maple and white pine ranged from 0.3 to 1.6 and from 0.01 to 1.1 mm s−1, respectively, and were lower than predicted from the stomatal conductance, implying a mesophyll resistance to the uptake.
    Additionally, for white pine, the extrapolated velocity at zero stomatal conductance was 0.4 ± 0.08 mm s−1, indicating a non-stomatal uptake pathway. The NO2 compensation point was ≤ 60 ppt for all four tree species and indistinguishable from zero at the 95 % confidence level. This agrees with recent reports for several European and California tree species but contradicts some earlier experimental results where the compensation points were found to be on the order of 1 ppb or higher. Given that the sampled tree types represent 80 %–90 % of the total leaf area at this site, these results negate the previously hypothesized important role of a leaf-scale NO2 compensation point. Consequently, to reconcile these findings, further detailed comparisons between the observed and simulated in- and above-canopy NOx concentrations and the leaf- and canopy-scale NOx fluxes, using the multi-layer canopy exchange model with consideration of the leaf-scale NOx deposition velocities as well as stomatal conductances reported here, are recommended.
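    Two quantities in this abstract come from linear extrapolation: the compensation point is the ambient concentration at which net uptake extrapolates to zero, and the non-stomatal pathway shows up as a nonzero intercept of deposition velocity versus stomatal conductance. A minimal sketch, assuming the simple linear flux model F = v_d (C − C_comp) with F positive for uptake and all quantities in consistent units; the function names and toy numbers are hypothetical, and the study's actual fitting and confidence-interval procedure is not reproduced.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def compensation_point(ambient, uptake):
    """Assume uptake F = v_d * (C - C_comp) = a + b*C.
    The slope b is v_d, and the x-intercept -a/b is C_comp."""
    a, b = linear_fit(ambient, uptake)
    return -a / b, b


def nonstomatal_velocity(stomatal_conductance, deposition_velocity):
    """Intercept of v_d regressed on stomatal conductance (v_d extrapolated to g_s = 0)."""
    return linear_fit(stomatal_conductance, deposition_velocity)[0]


if __name__ == "__main__":
    # Toy values in consistent, hypothetical units; not PROPHET measurements.
    c_comp, v_d = compensation_point([1.0, 2.0, 3.0, 4.0], [0.4, 0.9, 1.4, 1.9])
    print(f"C_comp={c_comp:.3f} v_d={v_d:.3f}")
    print(f"non-stomatal v_d={nonstomatal_velocity([0.0, 1.0, 2.0], [0.4, 0.7, 1.0]):.3f}")
```

    With these toy numbers the fitted compensation point is 0.2 and the slope (deposition velocity) is 0.5; a compensation point statistically indistinguishable from zero, as reported above, would correspond to an intercept near zero.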