Title: KGML-ag: a modeling framework of knowledge-guided machine learning to simulate agroecosystems: a case study of estimating N<sub>2</sub>O emission using data from mesocosm experiments
Abstract. Agricultural nitrous oxide (N2O) emission accounts for a non-trivial fraction of the global greenhouse gas (GHG) budget. To date, estimating N2O fluxes from cropland remains a challenging task because the related microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plant and human activities. Existing approaches such as process-based (PB) models have well-known limitations due to insufficient representations of the processes or uncertainties of model parameters. To leverage recent advances in machine learning (ML), a new method is needed to unlock the “black box” and overcome ML's limitations, such as low interpretability, out-of-sample failure and massive data demand. In this study, we developed a first-of-its-kind knowledge-guided machine learning model for agroecosystems (KGML-ag) by incorporating biogeophysical and chemical domain knowledge from an advanced PB model, ecosys, and tested it by comparing simulated daily N2O fluxes with observed data from mesocosm experiments. The gated recurrent unit (GRU) was used as the basis to build the model structure. To optimize the model performance, we investigated a range of ideas, including (1) using initial values of intermediate variables (IMVs) instead of time series as model input to reduce data demand; (2) building hierarchical structures to explicitly estimate IMVs for further N2O prediction; (3) using multi-task learning to balance the simultaneous training on multiple variables; and (4) pre-training with millions of synthetic data generated from ecosys and fine-tuning with mesocosm observations. Six other pure ML models were developed using the same mesocosm data to serve as the benchmark for the KGML-ag model. Results show that KGML-ag did an excellent job in reproducing the mesocosm N2O fluxes (overall r2 = 0.81 and RMSE = 3.6 mg N m−2 d−1 from cross validation).
Importantly, KGML-ag always outperforms the PB model and ML models in predicting N2O fluxes, especially for complex temporal dynamics and emission peaks. Besides, KGML-ag goes beyond the pure ML models by providing more interpretable predictions as well as pinpointing desired new knowledge and data to further empower the current KGML-ag. We believe the KGML-ag development in this study will stimulate a new body of research on interpretable ML for biogeochemistry and other related geoscience processes.
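For readers unfamiliar with the two scores quoted in the abstract, the following is a minimal sketch of how RMSE and r2 can be computed from paired observed and predicted fluxes. It assumes r2 means the squared Pearson correlation, which the abstract does not state; the function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of r2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical daily fluxes (mg N m-2 d-1), invented purely for illustration.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

In this toy case the predictions carry a constant offset, so the correlation-based r2 is perfect while RMSE is not zero, which is one reason the two scores are reported together.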
Award ID(s):
2034385
NSF-PAR ID:
10387735
Journal Name:
Geoscientific Model Development
Volume:
15
Issue:
7
ISSN:
1991-9603
Page Range / eLocation ID:
2839 to 2858
Sponsoring Org:
National Science Foundation
More Like this
  1. Despite the need for researchers to understand terrestrial biospheric carbon fluxes to account for carbon cycle feedbacks and predict future CO2 concentrations, knowledge of these fluxes at the regional scale remains poor. This is particularly true in mountainous areas, where complex meteorology and lack of observations lead to large uncertainties in carbon fluxes. Yet mountainous regions are often where significant forest cover and biomass are found, i.e., areas that have the potential to serve as carbon sinks. As CO2 observations are carried out in mountainous areas, it is imperative that they are properly interpreted to yield information about carbon fluxes. In this paper, we present CO2 observations at three sites in the mountains of the western US, along with atmospheric simulations that attempt to extract information about biospheric carbon fluxes from the CO2 observations, with emphasis on the observed and simulated diurnal cycles of CO2. We show that atmospheric models can systematically simulate the wrong diurnal cycle and significantly misinterpret the CO2 observations, due to erroneous atmospheric flows as a result of terrain that is misrepresented in the model. This problem depends on the selected vertical level in the model and is exacerbated as the spatial resolution is degraded, and our results indicate that a fine grid spacing of ∼ 4 km or less may be needed to simulate a realistic diurnal cycle of CO2 for sites on top of the steep mountains examined here in the American Rockies. In the absence of higher-resolution models, we recommend that coarse-scale models focus on assimilating afternoon CO2 observations at mountaintop sites over the continent to avoid misrepresentations of nocturnal transport and influence.
  2. Abstract. Past efforts to synthesize and quantify the magnitude and change in carbon dioxide (CO2) fluxes in terrestrial ecosystems across the rapidly warming Arctic–boreal zone (ABZ) have provided valuable information but were limited in their geographical and temporal coverage. Furthermore, these efforts have been based on data aggregated over varying time periods, often with only minimal site ancillary data, thus limiting their potential to be used in large-scale carbon budget assessments. To bridge these gaps, we developed a standardized monthly database of Arctic–boreal CO2 fluxes (ABCflux) that aggregates in situ measurements of terrestrial net ecosystem CO2 exchange and its derived partitioned component fluxes: gross primary productivity and ecosystem respiration. The data span from 1989 to 2020 with over 70 supporting variables that describe key site conditions (e.g., vegetation and disturbance type), micrometeorological and environmental measurements (e.g., air and soil temperatures), and flux measurement techniques. Here, we describe these variables, the spatial and temporal distribution of observations, the main strengths and limitations of the database, and the potential research opportunities it enables. In total, ABCflux includes 244 sites and 6309 monthly observations; 136 sites and 2217 monthly observations represent tundra, and 108 sites and 4092 observations represent the boreal biome. The database includes fluxes estimated with chamber (19 % of the monthly observations), snow diffusion (3 %) and eddy covariance (78 %) techniques. The largest number of observations were collected during the climatological summer (June–August; 32 %), and fewer observations were available for autumn (September–October; 25 %), winter (December–February; 18 %), and spring (March–May; 25 %). 
ABCflux can be used in a wide array of empirical, remote sensing and modeling studies to improve understanding of the regional and temporal variability in CO2 fluxes and to better estimate the terrestrial ABZ CO2 budget. ABCflux is openly and freely available online (Virkkala et al., 2021b, https://doi.org/10.3334/ORNLDAAC/1934). 
  3. Abstract. Methane (CH4) emissions from the boreal and arctic region are globally significant and highly sensitive to climate change. There is currently a wide range in estimates of high-latitude annual CH4 fluxes, where estimates based on land cover inventories and empirical CH4 flux data or process models (bottom-up approaches) generally are greater than atmospheric inversions (top-down approaches). A limitation of bottom-up approaches has been the lack of harmonization between inventories of site-level CH4 flux data and the land cover classes present in high-latitude spatial datasets. Here we present a comprehensive dataset of small-scale, surface CH4 flux data from 540 terrestrial sites (wetland and non-wetland) and 1247 aquatic sites (lakes and ponds), compiled from 189 studies. The Boreal–Arctic Wetland and Lake Methane Dataset (BAWLD-CH4) was constructed in parallel with a compatible land cover dataset, sharing the same land cover classes to enable refined bottom-up assessments. BAWLD-CH4 includes information on site-level CH4 fluxes but also on study design (measurement method, timing, and frequency) and site characteristics (vegetation, climate, hydrology, soil, and sediment types, permafrost conditions, lake size and depth, and our determination of land cover class). The different land cover classes had distinct CH4 fluxes, resulting from definitions that were either based on or co-varied with key environmental controls. Fluxes of CH4 from terrestrial ecosystems were primarily influenced by water table position, soil temperature, and vegetation composition, while CH4 fluxes from aquatic ecosystems were primarily influenced by water temperature, lake size, and lake genesis. Models could explain more of the between-site variability in CH4 fluxes for terrestrial than aquatic ecosystems, likely due to both less precise assessments of lake CH4 fluxes and fewer consistently reported lake site characteristics.
Analysis of BAWLD-CH4 identified both land cover classes and regions within the boreal and arctic domain where future studies should be focused, alongside methodological approaches. Overall, BAWLD-CH4 provides a comprehensive dataset of CH4 emissions from high-latitude ecosystems that is useful for identifying research opportunities, for comparison against new field data, and for model parameterization or validation. BAWLD-CH4 can be downloaded from https://doi.org/10.18739/A2DN3ZX1R (Kuhn et al., 2021).
  4. Abstract. Flux measurements of nitrogen oxides (NOx) were made over London using airborne eddy covariance from a low-flying aircraft. Seven low-altitude flights were conducted over Greater London, performing multiple overpasses across the city during eight days in July 2014. NOx fluxes across the Greater London region (GLR) exhibited high heterogeneity and strong diurnal variability, with central areas responsible for the highest emission rates (20–30 mg m−2 h−1). Other high-emission areas included the M25 orbital motorway. The complexity of London's emission characteristics makes it challenging to pinpoint single emissions sources definitively using airborne measurements. Multiple sources, including road transport and residential, commercial and industrial combustion sources, are all likely to contribute to measured fluxes. Measured flux estimates were compared to scaled National Atmospheric Emissions Inventory (NAEI) estimates, accounting for monthly, daily and hourly variability. Significant differences were found between the flux-driven emissions and the NAEI estimates across Greater London, with measured values up to 2 times higher in Central London than those predicted by the inventory. To overcome the limitations of using the national inventory to contextualise measured fluxes, we used physics-guided flux data fusion to train environmental response functions (ERFs) between measured flux and environmental drivers (meteorological and surface). The aim was to generate time-of-day emission surfaces using calculated ERF relationships for the entire GLR; 98 % spatial coverage was achieved across the GLR at 400 m2 spatial resolution. All flight leg projections showed substantial heterogeneity across the domain, with high emissions emanating from Central London and major road infrastructure. The diurnal emission structure of the GLR was also investigated, through ERF, with the morning rush hour distinguished from lower emissions during the early afternoon.
Overall, the integration of airborne fluxes with an ERF-driven strategy enabled the first independent generation of surface NOx emissions, at high resolution using an eddy-covariance approach, for an entire city region.
  5. Abstract. Outlet glaciers that flow through the Transantarctic Mountains (TAM) experienced changes in ice thickness greater than other coastal regions of Antarctica during glacial maxima. As a result, ice-free areas that are currently exposed may have been covered by ice at various points during the Cenozoic, complicating our understanding of ecological succession in TAM soils. Our knowledge of glacial extent on small spatial scales is limited for the TAM, and studies of soil exposure duration and disturbance, in particular, are rare. We collected surface soil samples and, in some places, depth profiles every 5 cm to refusal (up to 30 cm) from 11 ice-free areas along Shackleton Glacier, a major outlet glacier of the East Antarctic Ice Sheet. We explored the relationship between meteoric 10Be and NO3- in these soils as a tool for understanding landscape disturbance and wetting history and as exposure proxies. Concentrations of meteoric 10Be spanned more than an order of magnitude across the region (2.9 × 10^8 to 73 × 10^8 atoms g−1) and are among the highest measured in polar regions. The concentrations of NO3- were similarly variable and ranged from ∼1 µg g−1 to 15 mg g−1. In examining differences and similarities in the concentrations of 10Be and NO3- with depth, we suggest that much of the southern portion of the Shackleton Glacier region has likely developed under a hyper-arid climate regime with minimal disturbance. Finally, we inferred exposure time using 10Be concentrations. This analysis indicates that the soils we analyzed likely range from recent exposure (following the Last Glacial Maximum) to possibly >6 Myr. We suggest that further testing and interrogation of meteoric 10Be and NO3- concentrations and relationships in soils can provide important information regarding landscape development, soil evolution processes, and inferred exposure durations of surfaces in the TAM.