-
Nitrous oxide (N₂O) is a potent and persistent greenhouse gas, with rising atmospheric concentrations driven in part by inefficient use of synthetic nitrogen (N) fertilizers in agriculture. Predicting soil N₂O emissions is challenging due to high spatial and temporal variability arising from complex soil biogeochemical processes. Process-based ecosystem models and standalone machine learning (ML) approaches without extensive site-specific calibration often miss high-emission episodes. Here, we show how an Ensemble Modeling System (EMS) based on outputs from an ensemble of ecosystem models coupled to an ensemble of ML models can improve predictions and understanding of N₂O fluxes from US cropland. Trained and validated on approximately 12,000 N₂O chamber measurements at 17 US Midwest sites (six crops, 35 management practices), the EMS accurately predicted daily fluxes of N₂O at both training (R² = 0.84, RMSE = 16.4 g N ha⁻¹ d⁻¹) and held-out testing sites (R² = 0.84, RMSE = 6.2 g N ha⁻¹ d⁻¹). Analyses identified six dominant N₂O drivers: soil organic carbon (SOC), NH₄⁺, NO₃⁻, water-filled pore space (WFPS), soil temperature, and biomass production. Wet, warm soils produced large N₂O peaks only with sufficient SOC and mineral N; in low-SOC soils, fluxes remained low. Incorporating these drivers into process-based models might significantly improve their predictive capacity. The EMS demonstrates a strong potential to predict N₂O fluxes at unseen sites, enabling more reliable regional inventories, improved gap-filling where measurements are sparse, and enhanced understanding of mechanisms to advance targeted mitigation strategies in food, feed, and bioenergy crops.
# Data from: Coupled machine learning-ecosystem ensemble models substantially improve predictions of nitrous oxide (N~2~O) fluxes from US croplands

Dataset DOI: [10.5061/dryad.pvmcvdnzx](10.5061/dryad.pvmcvdnzx)

## Description of the data and file structure

We present here the data that were used for the analysis presented in: Coupled machine learning-ecosystem ensemble models substantially improve predictions of nitrous oxide (N~2~O) fluxes from US croplands.

### Files and variables

Files: Dataset_S1.xlsx, Dataset_S2.csv, Dataset_S3.csv

#### **Description:**

**Description of data sheets**

**Dataset S1A columns**

* **Site_ID:** Numeric identifier for the experimental site.
* **Treatment_ID:** Numeric code for the management treatment applied at that site
* **DataUse:** To assign data to be used for model training (development) and testing (held-out evaluation)
* **State/Province:** State acronym
* **Latitude decimal deg:** Site location latitude
* **Longitude decimal deg:** Site location longitude
* **Start Data Year:** Starting year of data used
* **End Data Year:** Ending year of data used
* **Cover crop:** Type of cover crop used within the treatment
* **Rotation Descriptor:** Describe the rotation of crops within the treatment
* **Tillage Descriptor:** Describe tillage type within the treatment
* **Residual Removal:** Describe residual management within the treatment
* **Irrigation:** Describe if irrigation was applied or not within the treatment
* **N Treatment Descriptor:** Describe nitrogen amendments within the treatment
* **Reference:** Reference for the data

**Dataset S1B:** This sheet contains the reference list for the data used

**Dataset S2 columns**

* **Date:** Gas sampling days
* **Site_ID:** Numeric identifier for the experimental site.
* **Treatment_ID:** Numeric code for the management treatment applied at that site
* **DataUse:** To assign data to be used for model training (development) and testing (held-out evaluation)
* **Observed N2O:** Daily average N2O flux measured (g N2O-N ha-1 d-1)
* **Predicted N2O:** Daily average N2O flux predicted by multimodel hybrid framework (g N2O-N ha-1 d-1)
* **NH4:** Process-based models simulated daily NH4-N content in the top 30-cm soil layer (kg ha-1)
* **SOC:** Process-based models simulated daily soil organic carbon in the top 30-cm soil layer (kg ha-1)
* **NO3:** Process-based models simulated daily NO3-N content in the top 30-cm soil layer (kg ha-1)
* **ST:** Process-based models simulated daily average soil temperature in the top 30-cm soil layer (°C)
* **WFPS:** Process-based models simulated daily water-filled pore space in the top 30-cm soil layer (fraction)
* **ABG:** Process-based models simulated daily above-ground biomass (kg ha-1)
* **BG:** Process-based models simulated daily below-ground biomass (kg ha-1)
* **SRAD:** Average solar radiation for the last five days before gas sampling (Watt m-2)
* **Tmax:** Average maximum air temperature for the last three days before gas sampling (°C)
* **APrecip:** Average precipitation in the last fifteen days before gas sampling (mm)
* **Wspd:** Average wind in the last fifteen days before gas sampling (m s-1)
* **LAI:** Process-based models simulated daily leaf area index (m2 m-2)
* **Nstress:** Process-based models simulated the daily nitrogen stress factor (fraction)
* **Wstress:** Process-based models simulated the daily water stress factor (fraction)
* **PET:** Process-based models simulated daily potential evapotranspiration (mm)
* **SE:** Process-based models simulated daily soil evaporation (mm)
* **SPrecip:** Cumulative precipitation in the last two days before gas sampling (mm)
* **SH:** Average specific humidity in the last three days before gas sampling (g kg-1)
* **RH:** Average relative humidity in the last fifteen days before gas sampling (%)

**Dataset S3 columns**

* **Date:** Gas sampling days
* **Site_ID:** Numeric identifier for the experimental site.
* **Treatment_ID:** Numeric code for the management treatment applied at that site
* **DataUse:** To assign data to be used for model training (development) and testing (held-out evaluation)
* **SD:** Monte Carlo standard deviation of the simulated daily N₂O flux distribution (g N2O-N ha-1 d-1)
* **CV:** Monte Carlo coefficient of variation of the simulated daily N₂O flux distribution (%)
* **CI05:** 5th percentile (lower 90 % confidence bound) of the Monte Carlo flux distribution (g N2O-N ha-1 d-1)
* **CI95:** 95th percentile (upper 90 % confidence bound) of the Monte Carlo flux distribution (g N2O-N ha-1 d-1)
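
As a rough illustration of how the Dataset S2 columns might be used, the sketch below recomputes R² and RMSE separately for each `DataUse` group from the `Observed N2O` and `Predicted N2O` columns. It makes several assumptions: the CSV header strings match the column names listed above exactly, R² is taken as the coefficient of determination (the README does not say which definition the authors used), and the helper names `fit_metrics` and `metrics_by_data_use` are hypothetical.

```python
import csv
import math
from collections import defaultdict


def fit_metrics(observed, predicted):
    """Return (R2, RMSE) for paired observed and predicted values.

    R2 is computed as 1 - SS_res / SS_tot (an assumed definition).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


def metrics_by_data_use(csv_file, obs_col="Observed N2O",
                        pred_col="Predicted N2O", group_col="DataUse"):
    """Group rows by DataUse and compute (R2, RMSE) for each group.

    csv_file is any open text file or file-like object with a header row.
    The default column names are assumed to match the CSV headers.
    """
    groups = defaultdict(lambda: ([], []))
    for row in csv.DictReader(csv_file):
        try:
            obs = float(row[obs_col])
            pred = float(row[pred_col])
        except ValueError:
            continue  # skip rows with blank or non-numeric flux values
        if math.isnan(obs) or math.isnan(pred):
            continue  # skip rows where a flux is recorded as NaN
        obs_list, pred_list = groups[row[group_col]]
        obs_list.append(obs)
        pred_list.append(pred)
    return {name: fit_metrics(o, p) for name, (o, p) in groups.items()}
```

A call such as `metrics_by_data_use(open("Dataset_S2.csv", newline=""))` would then return one `(R2, RMSE)` pair per `DataUse` label, in the flux units of the file (g N2O-N ha-1 d-1 for RMSE).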
-
Nitrous oxide (N2O) is a potent and persistent greenhouse gas, with rising atmospheric concentrations driven in part by inefficient use of synthetic nitrogen (N) fertilizers in agriculture. Predicting soil N2O emissions is challenging due to high spatial and temporal variability arising from complex soil biogeochemical processes. Process-based ecosystem models and standalone machine learning (ML) approaches without extensive site-specific calibration often miss high-emission episodes. Here, we show how an Ensemble Modeling System (EMS) based on outputs from an ensemble of ecosystem models coupled to an ensemble of ML models can improve predictions and understanding of N2O fluxes from US cropland. Trained and validated on ~12,000 N2O chamber measurements at 17 US Midwest sites (six crops, 35 management practices), the EMS accurately predicted daily fluxes of N2O at both training (R2 = 0.84, RMSE = 16.4 g N ha−1 d−1) and held-out testing sites (R2 = 0.84, RMSE = 6.2 g N ha−1 d−1). Analyses identified six dominant N2O drivers: soil organic carbon (SOC), NH4+, NO3−, water-filled pore space, temperature, and aboveground biomass production. Wet, warm soils produced large N2O peaks only with sufficient SOC and mineral N; in low-SOC soils, fluxes remained low. Incorporating these drivers into process-based models might significantly improve their predictive capacity. The EMS demonstrates a strong potential to predict N2O fluxes at unseen sites, enabling more reliable regional inventories, improved gap-filling where measurements are sparse, and enhanced understanding of mechanisms to advance targeted mitigation strategies in food, feed, and bioenergy crops.
-
Abstract Phosphorus (P) budgets for cropping systems provide insights for keeping soil P at optimal levels for crops while avoiding excess inputs. We quantified 12 years of P inputs (fertilizer and atmospheric deposition) and outputs (harvest and leaching losses) for replicated maize (Zea mays L.)–soybean (Glycine max L.)–wheat (Triticum aestivum) crop rotations under conventional, no‐till, reduced input, and biologically based (organic without compost or manure) management systems at the Kellogg Biological Station LTAR site in southwest Michigan. Conventional, no‐till, and reduced input systems were fertilized between 13 and 50 kg P ha−1 depending on year. Soil test phosphorus (STP) was measured at 0‐ to 25‐cm depth every autumn. Leached P was measured as dissolved P in the soil solution sampled beneath the rooting depth and combined with modeled percolation. Fertilization and harvest were the predominant P fluxes in the fertilized systems, whereas only harvest dominated P flux in the unfertilized organic system. Leaching losses were minor terms in the budgets, but leachate concentrations were nevertheless close to the range of concern for downstream eutrophication. Over the 12‐year study period, the organic system exhibited a negative P balance (−82.0 kg P ha−1), coinciding with suboptimal STP levels, suggesting a need for P supplementation. In contrast, the fertilized systems showed positive P balances (mean: 70.1 kg P ha−1) with STP levels well above agronomic optima. Results underscore the importance of tailored P management strategies to sustain crop productivity while mitigating environmental impacts.
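
The budget arithmetic described in this abstract amounts to summed inputs (fertilizer plus atmospheric deposition) minus summed outputs (harvest plus leaching). The following is only a minimal sketch of that bookkeeping; the function name `p_balance` and the example numbers in the usage note are hypothetical and are not taken from the study.

```python
def p_balance(fertilizer, deposition, harvest, leaching):
    """Net P balance over a study period, in kg P ha-1.

    Each argument is a sequence of annual fluxes in kg P ha-1.
    Inputs are fertilizer and atmospheric deposition; outputs are
    harvest removal and leaching loss. A negative result means more
    P left the system than entered it.
    """
    inputs = sum(fertilizer) + sum(deposition)
    outputs = sum(harvest) + sum(leaching)
    return inputs - outputs
```

For instance, with made-up two-year values, `p_balance([20, 0], [0.5, 0.5], [15, 15], [0.25, 0.25])` gives a negative balance, the sign pattern the abstract reports for the unfertilized organic system.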
-
Abstract This study investigates uncertainties in greenhouse gas (GHG) emission factors related to switchgrass‐based biofuel production in Michigan. Using three life cycle assessment (LCA) databases, namely the US lifecycle inventory (USLCI) database, GREET, and Ecoinvent, each with multiple versions, we recalculated the global warming intensity (GWI) and GHG mitigation potential in a static calculation. Employing Monte Carlo simulations along with local and global sensitivity analyses, we assess uncertainties and pinpoint key parameters influencing GWI. The convergence of results across our previous study, static calculations, and Monte Carlo simulations enhances the credibility of estimated GWI values. Static calculations, validated by Monte Carlo simulations, offer reasonable central tendencies, providing a robust foundation for policy considerations. However, the wider range observed in Monte Carlo simulations underscores the importance of potential variations and uncertainties in real‐world applications. Sensitivity analyses identify biofuel yield, GHG emissions of electricity, and soil organic carbon (SOC) change as pivotal parameters influencing GWI. Decreasing uncertainties in GWI may be achieved by making greater efforts to acquire more precise data on these parameters. Our study emphasizes the significance of considering diverse GHG factors and databases in GWI assessments and stresses the need for accurate electricity fuel mixes, crucial information for refining GWI assessments and informing strategies for sustainable biofuel production.
-
Irrigation can increase crop yields and could be a key climate adaptation strategy. However, future water availability is uncertain. Here we explore the economic costs and benefits of existing and expanded irrigation of maize and soybean throughout the United States. We examine both middle and end of the 21st-century conditions under future climates that span the range of projections. By mid-century we find an expansion in the area where the benefits of irrigation outweigh groundwater pumping and equipment ownership costs. Increased crop water demands limit the region where maize could be sustainably irrigated, but sustainably irrigated soybean is likely feasible throughout regions of the midwestern and southeastern United States. Shifting incentives for installing and maintaining irrigation equipment could place additional challenges on resource availability. It will be important for decision makers to understand and account for local water demand and availability when developing policies guiding irrigation installation and use.
-
Abstract The Kellogg Biological Station Long‐term Agroecosystem Research site (KBS LTAR) joined the national LTAR Network in 2015 to represent a northeast portion of the North Central Region, extending across 76,000 km2 of southern Michigan and northern Indiana. Regional cropping systems are dominated by corn (Zea mays)–soybean (Glycine max) rotations managed with conventional tillage, industry‐average rates of fertilizer and pesticide inputs uniformly applied, few cover crops, and little animal integration. In 2020, KBS LTAR initiated the Aspirational Cropping System Experiment as part of the LTAR Common Experiment, a co‐production model wherein stakeholders and researchers collaborate to advance transformative change in agriculture. The Aspirational (ASP) cropping system treatment, designed by a team of agronomists, farmers, scientists, and other stakeholders, is a five‐crop rotation of corn, soybean, winter wheat (Triticum aestivum), winter canola (Brassica napus), and a diverse forage mix. All phases are managed with continuous no‐till, variable rate fertilizer inputs, and integrated pest management to provide benefits related to economic returns, water quality, greenhouse gas mitigation, soil health, biodiversity, and social well‐being. Cover crops follow corn and winter wheat, with fall‐planted crops in the rotation providing winter cover in other years. The experiment is replicated with all rotation phases at both the plot and field scales and with perennial prairie strips in consistently low‐producing areas of ASP fields. The prevailing practice (or Business as usual [BAU]) treatment mirrors regional prevailing practices as revealed by farmer surveys. Stakeholders and researchers evaluate the success of the ASP and BAU systems annually and implement management changes on a 5‐year cycle.
-
Abstract Understanding subfield crop yields and temporal stability is critical to better manage crops. Several algorithms have been proposed to study within-field temporal variability, but they were mostly limited to a few fields. In this study, a large dataset composed of 5520 yield maps from 768 fields provided by farmers was used to investigate the influence of subfield yield distribution skewness on temporal variability. The data are used to test two intuitive algorithms for mapping stability: one based on standard deviation and the second based on pixel ranking and percentiles. The analysis of yield monitor data indicates that yield distribution is asymmetric, and it tends to be negatively skewed (p < 0.05) for all of the four crops analyzed, meaning that low yielding areas are lower in frequency but cover a larger range of low values. The mean yield difference between the pixels classified as high-and-stable and the pixels classified as low-and-stable was 1.04 Mg ha−1 for maize, 0.39 Mg ha−1 for cotton, 0.34 Mg ha−1 for soybean, and 0.59 Mg ha−1 for wheat. The yield of the unstable zones was similar to the pixels classified as low-and-stable by the standard deviation algorithm, whereas the two-way outlier algorithm did not exhibit this bias. Furthermore, the increase in the number of years of yield maps available induced a modest but significant increase in the certainty of stability classifications, and the proportion of unstable pixels increased with the precipitation heterogeneity between the years comprising the yield maps.
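
To make the idea of a standard-deviation-based stability map more concrete, here is one possible sketch. It is not the authors' algorithm: the abstract gives neither the normalization nor the thresholds, so the per-year normalization by field mean, the `sd_threshold` value of 0.15, and the function name `classify_stability` are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def classify_stability(yields_by_year, sd_threshold=0.15):
    """Label each pixel as high-and-stable, low-and-stable, or unstable.

    yields_by_year: list of years, each a list of per-pixel yields in the
    same pixel order. Yields are divided by that year's field mean so
    that years with different crops or weather are comparable (an
    assumed normalization). A pixel whose normalized yield varies across
    years by more than sd_threshold (population SD) is 'unstable';
    otherwise it is high or low relative to the field mean (1.0).
    """
    normalized = []
    for year in yields_by_year:
        field_mean = mean(year)
        normalized.append([y / field_mean for y in year])

    labels = []
    for i in range(len(yields_by_year[0])):
        series = [year[i] for year in normalized]
        if pstdev(series) > sd_threshold:
            labels.append("unstable")
        elif mean(series) >= 1.0:
            labels.append("high-and-stable")
        else:
            labels.append("low-and-stable")
    return labels
```

Because the unstable class is assigned first, pixels that swing between high and low years never reach the high/low comparison, which is one simple way to keep the three classes mutually exclusive.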
