

Search for: All records

Creators/Authors contains: "Sharma, Prateek"


  1. Nitrous oxide (N₂O) is a potent and persistent greenhouse gas, with rising atmospheric concentrations driven in part by inefficient use of synthetic nitrogen (N) fertilizers in agriculture. Predicting soil N₂O emissions is challenging due to high spatial and temporal variability arising from complex soil biogeochemical processes. Process-based ecosystem models and standalone machine learning (ML) approaches without extensive site-specific calibration often miss high-emission episodes. Here, we show how an Ensemble Modeling System (EMS) based on outputs from an ensemble of ecosystem models coupled to an ensemble of ML models can improve predictions and understanding of N₂O fluxes from US cropland. Trained and validated on approximately 12,000 N₂O chamber measurements at 17 US Midwest sites (six crops, 35 management practices), the EMS accurately predicted daily fluxes of N₂O at both training (R² = 0.84, RMSE = 16.4 g N ha⁻¹ d⁻¹) and held-out testing sites (R² = 0.84, RMSE = 6.2 g N ha⁻¹ d⁻¹). Analyses identified six dominant N₂O drivers: soil organic carbon (SOC), NH₄⁺, NO₃⁻, water-filled pore space (WFPS), soil temperature, and biomass production. Wet, warm soils produced large N₂O peaks only with sufficient SOC and mineral N; in low-SOC soils, fluxes remained low. Incorporating these drivers into process-based models might significantly improve their predictive capacity. The EMS demonstrates a strong potential to predict N₂O fluxes at unseen sites, enabling more reliable regional inventories, improved gap-filling where measurements are sparse, and enhanced understanding of mechanisms to advance targeted mitigation strategies in food, feed, and bioenergy crops. 
# Data from: Coupled machine learning-ecosystem ensemble models substantially improve predictions of nitrous oxide (N~2~O) fluxes from US croplands

Dataset DOI: [10.5061/dryad.pvmcvdnzx](10.5061/dryad.pvmcvdnzx)

## Description of the data and file structure

We present here the data that were used for the analysis presented in: Coupled machine learning-ecosystem ensemble models substantially improve predictions of nitrous oxide (N~2~O) fluxes from US croplands.

### Files and variables

Files: Dataset_S1.xlsx, Dataset_S2.csv, Dataset_S3.csv

#### **Description:**

**Description of data sheets**

**Dataset S1A columns**

* **Site_ID:** Numeric identifier for the experimental site.
* **Treatment_ID:** Numeric code for the management treatment applied at that site
* **DataUse:** To assign data to be used for model training (development) and testing (held-out evaluation)
* **State/Province:** State acronym
* **Latitude decimal deg:** Site location latitude
* **Longitude decimal deg:** Site location longitude
* **Start Data Year:** Starting year of data used
* **End Data Year:** Ending year of data used
* **Cover crop:** Type of cover crop used within the treatment
* **Rotation Descriptor:** Describe the rotation of crops within the treatment
* **Tillage Descriptor:** Describe tillage type within the treatment
* **Residual Removal:** Describe residual management within the treatment
* **Irrigation:** Describe if irrigation was applied or not within the treatment
* **N Treatment Descriptor:** Describe nitrogen amendments within the treatment
* **Reference:** Reference for the data

**Dataset S1B:** This sheet contains the reference list for the data used

**Dataset S2 columns**

* **Date:** Gas sampling days
* **Site_ID:** Numeric identifier for the experimental site.
* **Treatment_ID:** Numeric code for the management treatment applied at that site
* **DataUse:** To assign data to be used for model training (development) and testing (held-out evaluation)
* **Observed N2O:** Daily average N2O flux measured (g N2O-N ha-1 d-1)
* **Predicted N2O:** Daily average N2O flux predicted by multimodel hybrid framework (g N2O-N ha-1 d-1)
* **NH4:** Process-based models simulated daily NH4-N content in the top 30-cm soil layer (kg ha-1)
* **SOC:** Process-based models simulated daily soil organic carbon in the top 30-cm soil layer (kg ha-1)
* **NO3:** Process-based models simulated daily NO3-N content in the top 30-cm soil layer (kg ha-1)
* **ST:** Process-based models simulated daily average soil temperature in the top 30-cm soil layer (°C)
* **WFPS:** Process-based models simulated daily water-filled pore space in the top 30-cm soil layer (fraction)
* **ABG:** Process-based models simulated daily above-ground biomass (kg ha-1)
* **BG:** Process-based models simulated daily below-ground biomass (kg ha-1)
* **SRAD:** Average solar radiation for the last five days before gas sampling (Watt m-2)
* **Tmax:** Average maximum air temperature for the last three days before gas sampling (°C)
* **APrecip:** Average precipitation in the last fifteen days before gas sampling (mm)
* **Wspd:** Average wind in the last fifteen days before gas sampling (m s-1)
* **LAI:** Process-based models simulated daily leaf area index (m2 m-2)
* **Nstress:** Process-based models simulated the daily nitrogen stress factor (fraction)
* **Wstress:** Process-based models simulated the daily water stress factor (fraction)
* **PET:** Process-based models simulated daily potential evapotranspiration (mm)
* **SE:** Process-based models simulated daily soil evaporation (mm)
* **SPrecip:** Cumulative precipitation in the last two days before gas sampling (mm)
* **SH:** Average specific humidity in the last three days before gas sampling (g kg-1)
* **RH:** Average relative humidity in the last fifteen days before gas sampling (%)

**Dataset S3 columns**

* **Date:** Gas sampling days
* **Site_ID:** Numeric identifier for the experimental site.
* **Treatment_ID:** Numeric code for the management treatment applied at that site
* **DataUse:** To assign data to be used for model training (development) and testing (held-out evaluation)
* **SD:** Monte Carlo standard deviation of the simulated daily N₂O flux distribution (g N2O-N ha-1 d-1)
* **CV:** Monte Carlo coefficient of variation of the simulated daily N₂O flux distribution (%)
* **CI05:** 5th percentile (lower 90 % confidence bound) of the Monte Carlo flux distribution (g N2O-N ha-1 d-1)
* **CI95:** 95th percentile (upper 90 % confidence bound) of the Monte Carlo flux distribution (g N2O-N ha-1 d-1)
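The Dataset S2 columns described above are enough to recompute goodness-of-fit statistics separately for the training and testing subsets. The following is a minimal sketch, not the authors' evaluation code. It assumes that R² means the coefficient of determination (1 − SS_res/SS_tot), that the CSV header uses the column names exactly as listed, and that `DataUse` holds one label per subset. The labels `Training` and `Testing`, the date format, and all numbers in the sample are hypothetical.

```python
import csv
import io
import math
from collections import defaultdict


def fit_metrics(observed, predicted):
    """Return (R2, RMSE) for paired observed and predicted fluxes.

    R2 is computed as 1 - SS_res / SS_tot (an assumption; the source
    does not say which R2 definition was used).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return r2, rmse


def metrics_by_data_use(csv_file, obs_col="Observed N2O",
                        pred_col="Predicted N2O", group_col="DataUse"):
    """Group rows by the DataUse label and compute (R2, RMSE) per group."""
    observed = defaultdict(list)
    predicted = defaultdict(list)
    for row in csv.DictReader(csv_file):
        try:
            obs = float(row[obs_col])
            pred = float(row[pred_col])
        except ValueError:
            continue  # skip rows with empty or non-numeric flux values
        if math.isnan(obs) or math.isnan(pred):
            continue  # skip rows where a flux is recorded as NaN
        observed[row[group_col]].append(obs)
        predicted[row[group_col]].append(pred)
    return {label: fit_metrics(observed[label], predicted[label])
            for label in observed}


# Hypothetical sample rows; values and labels are illustrative only.
SAMPLE = """Date,Site_ID,Treatment_ID,DataUse,Observed N2O,Predicted N2O
2020-05-01,1,1,Training,1.0,1.5
2020-05-02,1,1,Training,3.0,2.5
2020-05-03,1,1,Training,5.0,5.0
2020-05-01,2,1,Testing,2.0,2.0
2020-05-02,2,1,Testing,4.0,3.0
"""

results = metrics_by_data_use(io.StringIO(SAMPLE))
for label, (r2, rmse) in results.items():
    print(f"{label}: R2 = {r2:.3f}, RMSE = {rmse:.3f} g N2O-N ha-1 d-1")
```

To run this against the real file, pass `open("Dataset_S2.csv", newline="")` in place of the in-memory sample, after checking the header names and `DataUse` labels in the downloaded data.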
  2. Nitrous oxide (N₂O) is a potent and persistent greenhouse gas, with rising atmospheric concentrations driven in part by inefficient use of synthetic nitrogen (N) fertilizers in agriculture. Predicting soil N₂O emissions is challenging due to high spatial and temporal variability arising from complex soil biogeochemical processes. Process-based ecosystem models and standalone machine learning (ML) approaches without extensive site-specific calibration often miss high-emission episodes. Here, we show how an Ensemble Modeling System (EMS) based on outputs from an ensemble of ecosystem models coupled to an ensemble of ML models can improve predictions and understanding of N₂O fluxes from US cropland. Trained and validated on ~12,000 N₂O chamber measurements at 17 US Midwest sites (six crops, 35 management practices), the EMS accurately predicted daily fluxes of N₂O at both training (R² = 0.84, RMSE = 16.4 g N ha⁻¹ d⁻¹) and held-out testing sites (R² = 0.84, RMSE = 6.2 g N ha⁻¹ d⁻¹). Analyses identified six dominant N₂O drivers: soil organic carbon (SOC), NH₄⁺, NO₃⁻, water-filled pore space, temperature, and aboveground biomass production. Wet, warm soils produced large N₂O peaks only with sufficient SOC and mineral N; in low-SOC soils, fluxes remained low. Incorporating these drivers into process-based models might significantly improve their predictive capacity. The EMS demonstrates a strong potential to predict N₂O fluxes at unseen sites, enabling more reliable regional inventories, improved gap-filling where measurements are sparse, and enhanced understanding of mechanisms to advance targeted mitigation strategies in food, feed, and bioenergy crops. 
  3. The circumgalactic medium (CGM) is poorly constrained at the subparsec scales relevant to turbulent energy dissipation and regulation of multiphase structure. Fast radio bursts are sensitive to small-scale plasma density fluctuations, which can induce multipath propagation (scattering). The amount of scattering depends on the density fluctuation spectrum, including its amplitude $C_n^2$, spectral index $\beta$, and dissipation scale $l_i$. We use quasar observations of CGM turbulence at $\gtrsim$pc scales to infer $C_n^2$, finding it to be $10^{-16} \lesssim C_n^2 \lesssim 10^{-9}$ m$^{-20/3}$ for hot ($T > 10^6$ K) gas and $10^{-8} \lesssim C_n^2 \lesssim 10^{-4}$ m$^{-20/3}$ for cool ($10^4 \lesssim T \lesssim 10^5$ K) gas, depending on the gas sound speed and density. These values of $C_n^2$ are much smaller than those inferred in the interstellar medium at similar physical scales. The resulting scattering delays from the hot CGM are negligible ($\ll 1\ \mu$s at 1 GHz), but they are more detectable from the cool gas as either radio pulse broadening or scintillation, depending on the observing frequency and sightline geometry. Joint quasar-FRB observations of individual galaxies can yield lower limits on $l_i$, even if the CGM is not a significant scattering site. An initial comparison between quasar and FRB observations (albeit for different systems) suggests $l_i \gtrsim 750$ km in $\sim 10^4$ K gas in order for the quasar and FRB constraints to be consistent. If a foreground CGM is completely ruled out as a source of scattering along an FRB sightline, then $l_i$ may be comparable to the smallest cloud sizes ($\lesssim$pc) inferred from photoionization modeling of quasar absorption lines. 
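As a point of reference for the three quantities named in this abstract, a widely used power-law form of the electron density fluctuation spectrum is sketched below. The abstract does not state the functional form, so the Gaussian inner-scale cutoff, the outer scale $l_o$, and the Kolmogorov value $\beta = 11/3$ are assumptions drawn from common usage in the scattering literature rather than from this work:

```latex
P_{\delta n_e}(q) = C_n^2 \, q^{-\beta}
  \exp\!\left[-\left(\frac{q\, l_i}{2\pi}\right)^{2}\right],
  \qquad q \ge \frac{2\pi}{l_o}
```

With the wavenumber $q$ in m$^{-1}$ and $P_{\delta n_e}$ in m$^{-3}$, $C_n^2$ has units of m$^{-(\beta + 3)}$, which for $\beta = 11/3$ gives the m$^{-20/3}$ quoted in the abstract.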
  4. We develop a comprehensive framework for storing, analyzing, forecasting, and visualizing industrial energy systems consisting of multiple devices and sensors. Our framework models complex energy systems as a dynamic knowledge graph, utilizes a novel machine learning (ML) model for energy forecasting, and visualizes continuous predictions through an interactive dashboard. At the core of this framework is A-RNN, a simple yet efficient model that uses dynamic attention mechanisms for automated feature selection. We validate the model using datasets from two manufacturers and one university testbed containing hundreds of sensors. Our results show that A-RNN forecasts energy usage within 5% of observed values. These enhanced predictions are as much as 50% more accurate than those produced by standard RNN models that rely on individual features and devices. Additionally, A-RNN identifies key features that impact forecasting accuracy, providing interpretability for model forecasts. Our analytics platform is computationally and memory efficient, making it suitable for deployment on edge devices and in manufacturing plants. 