In this study, nine different statistical models are constructed using different combinations of predictors, including models with and without projected predictors. Multiple machine learning (ML) techniques are employed to optimize the ensemble predictions by selecting the top-performing ensemble members and determining the weights for each ensemble member. The ML-Optimized Ensemble (ML-OE) forecasts are evaluated against the Simple-Averaging Ensemble (SAE) forecasts. The results show that for the response variables that are predicted with significant skill by individual ensemble members and SAE, such as Atlantic tropical cyclone counts, the performance of SAE is comparable to the best ML-OE results. For response variables that are poorly modeled by individual ensemble members, such as Atlantic and Gulf of Mexico major hurricane counts, ML-OE predictions often show higher skill scores than individual model forecasts and the SAE predictions. However, neither SAE nor ML-OE was able to improve the forecasts of the response variables when all models show consistent bias. The results also show that increasing the number of ensemble members does not necessarily lead to better ensemble forecasts. The best ensemble forecasts are from the optimally combined subset of models.
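The contrast between simple averaging and a selected, weighted subset can be illustrated with a minimal sketch. The abstract does not say which ML techniques or weighting rules were used, so the top-k selection and inverse-MSE weighting below are stand-in assumptions, and the function names and toy numbers are hypothetical.

```python
from statistics import fmean

def mse(pred, obs):
    """Mean squared error between a forecast series and observations."""
    return fmean((p - o) ** 2 for p, o in zip(pred, obs))

def simple_average(members):
    """Equal-weight ensemble: mean across members at each time step."""
    return [fmean(vals) for vals in zip(*members)]

def top_k_inverse_mse_weights(members, obs, k):
    """Keep the k members with the lowest training MSE and weight each by 1/MSE.

    This is an assumed, illustrative rule, not the method of the study.
    """
    errors = [mse(m, obs) for m in members]
    keep = sorted(range(len(members)), key=errors.__getitem__)[:k]
    inv = {i: 1.0 / max(errors[i], 1e-12) for i in keep}
    total = sum(inv.values())
    return [inv.get(i, 0.0) / total for i in range(len(members))]

def weighted_ensemble(members, weights):
    """Weighted combination of member forecasts at each time step."""
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*members)]

# Made-up toy data: three member forecasts of a yearly count.
obs = [10.0, 12.0, 9.0, 14.0]
members = [
    [11.0, 13.0, 10.0, 15.0],  # constant +1 bias
    [10.5, 11.5, 9.5, 13.5],   # small alternating error
    [5.0, 6.0, 4.0, 7.0],      # poorly performing member
]

weights = top_k_inverse_mse_weights(members, obs, k=2)
sae = simple_average(members)
opt = weighted_ensemble(members, weights)
print("weights:", weights)
print("SAE MSE:", mse(sae, obs), "weighted-subset MSE:", mse(opt, obs))
```

In this toy case the weights are fitted and scored on the same data, so the comparison is in-sample only; a real evaluation would need held-out years or cross-validation.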
This content will become publicly available on May 28, 2026
The PEcAn+SIPNET Terrestrial Carbon Cycle Reanalysis: Development and Validation
Improving our ability to understand and predict the dynamics of the terrestrial carbon cycle remains a pressing challenge despite a rapidly growing volume and diversity of Earth Observation data. State data assimilation represents a path forward via an iterative cycle of making process-based forecasts and then statistically reconciling these forecasts against numerous ground-based and remotely sensed data constraints into a “reanalysis” data product that provides full spatiotemporal carbon budgets with robust uncertainty accounting. Here we report on a >100x expansion of the PEcAn+SIPNET reanalysis from 500 sites across CONUS, 25 ensemble members, and 2 data constraints to 6400 sites across North America, 100 ensemble members, and 5 data constraints: GEDI and Landtrendr AGB, MODIS LAI, SoilGrids Soil C, and SMAP soil moisture. We also report on an ensemble-based machine learning (ML) downscaling to a 1 km product that preserves spatial, temporal, and across-variable covariances and demonstrate the impacts of these covariances on uncertainty accounting (Fig. 1). Synergistically, we use the same ML models to assess what climate, vegetation, and soil variables explain the spatiotemporal variability in different C pools and fluxes. In addition, we review a wide range of ongoing validation activities, comparing the outputs of the reanalysis against withheld data from: Ameriflux and NEON NEE and LE; USFS Forest Inventory biomass, biomass increment, tree rings, soil C, and litter; and NEON soil C and soil respiration. Finally, we touch on ML analyses to diagnose and correct systematic biases and emulator-based recalibration efforts.
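The forecast-then-reconcile cycle described above can be sketched for a single state variable. The abstract does not specify the filter used in the reanalysis, so this is a generic textbook scalar ensemble update (a deterministic ensemble-adjustment form of the Kalman analysis step), offered only as an assumption-laden illustration; the function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance

def analysis_step(forecast, obs, obs_var):
    """Reconcile a forecast ensemble with one observation of the same variable.

    Generic scalar Kalman-style update, assumed for illustration:
    the ensemble mean moves toward the observation by the Kalman gain,
    and the ensemble spread shrinks to match the analysis variance.
    """
    mean_f = fmean(forecast)
    var_f = variance(forecast)          # sample variance of the forecast ensemble
    gain = var_f / (var_f + obs_var)    # weight given to the observation
    mean_a = mean_f + gain * (obs - mean_f)
    var_a = (1.0 - gain) * var_f
    scale = sqrt(var_a / var_f)
    # Shift and rescale each member so the ensemble has mean_a and var_a.
    return [mean_a + scale * (x - mean_f) for x in forecast]

# Made-up forecast ensemble of a carbon pool and one made-up observation.
forecast_pool = [8.0, 10.0, 12.0, 14.0, 16.0]
analysis_pool = analysis_step(forecast_pool, obs=10.0, obs_var=10.0)
print("forecast mean/var:", fmean(forecast_pool), variance(forecast_pool))
print("analysis mean/var:", fmean(analysis_pool), variance(analysis_pool))
```

With equal forecast and observation variances, the analysis mean lands halfway between the forecast mean and the observation, and the ensemble variance is halved. A multi-site, multi-variable reanalysis would replace these scalars with state vectors and covariance matrices.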
- Award ID(s):
- 2406258
- PAR ID:
- 10604937
- Publisher / Repository:
- eLTER Annual Meeting
- Date Published:
- Journal Name:
- ARPHA Conference Abstracts
- Volume:
- 8
- ISSN:
- 2603-3925
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract Rangelands provide significant environmental benefits through many ecosystem services, which may include soil organic carbon (SOC) sequestration. However, quantifying SOC stocks and monitoring carbon (C) fluxes in rangelands are challenging due to the considerable spatial and temporal variability tied to rangeland C dynamics as well as limited data availability. We developed the Rangeland Carbon Tracking and Management (RCTM) system to track long‐term changes in SOC and ecosystem C fluxes by leveraging remote sensing inputs and environmental variable data sets with algorithms representing terrestrial C‐cycle processes. Bayesian calibration was conducted using quality‐controlled C flux data sets obtained from 61 Ameriflux and NEON flux tower sites from Western and Midwestern US rangelands to parameterize the model according to dominant vegetation classes (perennial and/or annual grass, grass‐shrub mixture, and grass‐tree mixture). The resulting RCTM system produced higher model accuracy for estimating annual cumulative gross primary productivity (GPP) (R2 > 0.6, RMSE < 390 g C m−2) relative to net ecosystem exchange of CO2 (NEE) (R2 > 0.4, RMSE < 180 g C m−2). Model performance in estimating rangeland C fluxes varied by season and vegetation type. The RCTM captured the spatial variability of SOC stocks with R2 = 0.6 when validated against SOC measurements across 13 NEON sites. Model simulations indicated slightly enhanced SOC stocks for the flux tower sites during the past decade, which is mainly driven by an increase in precipitation. Future efforts to refine the RCTM system will benefit from long‐term network‐based monitoring of vegetation biomass, C fluxes, and SOC stocks.
Abstract Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as postprocessing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble where the members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and 2-m temperature 2 weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider using linear models, random forests, convolutional neural networks, and stacked models (a multimodel approach based on the prediction of the individual ML models). Unlike previous ML approaches that often use ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability.
Significance Statement: Accurately forecasting temperature and precipitation on subseasonal time scales (2 weeks to 2 months in advance) is extremely challenging. These forecasts would have immense value in agriculture, insurance, and economics. Our paper describes an application of machine learning techniques to improve forecasts of monthly average precipitation and 2-m temperature using lagged physics-based predictions and observational data 2 weeks in advance for the entire continental United States. For lagged ensembles, the proposed models outperform standard benchmarks such as historical averages and averages of physics-based predictions. Our findings suggest that utilizing the full set of physics-based predictions instead of the average enhances the accuracy of the final forecast.
Abstract Carbon fluxes in terrestrial ecosystems and their response to environmental change are a major source of uncertainty in the modern carbon cycle. The National Ecological Observatory Network (NEON) presents the opportunity to merge eddy covariance (EC)‐derived fluxes with CO2 isotope ratio measurements to gain insights into carbon cycle processes. Collected continuously and consistently across >40 sites, NEON EC and isotope data facilitate novel integrative analyses. However, currently provisioned atmospheric isotope data are uncalibrated, greatly limiting the ability to perform cross‐site analyses. Here, we present two approaches to calibrating NEON CO2 isotope ratios, along with an R package to calibrate NEON data. We find that calibrating CO2 isotopologues independently yields a lower δ13C bias (<0.05‰) and higher precision (<0.40‰) than directly correcting δ13C with linear regression (bias: <0.11‰, precision: 0.42‰), but with slightly higher error and lower precision in calibrated CO2 mole fraction. The magnitude of the corrections to δ13C and CO2 mole fractions varies substantially by site, underscoring the need for users to apply a consistent calibration framework to data in the NEON archive. Post‐calibration data sets show that site mean annual δ13C correlates negatively with precipitation, temperature, and aridity, but positively with elevation. Forested and agricultural ecosystems exhibit larger gradients in CO2 and δ13C than other sites, particularly during the summer and at night. The overview and analysis tools developed here will facilitate cross‐site analysis using NEON data, provide a model for other continental‐scale observational networks, and enable new advances leveraging the isotope ratios of specific carbon fluxes.
Abstract Accurate quantification of soil carbon fluxes is essential to reduce uncertainty in estimates of the terrestrial carbon sink. However, these fluxes vary over time and across ecosystem types, so it can be difficult to estimate them accurately across large scales. The flux‐gradient method estimates soil carbon fluxes using co‐located measurements of soil CO2 concentration, soil temperature, soil moisture and other soil properties. The National Ecological Observatory Network (NEON) provides such data across 20 ecoclimatic domains spanning the continental U.S., Puerto Rico, Alaska and Hawai‘i. We present an R software package (neonSoilFlux) that acquires soil environmental data to compute half‐hourly soil carbon fluxes for each soil replicate plot at a given terrestrial NEON site. To assess the computed fluxes, we visited six focal NEON sites and measured soil carbon fluxes using a closed‐dynamic chamber approach. Outputs from neonSoilFlux showed agreement with measured fluxes (R2 between measured and neonSoilFlux outputs ranging from 0.12 to 0.77 depending on calculation method used); measured outputs generally fell within the range of calculated uncertainties from the gradient method. Calculated fluxes from neonSoilFlux aggregated to the daily scale exhibited expected site‐specific seasonal patterns. While the flux‐gradient method is broadly effective, its accuracy is highly sensitive to site‐specific inputs, including the extent to which gap‐filling techniques are used to interpolate missing sensor data and to estimates of soil diffusivity and moisture content. Future refinement and validation of neonSoilFlux outputs can contribute to existing databases of soil carbon flux measurements, providing near real‐time estimates of a critical component of the terrestrial carbon cycle.
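The core idea of the flux‐gradient method can be sketched with Fick's first law: the surface efflux is approximated as the soil diffusivity times the vertical CO2 concentration gradient. This is not the neonSoilFlux implementation (which is an R package whose calculation options are not described in the abstract); the function name, the constant diffusivity, and the toy profile below are all assumptions for illustration.

```python
from statistics import fmean

def flux_gradient_efflux(depths_m, conc_umol_m3, diffusivity_m2_s):
    """Estimate upward soil CO2 efflux (umol m-2 s-1) from a concentration profile.

    Assumptions: depth is positive downward, concentration increases with
    depth, and a single effective diffusivity applies to the whole profile.
    The gradient dC/dz is taken as the least-squares slope of concentration
    against depth, and the efflux toward the surface is D * dC/dz.
    """
    z_bar = fmean(depths_m)
    c_bar = fmean(conc_umol_m3)
    sxy = sum((z - z_bar) * (c - c_bar) for z, c in zip(depths_m, conc_umol_m3))
    sxx = sum((z - z_bar) ** 2 for z in depths_m)
    gradient = sxy / sxx  # umol m-4
    return diffusivity_m2_s * gradient

# Made-up profile at three sensor depths; in practice the diffusivity would be
# derived from soil temperature, moisture, and porosity rather than fixed.
depths = [0.0, 0.1, 0.2]                  # m below the surface
conc = [20000.0, 60000.0, 100000.0]       # umol CO2 per m3 of soil air
efflux = flux_gradient_efflux(depths, conc, diffusivity_m2_s=5e-6)
print("estimated efflux (umol m-2 s-1):", efflux)
```

Because the result scales linearly with the diffusivity, any error in that estimate passes directly into the flux, which is consistent with the sensitivity to diffusivity and moisture inputs noted in the abstract.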
