This content will become publicly available on December 23, 2026

Title: Probabilistic Spatial Interpolation of Sparse Data using Diffusion Models
Climate models today depend critically on confident initial conditions: a reasonably plausible snapshot of the Earth from which all future predictions evolve. However, because the Earth system is inherently chaotic, this requirement is complicated by sensitive dependence on initial conditions, where small uncertainties can lead to exponentially diverging outcomes over time. This challenge is particularly salient at global spatial scales and over centennial timescales, where data gaps are not just common but expected. The uncertainty has two sources: (1) sparse, noisy observations from satellites and ground stations, and (2) variability arising from simplifying approximations within the models themselves. In practice, data assimilation methods compensate for this missing information by conditioning model states on the available observations. Our work builds on this idea but operates at the extreme end of sparsity. We propose a conditional data imputation framework that reconstructs full temperature fields from as little as 1% observational coverage. The method uses a diffusion model guided by a pre-kriged mask to infer full-state fields from minimal data points. We validate our framework over the Southern Great Plains, focusing on afternoon-through-night (12:00 PM–12:00 AM) temperature fields during the summer months of 2018–2021. Across observational densities ranging from swath data to isolated in situ sensors, our model achieves strong reconstruction accuracy, highlighting its potential to fill critical data gaps in both historical reanalysis and real-time forecasting pipelines.
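The abstract does not say how the pre-kriged conditioning field is built. As a minimal sketch, assuming ordinary kriging with a fixed exponential covariance (sill and length scale chosen by hand, not fitted to a variogram) on a regular pixel grid, the step that turns a ~1% sample of a temperature field into a dense conditioning input for a diffusion model might look like the following. `make_condition` and its three-channel layout (kriged mean, kriging variance, observation mask) are hypothetical; the diffusion model itself is not shown.

```python
import numpy as np


def exp_cov(d, sill=1.0, length=10.0):
    """Exponential covariance C(d) = sill * exp(-d / length); parameters are assumed, not fitted."""
    return sill * np.exp(-d / length)


def ordinary_krige(obs_xy, obs_val, grid_xy, sill=1.0, length=10.0, nugget=1e-6):
    """Ordinary kriging of scattered observations onto target points.

    obs_xy: (n, 2) observation coordinates, obs_val: (n,) observed values,
    grid_xy: (m, 2) target coordinates. Returns (mean, variance), each of shape (m,).
    """
    n = obs_val.shape[0]
    d_oo = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    d_og = np.linalg.norm(obs_xy[:, None, :] - grid_xy[None, :, :], axis=-1)
    # Kriging system [[C, 1], [1^T, 0]] [w; mu] = [c0; 1]; the last row forces sum(w) = 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(d_oo, sill, length) + nugget * np.eye(n)
    A[n, n] = 0.0
    b = np.ones((n + 1, grid_xy.shape[0]))
    b[:n, :] = exp_cov(d_og, sill, length)
    sol = np.linalg.solve(A, b)  # one column of weights (plus multiplier) per target point
    w, mu = sol[:n], sol[n]
    mean = w.T @ obs_val
    var = sill - np.sum(w * b[:n], axis=0) - mu  # ordinary-kriging error variance
    return mean, var


def make_condition(field, frac=0.01, seed=0, sill=4.0, length=15.0):
    """Hypothetical conditioning input: sample ~frac of the pixels, krige, stack channels."""
    rng = np.random.default_rng(seed)
    nrows, ncols = field.shape
    k = max(1, int(round(frac * nrows * ncols)))          # number of observed pixels
    idx = rng.choice(nrows * ncols, size=k, replace=False)  # sparse observation locations
    yy, xx = np.mgrid[0:nrows, 0:ncols]
    grid_xy = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)
    mean, var = ordinary_krige(grid_xy[idx], field.ravel()[idx], grid_xy, sill, length)
    mask = np.zeros(nrows * ncols, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(nrows, ncols)
    cond = np.stack([mean.reshape(nrows, ncols), var.reshape(nrows, ncols), mask.astype(float)])
    return cond, mask
```

For a 64 × 64 grid and `frac=0.01`, this draws 41 observed pixels and returns a `(3, 64, 64)` array. Passing the kriging variance as an extra channel is one way to tell a denoiser where the prefill is least trustworthy; whether the authors do anything like this is not stated in the abstract.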
Award ID(s): 2332069, 2042325
PAR ID: 10658206
Author(s) / Creator(s):
Publisher / Repository: American Meteorological Society
Date Published:
Journal Name: Artificial Intelligence for the Earth Systems
ISSN: 2769-7525
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More like this
  1. Abstract. Accurate representation of the hourly variation in the NO2-column-to-surface relationship is essential for interpreting geostationary observations of NO2 columns. Previous research indicated inconsistencies in this hourly variation. This study employs the high-performance configuration of the GEOS-Chem model (GCHP) to analyze daytime hourly NO2 total columns and surface concentrations during summer. We use measurements from globally distributed Pandora sun photometers and aircraft observations over the United States. We correct Pandora total NO2 vertical columns for (1) hourly variations in effective temperature driven by vertically resolved contributions to the total column and (2) changes in local solar time along the Pandora line of sight. These corrections increase the total NO2 columns by 5–6 × 10¹⁴ molec. cm⁻² at 09:00 and 18:00 across all sites. Fine-scale simulations from GCHP (∼12 km) reduce the normalized bias (NB) against Pandora total NO2 columns from 19 % to 10 % and against aircraft measurements from 25 % to 13 % in Maryland, Texas, and Colorado. Similar reductions are observed in NO2 columns over the eastern US (17 % to 9 %), the western US (22 % to 14 %), Europe (24 % to 15 %), and Asia (29 % to 21 %) relative to 55 km simulations. Our analysis attributes the weaker hourly variability in the total NO2 column to (1) hourly variations in column effective temperature, (2) local solar time changes along the Pandora line of sight, and (3) differences in hourly NO2 variability across atmospheric layers, with the lowest 500 m exhibiting greater variability and the dominant residual column above 500 m exhibiting weaker variability.
  2. Mireji, Paul O (Ed.)
    West Nile virus (WNV) is the leading mosquito-borne disease-causing pathogen in the United States. Concerningly, there are no prophylactics or drug treatments for WNV, and public health programs rely heavily on vector control efforts to lessen disease incidence. Insecticides can be effective in reducing vector numbers if implemented strategically, but otherwise can diminish in efficacy and promote insecticide resistance. Vector control programs that employ mass-fogging applications of insecticides often conduct them during the late-night hours, when diel temperatures are coldest, and without a priori knowledge of daily mosquito activity patterns. This study's aims were to (1) quantify the effect of temperature on the toxicity of two conventional insecticides used in fogging applications (malathion and deltamethrin) to Culex tarsalis, an important WNV vector, and (2) quantify the time of host-seeking of Cx. tarsalis and other local mosquito species in Maricopa County, Arizona. The temperature-toxicity relationship of insecticides was assessed using the WHO tube bioassay: adult Cx. tarsalis, collected as larvae, were exposed to three insecticide doses at three temperature regimes (15, 25, and 35°C; 80% RH). Time of host-seeking was assessed using collection bottle rotators with encephalitis vector survey traps baited with dry ice, first at 3 h intervals over a full day and then at 1 h intervals during the night. Malathion became less toxic at cooler temperatures at all doses, while deltamethrin was less toxic at cooler temperatures at the low dose. Regarding time of host-seeking, Cx. tarsalis, Aedes vexans, and Culex quinquefasciatus were the most abundant vectors captured. During the 3-hour interval surveillance over a full day, Cx. tarsalis were most active during post-midnight biting (00:00–6:00), accounting for 69.0% of all Cx. tarsalis, while pre-midnight biting (18:00–24:00) accounted for 30.0% of Cx. tarsalis.
    During the 1-hour interval surveillance overnight, Cx. tarsalis were most active during pre-midnight hours (18:00–24:00), accounting for 50.2% of Cx. tarsalis captures, while post-midnight biting (00:00–6:00) accounted for 49.8% of Cx. tarsalis. Our results suggest that programs employing large-scale insecticidal fogging should consider temperature-toxicity relationships together with time-of-host-seeking data to maximize the efficacy of vector control interventions in reducing mosquito-borne disease burden.
  3. This dataset comprises daily images from cameras positioned to view the streams above the weirs at Hubbard Brook for watersheds 1, 2, 3, 4, 5, 6, and 9. Cameras are programmed to take one image per day at ~12:00 pm ET. Each file is timestamped both in the image metadata and in the file name, and files are structured to enable temporal trend analysis by end users. The cameras are BUSHNELL model number 119R3; data are collected on SIM cards and manually downloaded every six months. Data gaps are minimal and generally associated with battery failures. These data are designed to capture stream dynamics over time for visual pattern analysis, environmental monitoring, and machine learning applications. They were gathered as part of the Hubbard Brook Ecosystem Study (HBES), a collaborative effort at the Hubbard Brook Experimental Forest, which is operated and maintained by the USDA Forest Service, Northern Research Station.
  4. Climate studies based on global climate models (GCMs) project a steady increase in annual average temperature and severe heat extremes in central North America by mid-century and beyond. However, the agreement between observed trends and climate model trends varies substantially across the region. The present study focuses on two locations: Des Moines, IA, and Austin, TX. In Des Moines, annual extreme temperatures have not increased over the past three decades, unlike the trend in regionally downscaled GCM data for the Midwest, likely because of a “warming hole” over the area linked to agricultural factors. This warming-hole effect is not evident for Austin over the same period, where extreme temperatures have been higher than projected by regionally downscaled climate (RDC) forecasts. Given the deviation of such RDC extreme temperature forecasts from observations, this study statistically analyzes RDC data together with observational data to define, for these two cities, a 95% prediction interval of heat extreme values by 2040. The statistical model is a linear combination of RDC ensemble-member annual extreme temperature forecasts, with regression coefficients for individual forecasts estimated by optimizing model results against observations over a 52-year training period.
  5. Abstract. Hierarchical probability models are being used more often than non-hierarchical deterministic process models in environmental prediction and forecasting, and Bayesian approaches to fitting such models are becoming increasingly popular. In particular, models describing ecosystem dynamics with multiple states that are autoregressive at each time step can be treated as statistical state space models (SSMs). In this paper, we examine this subset of ecosystem models, embed a process-based ecosystem model into an SSM, and give closed-form Gibbs sampling updates for latent states and process precision parameters when process and observation errors are normally distributed. Using simulated data from an example model (DALECev), we study the effects of changing the temporal resolution of observations on the states (observation data gaps), the temporal resolution of the state process (model time step), and the level of aggregation of observations on fluxes (measurements of transfer rates on the state process). We show that parameter estimates become unreliable as temporal gaps between observed state data increase. To improve parameter estimates, we introduce a method of tuning the time resolution of the latent states while still using higher-frequency driver information, and show that this helps improve estimates. Further, we show that data cloning is a suitable method for assessing parameter identifiability in this class of models. Overall, our study helps inform the application of state space models to ecological forecasting where (1) data are not available for all states and transfers at the operational time step of the ecosystem model and (2) process uncertainty estimation is desired.
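Item 5 above mentions closed-form Gibbs updates for latent states and the process precision when both error terms are Gaussian. The toy sketch below illustrates that idea on a scalar AR(1) state x_t = a·x_{t-1} + e_t observed as y_t = x_t + v_t, with NaN entries in `y` standing in for observation gaps. It is not DALECev and not the authors' sampler: the coefficient `a` and observation precision `tau_o` are assumed known, the Gamma(`alpha0`, `beta0`) prior (rate parameterization) on the process precision is an assumption, and the function name `gibbs_ssm` is hypothetical.

```python
import numpy as np


def gibbs_ssm(y, a, tau_o, n_iter=1000, burn=200, m0=0.0, tau0=1e-2,
              alpha0=1.0, beta0=1.0, seed=0):
    """Single-site Gibbs sampler for a toy linear-Gaussian state space model.

    x_0 ~ N(m0, 1/tau0); x_t = a*x_{t-1} + e_t, e_t ~ N(0, 1/tau_p);
    y_t = x_t + v_t, v_t ~ N(0, 1/tau_o); NaN in y marks a missing observation.
    tau_p has an assumed Gamma(alpha0, rate=beta0) prior.
    Returns the posterior-mean state path and the post-burn-in tau_p draws.
    """
    rng = np.random.default_rng(seed)
    T = len(y)
    obs = ~np.isnan(y)
    y_filled = np.where(obs, y, 0.0)          # placeholder where y is missing (weight 0 below)
    x = np.where(obs, y, np.nanmean(y))      # initialise states from the data
    tau_p = 1.0
    tau_draws, x_sum = [], np.zeros(T)
    for it in range(n_iter):
        # Closed-form Gaussian full conditional for each x_t given its neighbours and y_t.
        for t in range(T):
            prec = tau_o * obs[t]
            num = tau_o * y_filled[t] * obs[t]
            if t == 0:
                prec += tau0
                num += tau0 * m0
            else:
                prec += tau_p
                num += tau_p * a * x[t - 1]
            if t < T - 1:
                prec += tau_p * a ** 2
                num += tau_p * a * x[t + 1]
            x[t] = rng.normal(num / prec, 1.0 / np.sqrt(prec))
        # Conjugate Gamma update for the process precision (numpy uses shape, scale=1/rate).
        resid = x[1:] - a * x[:-1]
        tau_p = rng.gamma(alpha0 + 0.5 * (T - 1), 1.0 / (beta0 + 0.5 * resid @ resid))
        if it >= burn:
            tau_draws.append(tau_p)
            x_sum += x
    return x_sum / (n_iter - burn), np.array(tau_draws)
```

Sampling each x_t from its Gaussian full conditional is the simplest closed-form update; a forward-filtering backward-sampling pass would draw the whole state path jointly and usually mixes faster, at the cost of a longer implementation.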