This content will become publicly available on May 22, 2026

Title: Scalable Climate Data Analysis: Balancing Petascale Fidelity and Computational Cost in Proceedings of 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW)
The growing resolution and volume of climate data from remote sensing and simulations pose significant storage, processing, and computational challenges. Traditional compression or subsampling methods often compromise data fidelity, limiting scientific insights. We introduce a scalable ecosystem that integrates hierarchical multiresolution data management, intelligent transmission, and ML-assisted reconstruction to balance accuracy and efficiency. Our approach reduces storage and computational costs by 99%, lowering expenses from $100,000 to $24 while maintaining a Root Mean Square (RMS) error of 1.46 degrees Celsius. Our experimental results confirm that even with significant data reduction, essential features required for accurate climate analysis are preserved. Validated on petascale NASA climate datasets, this solution enables cost-effective, high-fidelity climate analysis for research and decision-making.
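As a rough illustration of the two figures quoted in the abstract, the sketch below computes a relative cost reduction and an RMS error. It is not the authors' pipeline: the function names `rms_error` and `relative_reduction` are hypothetical, and the temperature values are made-up sample data chosen only to show the calculation. Only the $100,000 and $24 cost figures come from the abstract.

```python
import math


def rms_error(reference, reconstructed):
    """Root mean square error between two equal-length sequences."""
    if len(reference) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    squared = [(r - x) ** 2 for r, x in zip(reference, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


# Cost figures quoted in the abstract (US dollars).
cost_reduction = relative_reduction(100_000, 24)
print(f"cost reduction: {cost_reduction:.3%}")

# Hypothetical temperatures in degrees Celsius, for illustration only.
reference = [14.0, 15.5, 13.25, 16.0]
reconstructed = [15.0, 14.5, 14.25, 15.0]
print(f"RMS error: {rms_error(reference, reconstructed):.2f} degrees Celsius")
```

With the quoted costs, the reduction works out to slightly more than the 99% stated in the abstract.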
Award ID(s):
2138811 2127548 1941085
PAR ID:
10638534
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
IEEE Computer Society
Date Published:
Page Range / eLocation ID:
245-248
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Identifying controls on soil organic carbon (SOC) storage, and where SOC is most vulnerable to loss, are essential to managing soils for both climate change mitigation and global food security. However, we currently lack a comprehensive understanding of the global drivers of SOC storage, especially with regard to particulate (POC) and mineral‐associated organic carbon (MAOC). To better understand hierarchical controls on POC and MAOC, we applied path analyses to SOC fractions, climate (i.e., mean annual temperature [MAT] and mean annual precipitation minus potential evapotranspiration [MAP‐PET]), carbon (C) input (i.e., net primary production [NPP]), and soil property data synthesized from 72 published studies, along with data we generated from the National Ecological Observatory Network soil pits (n = 901 total observations). To assess the utility of investigating POC and MAOC separately in understanding SOC storage controls, we then compared these results with another path analysis predicting bulk SOC storage. We found that POC storage is negatively related to MAT and soil pH, while MAOC storage is positively related to NPP and MAP‐PET, but negatively related to soil % sand. Our path analysis predicting bulk SOC revealed similar trends but explained less variation in C storage than our POC and MAOC analyses. Given that temperature and pH impose constraints on microbial decomposition, this indicates that POC is primarily controlled by SOC loss processes. In contrast, strong relationships with variables related to plant productivity constraints, moisture, and mineral surface availability for sorption indicate that MAOC is primarily controlled by climate‐driven variations in C inputs to the soil, as well as C stabilization mechanisms. Altogether, these results demonstrate that global POC and MAOC storage are controlled by separate environmental variables, further justifying the need to quantify and model these C fractions separately to assess and forecast the responses of SOC storage to global change.
  2. High-fidelity blood flow modelling is crucial for enhancing our understanding of cardiovascular disease. Despite significant advances in computational and experimental characterization of blood flow, the knowledge that we can acquire from such investigations remains limited by the presence of uncertainty in parameters, low resolution, and measurement noise. Additionally, extracting useful information from these datasets is challenging. Data-driven modelling techniques have the potential to overcome these challenges and transform cardiovascular flow modelling. Here, we review several data-driven modelling techniques, highlight the common ideas and principles that emerge across numerous such techniques, and provide illustrative examples of how they could be used in the context of cardiovascular fluid mechanics. In particular, we discuss principal component analysis (PCA), robust PCA, compressed sensing, the Kalman filter for data assimilation, low-rank data recovery, and several additional methods for reduced-order modelling of cardiovascular flows, including the dynamic mode decomposition and the sparse identification of nonlinear dynamics. All techniques are presented in the context of cardiovascular flows with simple examples. These data-driven modelling techniques have the potential to transform computational and experimental cardiovascular research, and we discuss challenges and opportunities in applying these techniques in the field, looking ultimately towards data-driven patient-specific blood flow modelling.
  3. Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of a lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.
  4. Geologic carbon storage represents one of the few truly scalable technologies capable of reducing the CO₂ concentration in the atmosphere. While this technology has the potential to scale, its success hinges on our ability to mitigate its risks. An important aspect of risk mitigation concerns assurances that the injected CO₂ remains within the storage complex. Among the different monitoring modalities, seismic imaging stands out due to its ability to attain high-resolution and high-fidelity images. However, these superior features come at prohibitive costs and time-intensive efforts that potentially render extensive seismic monitoring undesirable. To overcome this shortcoming, we present a methodology in which time-lapse images are created by inverting nonreplicated time-lapse monitoring data jointly. By no longer insisting on replication of the surveys to obtain high-fidelity time-lapse images and differences, extreme costs and time-consuming labor are averted. To demonstrate our approach, hundreds of realistic synthetic noisy time-lapse seismic data sets are simulated that contain imprints of regular CO₂ plumes and irregular plumes that leak. These time-lapse data sets are subsequently inverted to produce time-lapse difference images that are used to train a deep neural classifier. The testing results show that the classifier is capable of detecting CO₂ leakage automatically on unseen data with reasonable accuracy. We consider the use of this classifier as a first step in the development of an automatic workflow designed to handle the large number of continuously monitored CO₂ injection sites needed to help combat climate change.
  5. For evaluating the climatic and landscape controls on long‐term baseflow, baseflow index (BFI, defined as the ratio of baseflow to streamflow) and baseflow coefficient (BFC, defined as the ratio of baseflow to precipitation) are formulated as functions of climate aridity index, storage capacity index (defined as the ratio of average soil water storage capacity to precipitation), and a shape parameter for the spatial variability of storage capacity. The derivation is based on the two‐stage partitioning framework and a cumulative distribution function for storage capacity. Storage capacity has a larger impact on BFI than on BFC. When storage capacity index is smaller than 1, BFI is less sensitive to storage capacity index in arid regions than in humid regions, whereas when storage capacity index is larger than 1, BFI is less sensitive to storage capacity index in humid regions. The impact of storage capacity index on BFC is only significant in humid regions. The shape parameter plays an important role in fast flow generation at the first‐stage partitioning in humid regions and in baseflow generation at the second‐stage partitioning in arid regions. The derived formulae were applied to more than 400 catchments where storage capacity index was found to follow a logarithmic function with climate aridity index. The role of climate forcings at finer timescales on baseflow was quantified, indicating that seasonality in climate forcings exerts significant control, especially on BFI.