Scientific breakthroughs in biomolecular methods and improvements in hardware technology have shifted the field from a single long-running simulation to a large set of shorter simulations running simultaneously, called an ensemble. In an ensemble, each independent simulation is usually coupled with several analyses that apply identical or distinct algorithms to data produced by that simulation. Today, in situ methods are used to analyze the large volumes of data generated by scientific simulations at runtime. This work studies the execution of ensemble-based simulations paired with in situ analyses using in-memory staging methods. Because the simulations and analyses forming an ensemble typically run concurrently, deploying an ensemble requires co-location-aware strategies that keep the data flow between the simulations and analyses of each in situ workflow efficient. Using an ensemble of molecular dynamics in situ workflows with multiple simulations and analyses, we first show that collecting traditional metrics such as makespan, instructions per cycle, memory usage, or cache miss ratio is not sufficient to characterize the complex behaviors of ensembles. We therefore propose a method for evaluating the performance of ensembles of workflows that captures resource usage (efficiency), resource allocation, and component placement. Experimental results demonstrate that the proposed method effectively captures the performance of different component placements in an ensemble. By evaluating different co-location scenarios, our performance indicator shows improvements of up to four orders of magnitude when a simulation and its coupled analyses are co-located on a single computational host.
Datastorm-FE: a data- and decision-flow and coordination engine for coupled simulation ensembles
Data- and model-driven computer simulations are increasingly critical in many application domains, yet several critical data challenges remain in obtaining and leveraging simulations for decision making. Simulations may track hundreds of parameters, spanning multiple layers and spatial-temporal frames, affected by complex, inter-dependent dynamic processes. Moreover, because of the large number of unknowns, decision makers usually need to generate ensembles of stochastic realizations, requiring tens to thousands of individual simulation instances. The situation on the ground evolves unpredictably, requiring continuously adaptive simulation ensembles. We introduce the DataStorm framework for simulation ensemble management and demonstrate its DataStorm-FE data- and decision-flow and coordination engine for creating and maintaining coupled, multi-model simulation ensembles. DataStorm-FE enables end-to-end ensemble planning and optimization, including parameter-space sampling, output aggregation and alignment, and state and provenance data management, to improve the overall simulation process. It is also designed to work efficiently, producing results within a limited simulation budget, and incorporates a multivariate, spatiotemporal data browser to support decision making based on these improved results.
- Award ID(s): 1633381
- PAR ID: 10482654
- Publisher / Repository: VLDB Endowment
- Date Published:
- Journal Name: Proceedings of the VLDB Endowment
- Volume: 11
- Issue: 12
- ISSN: 2150-8097
- Page Range / eLocation ID: 1906 to 1909
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract The formation of biomolecular materials via dynamical interfacial processes, such as self-assembly and fusion, for diverse compositions and external conditions can be efficiently probed using ensemble molecular dynamics (MD). However, this approach requires many simulations when investigating a large composition phase space. In addition, it is difficult to predict whether each simulation will yield biomolecular materials with the desired properties or outcomes, and how long each simulation will run. These difficulties can be overcome by rules-based management systems, including intermittent inspection, variable sampling, and premature termination or extension of the individual MD simulations. Automating such a management system can significantly improve runtime efficiency and reduce the burden of organizing large ensembles of MD simulations. To this end, a computational framework, the Pipelines for Automating Compliance-based Elimination and Extension (PACE2), is proposed for high-throughput ensemble biomolecular materials simulations. The PACE2 framework encompasses Candidate pipelines, where each pipeline includes temporally separated simulation and analysis tasks. When an MD simulation is completed, an analysis task is triggered, which evaluates the MD trajectory for compliance. Compliant simulations are extended to the next MD phase with a suitable sample rate to allow additional, detailed analysis. Non-compliant simulations are eliminated, and their computational resources are reallocated or released. The framework is designed to run on local desktop computers and high-performance computing resources. Preliminary scientific results enabled by the PACE2 framework are presented, which demonstrate its potential and validate its function. In the future, the framework will be extended to address generalized workflows and to investigate composition-structure-property relations for other classes of materials.
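The compliance-based elimination/extension loop described in this abstract can be sketched in miniature. This is a toy illustration, not the PACE2 implementation: the random-walk "MD phase", the order-parameter threshold in `analyze`, and all function names are hypothetical stand-ins.

```python
import random

def analyze(trajectory):
    """Hypothetical compliance check: does the final-frame order
    parameter exceed a threshold? (Stand-in for an analysis task.)"""
    return trajectory[-1] > 0.5

def run_md_phase(trajectory, steps):
    """Stand-in for an MD phase: extends the trajectory with a
    bounded random walk instead of real dynamics."""
    traj = list(trajectory)
    for _ in range(steps):
        traj.append(min(1.0, max(0.0, traj[-1] + random.uniform(-0.1, 0.1))))
    return traj

def manage_ensemble(initial_values, phases=3, steps=50):
    """Run each candidate through successive phases; after each phase,
    compliant runs are extended and non-compliant runs are eliminated."""
    active = {i: [v] for i, v in enumerate(initial_values)}
    for _ in range(phases):
        survivors = {}
        for i, traj in active.items():
            traj = run_md_phase(traj, steps)   # simulation task completes...
            if analyze(traj):                  # ...which triggers the analysis task
                survivors[i] = traj            # compliant: extend to the next phase
            # non-compliant: dropped; resources reallocated or released
        active = survivors
    return active
```

In a real deployment each `run_md_phase` call would be a scheduled HPC job and `analyze` an independent task triggered on job completion; the control flow, however, is the same extend-or-eliminate decision per pipeline.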
-
Abstract Forecast informed reservoir operations (FIRO) is an important advance in water management, but the design and testing of FIRO policies is limited by relatively short (10–35 year) hydro‐meteorological hindcasts. We present a novel, multisite model for synthetic forecast ensembles to overcome this limitation. This model utilizes parametric and non‐parametric procedures to capture complex forecast errors and maintain correlation between variables, lead times, locations, and ensemble members. After being fit to data from the hindcast period, this model can generate synthetic forecast ensembles in any period with observations. We demonstrate the approach in a case study of the FIRO‐based Ensemble Forecast Operations (EFO) control policy for the Lake Mendocino—Russian River basin, which conditions release decisions on ensemble forecasts from the Hydrologic Ensemble Forecast System (HEFS). We explore two generation strategies: (a) simulation of synthetic forecasts of meteorology to force HEFS; and (b) simulation of synthetic HEFS streamflow forecasts directly. We evaluate the synthetic forecasts using ensemble verification techniques and event‐based validation, finding good agreement with the actual ensemble forecasts. We then evaluate EFO policy performance using synthetic and actual forecasts over the hindcast period (1985–2010) and synthetic forecasts only over the pre‐hindcast period (1948–1984). Results show that the synthetic forecasts highlight important failure modes of the EFO policy under plausible forecast ensembles, but improvements are still needed to fully capture FIRO policy behavior under the actual forecast ensembles. Overall, the methodology advances a novel way to test FIRO policy robustness, which is key to building institutional support for FIRO.
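The two-step recipe in this abstract (fit an error model on the hindcast, then sample correlated errors around observations) can be illustrated with a deliberately simplified Gaussian version. The paper's model combines parametric and non-parametric procedures; the sketch below assumes plain multivariate-normal ensemble-mean errors across lead times only, and all names and array shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_error_model(forecasts, observations):
    """Fit a toy Gaussian error model from a hindcast.
    forecasts: (events, members, lead_times); observations: (events, lead_times).
    Returns the mean error per lead time and the error covariance across
    lead times, which preserves lead-time correlation when sampling."""
    errors = forecasts.mean(axis=1) - observations
    mu = errors.mean(axis=0)
    cov = np.cov(errors, rowvar=False)
    return mu, cov

def generate_synthetic_ensemble(obs_trajectory, mu, cov, n_members):
    """Generate a synthetic ensemble around an observed trajectory by
    adding errors that are correlated across lead times."""
    errs = rng.multivariate_normal(mu, cov, size=n_members)
    return obs_trajectory[None, :] + errs
```

The real model must additionally maintain correlation across variables, sites, and members; extending the sketch would mean stacking those dimensions into the error vector before estimating the covariance.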
-
Fire models predict fire behavior and effects. However, there is a need to know how confident users can be in forecasts. This work developed a probabilistic methodology based on ensemble simulations that incorporated uncertainty in weather, fuel loading, and model physics parameters. It provided information on the most likely forecast scenario, confidence levels, and potential outliers. It also introduced novel ways to communicate uncertainty in calculation and graphical representation and applied this to diverse wildfires using ensemble simulations of the CAWFE coupled weather–fire model ranging from 12 to 26 members. The ensembles captured many features but spread was narrower than expected, especially with varying weather and fuel inputs, suggesting errors may not be easily mitigated by improving input data. Varying physics parameters created a wider spread, including identifying an outlier, underscoring modeling knowledge gaps. Uncertainty was communicated using burn probability, spread rate, and heat flux, a fire intensity metric related to burn severity. Despite limited ensemble spread, maps of mean and standard deviation exposed event times and locations where fire behavior was more uncertain, requiring more management or observations. Interpretability was enhanced by replacing traditional hot–cold color palettes with ones that accommodate the vision-impaired and adhere to web accessibility standards.
-
Abstract An ensemble postprocessing method is developed to improve probabilistic forecasts of extreme precipitation events across the conterminous United States (CONUS). The method combines a 3D vision transformer (ViT) for bias correction with a latent diffusion model (LDM), a generative artificial intelligence (AI) method, to postprocess 6-hourly precipitation ensemble forecasts and produce an enlarged generative ensemble that contains spatiotemporally consistent precipitation trajectories. These trajectories are expected to improve the characterization of extreme precipitation events and offer skillful multiday accumulated and 6-hourly precipitation guidance. The method is tested using the Global Ensemble Forecast System (GEFS) precipitation forecasts out to day 6 and is verified against the Climatology-Calibrated Precipitation Analysis (CCPA) data. Verification results indicate that the method generated skillful ensemble members with improved continuous ranked probabilistic skill scores (CRPSSs) and Brier skill scores (BSSs) over the raw operational GEFS and a multivariate statistical postprocessing baseline, and that it produced skillful and reliable probabilities for events at extreme precipitation thresholds. Explainability studies were further conducted, revealing the decision-making process of the method and confirming its effectiveness for ensemble member generation. This work introduces a novel, generative AI-based approach to address the limitation of small numerical ensembles and the need for larger ensembles to identify extreme precipitation events. Significance Statement: We use a new artificial intelligence (AI) technique to improve extreme precipitation forecasts from a numerical weather prediction ensemble, generating more scenarios that better characterize extreme precipitation events. This AI-generated ensemble improved the accuracy of precipitation forecasts and probabilistic warnings for extreme precipitation events. The study explores AI methods to generate precipitation forecasts and explains the decision-making mechanisms of such AI techniques to prove their effectiveness.
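The CRPS underlying the CRPSS cited in this abstract has a standard closed form for a finite ensemble, which is easy to state in code. The estimator below is the textbook version (mean absolute member error minus half the mean absolute member-pair spread); it is illustrative and not the verification pipeline used in the study.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast against a scalar observation:
    mean |x_i - y| - 0.5 * mean |x_i - x_j| over all member pairs."""
    x = np.asarray(members, dtype=float)
    term1 = np.abs(x - obs).mean()
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()
    return term1 - term2

def crpss(crps_forecast, crps_reference):
    """Skill score relative to a reference forecast
    (1 = perfect, 0 = no improvement, negative = worse)."""
    return 1.0 - crps_forecast / crps_reference
```

For gridded precipitation, the score is computed per grid cell and lead time and then averaged, so an enlarged generative ensemble improves CRPSS by better bracketing the observed value while keeping the members sharp.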