
Title: In‐memory staging and data‐centric task placement for coupled scientific simulation workflows
Summary

Coupled scientific simulation workflows are composed of heterogeneous component applications that simulate different aspects of the physical phenomena being modeled and that interact and exchange significant volumes of data at runtime. As data volumes and generation rates keep growing, the traditional disk I/O–based data movement approach becomes cost prohibitive, and workflows require a more scalable and efficient approach to support data movement. Moreover, the cost of moving large volumes of data over the system interconnection network becomes dominant and significantly impacts workflow execution time. Minimizing the amount of network data movement and localizing data transfers are critical for reducing this cost. To achieve this, workflow task placement should exploit data locality to the extent possible and move computation closer to the data. In this paper, we investigate applying in-memory data staging and data-centric task placement to reduce the data movement cost in large-scale coupled simulation workflows. Specifically, we present a distributed data sharing and task execution framework that (1) co-locates in-memory data staging on application compute nodes to store data that needs to be shared or exchanged and (2) uses data-centric task placement to map computations onto processor cores so that a large portion of the data exchanges can be performed using intra-node shared memory. We also present the implementation of the framework and its experimental evaluation on the Titan Cray XK7 petascale supercomputer.
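
To illustrate the data-centric placement idea described in the summary, the following minimal Python sketch greedily maps each task to the compute node that already stages most of its input data, so the bulk of each task's exchanges can stay in intra-node shared memory. The function and variable names (`place_tasks`, `block_location`) are illustrative and are not taken from the paper's framework.

```python
"""Illustrative greedy data-centric task placement (not the paper's exact algorithm).

Each task is mapped to the node that already stages most of its input
blocks, so most of its data exchange can use intra-node shared memory
instead of the interconnect.
"""
from collections import Counter


def place_tasks(task_inputs, block_location):
    """task_inputs: {task_id: [block_id, ...]}
    block_location: {block_id: node_id}  (where staged data resides)
    Returns ({task_id: node_id}, number of remote (cross-node) input blocks)."""
    placement, remote_blocks = {}, 0
    for task, blocks in task_inputs.items():
        # Count how many of this task's input blocks each node holds.
        holders = Counter(block_location[b] for b in blocks)
        node, local = holders.most_common(1)[0]
        placement[task] = node
        remote_blocks += len(blocks) - local
    return placement, remote_blocks


if __name__ == "__main__":
    blocks = {"b0": 0, "b1": 0, "b2": 1, "b3": 1}
    tasks = {"analysis_a": ["b0", "b1", "b2"], "analysis_b": ["b2", "b3"]}
    print(place_tasks(tasks, blocks))
    # analysis_a -> node 0 (2 of 3 blocks local), analysis_b -> node 1 (all local)
```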

 
NSF-PAR ID:
10035740
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Concurrency and Computation: Practice and Experience
Volume:
29
Issue:
12
ISSN:
1532-0626
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Scientific breakthroughs in biomolecular methods and improvements in hardware technology have shifted from a single long-running simulation to a large set of shorter simulations running simultaneously, called an ensemble. In an ensemble, each independent simulation is usually coupled with several analyses that apply identical or distinct algorithms to data produced by the corresponding simulation. Today, in situ methods are used to analyze large volumes of data generated by scientific simulations at runtime. This work studies the execution of ensemble-based simulations paired with in situ analyses using in-memory staging methods. Because the simulations and analyses forming an ensemble typically run concurrently, deploying an ensemble requires co-location-aware strategies that keep the data flow between the simulations and analyses forming each in situ workflow efficient. Using an ensemble of molecular dynamics in situ workflows with multiple simulations and analyses, we first show that collecting traditional metrics such as makespan, instructions per cycle, memory usage, or cache miss ratio is not sufficient to characterize the complex behaviors of ensembles. Thus, we propose a method to evaluate the performance of ensembles of workflows that captures resource usage (efficiency), resource allocation, and component placement. Experimental results demonstrate that our proposed method can effectively capture the performance of different component placements in an ensemble. By evaluating different co-location scenarios, our performance indicator demonstrates improvements of up to four orders of magnitude when co-locating simulation and coupled analyses within a single computational host.
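
To make the resource-usage notion in the preceding abstract concrete, here is a hedged Python sketch of one possible efficiency indicator (core-hours consumed by workflow components divided by core-hours allocated). The `efficiency` helper and all numbers are hypothetical and are not the metric definition used in that work.

```python
"""Illustrative resource-efficiency indicator for an ensemble run.

Only a sketch of the general idea (useful work vs. allocation); the
indicator actually proposed in the cited work may be defined differently.
"""


def efficiency(component_core_hours, allocated_cores, makespan_hours):
    """Fraction of the allocation actually spent inside workflow components."""
    used = sum(component_core_hours.values())
    allocated = allocated_cores * makespan_hours
    return used / allocated


if __name__ == "__main__":
    # Hypothetical ensemble member: one MD simulation plus two analyses,
    # all co-located on a 32-core node for a 2-hour run.
    core_hours = {"simulation": 48.0, "analysis_1": 6.0, "analysis_2": 4.0}
    value = efficiency(core_hours, allocated_cores=32, makespan_hours=2.0)
    print(f"efficiency = {value:.2f}")
    # -> 0.91: most of the allocation does useful work when components are co-located.
```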
  2. Scientific simulation workflows executing on very large scale computing systems are essential modalities for scientific investigation. The increasing scales and resolution of these simulations provide new opportunities for accurately modeling complex natural and engineered phenomena. However, the increasing complexity necessitates managing, transporting, and processing unprecedented amounts of data, and as a result, researchers are increasingly exploring data-staging and in-situ workflows to reduce data movement and data-related overheads. However, as these workflows become more dynamic in their structures and behaviors, data staging and in-situ solutions must evolve to support new requirements. In this paper, we explore how the service-oriented concept can be applied to extreme-scale in-situ workflows. Specifically, we explore persistent data staging as a service and present the design and implementation of DataSpaces as a Service, a service-oriented data staging framework. We use a dynamically coupled fusion simulation workflow to illustrate the capabilities of this framework and evaluate its performance and scalability. 
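
The staging-as-a-service pattern described in the preceding abstract boils down to coupled components putting and getting named, versioned data regions from a shared staging area instead of going through the parallel file system. The toy Python class below illustrates only that pattern; it is not the DataSpaces API.

```python
"""Toy in-memory staging area illustrating the put/get coupling pattern
used by data-staging frameworks. This is not the DataSpaces API."""
import numpy as np


class StagingArea:
    """Stores named, versioned arrays so coupled components can exchange
    data without writing intermediate files."""

    def __init__(self):
        self._store = {}

    def put(self, name, version, array):
        # Copy so the producer can keep mutating its local buffer.
        self._store[(name, version)] = np.array(array, copy=True)

    def get(self, name, version):
        return self._store[(name, version)]


if __name__ == "__main__":
    staging = StagingArea()
    # Producer (e.g., a simulation time step) writes a field for step 0.
    staging.put("temperature", 0, np.linspace(0.0, 1.0, 8))
    # Consumer (e.g., a coupled analysis) reads the same named version.
    print(staging.get("temperature", 0))
```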
  3. Scientific breakthroughs in biomolecular methods and improvements in hardware technology have shifted from a long-running simulation to a large set of shorter simulations running simultaneously, called an ensemble. In an ensemble, simulations are usually coupled with analyses of data produced by the simulations. In situ methods can be used to analyze large volumes of data generated by scientific simulations at runtime (i.e., simulations and analyses are performed concurrently). In this work, we study the execution of ensemble-based simulations paired with in situ analyses using in-memory staging methods. Using an ensemble of molecular dynamics in situ workflows with multiple simulations and analyses, we first show that collecting traditional metrics such as makespan, instructions per cycle, memory usage, or cache miss ratio is not sufficient to characterize complex behaviors of ensembles. We propose a method to evaluate the performance of ensembles of workflows that captures multiple resource usage aspects: resource efficiency, resource allocation, and resource provisioning. Experimental results demonstrate that the proposed method can effectively distinguish the performance of different component placements in an ensemble with up to 32 ensemble members. By evaluating different co-location scenarios, our proposed performance indicators demonstrate benefits of co-locating simulation and coupled analyses within a compute node. 
  4. Scientific workflows drive most modern large-scale science breakthroughs by allowing scientists to define their computations as a set of jobs executed in a given order based on their data dependencies. Workflow management systems (WMSs) have become key to automating scientific workflows: executing computational jobs and orchestrating data transfers between those jobs running on complex high-performance computing (HPC) platforms. Traditionally, WMSs use files to communicate between jobs: a job writes out files that are read by other jobs. However, HPC machines face a growing gap between their storage and compute capabilities. To address that concern, the scientific community has adopted a new approach called in situ, which bypasses costly parallel filesystem I/O operations with faster in-memory or in-network communications. When using in situ approaches, communication and computation can be interleaved. In this work, we leverage the Decaf in situ dataflow framework to accelerate task-based scientific workflows managed by the Pegasus WMS, by replacing file communications with faster MPI messaging. We propose a new execution engine that uses Decaf to manage communications within a sub-workflow (i.e., a set of jobs) to optimize inter-job communications. We consider two workflows in this study: (i) a synthetic workflow that benchmarks and compares file- and MPI-based communication; and (ii) a realistic bioinformatics workflow that computes mutational overlaps in the human genome. Experiments show that in situ communication can improve the bioinformatics workflow execution time by 22% to 30% compared with file communication. Our results motivate further opportunities and challenges for bridging traditional WMSs with in situ frameworks.
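
The file-versus-MPI communication trade-off discussed in the preceding abstract can be illustrated with a minimal mpi4py handoff between a "producer" rank and a "consumer" rank. This sketch is purely illustrative and does not use Decaf's or Pegasus's actual interfaces; the checksum print is just a sanity check.

```python
"""Minimal mpi4py sketch of replacing file-based job communication with
direct MPI messaging (illustrative only; not Decaf's or Pegasus's API).

Run with two ranks, e.g.: mpirun -n 2 python <this_script>.py
"""
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # "Producer" job: instead of writing results to the parallel file
    # system, send them directly to the consumer rank.
    data = np.arange(1_000_000, dtype="d")
    comm.Send(data, dest=1, tag=11)
elif rank == 1:
    # "Consumer" job: receive the intermediate data in memory.
    data = np.empty(1_000_000, dtype="d")
    comm.Recv(data, source=0, tag=11)
    print("received", data.size, "values; checksum =", data.sum())
```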
  5. Computational science today depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity. However, applications have not adequately taken advantage of these advanced capabilities. In this work, we have developed a novel network-centric platform that enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories for atmospheric scientists. We demonstrate the effectiveness of our approach by evaluating time-critical, adaptive weather sensing workflows, which utilize advanced networked infrastructure to ingest live weather data from radars and compute data products used for timely response to weather events. The workflows are orchestrated by the Pegasus workflow management system and were chosen because of their diverse resource requirements. We show that our approach results in timely processing of Nowcast workflows under different infrastructure configurations and network conditions. We also show how workflow task clustering choices affect the throughput of an ensemble of Nowcast workflows with improved turnaround times. Additionally, we find that using our network-centric platform powered by advanced Layer 2 networking techniques results in faster, more reliable data throughput, makes cloud resources easier to provision, and makes the workflows easier to configure for operational use and automation.