Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing: Producing a fp32 ExaFLOP hour worth of IceCube simulation data in a single workday

Sfiligoi, Igor; Schultz, David; Riedel, Benedikt; Wuerthwein, Frank; Barnet, Steve; Brik, Vladimir

doi:10.1145/3311790.3396625

Scientific computing needs are growing dramatically and are expanding into science domains that were previously not compute intensive. When compute workflows spike well beyond the capacity of their local compute resources, capacity should be temporarily provisioned from elsewhere, both to meet deadlines and to increase scientific output. Public Clouds have become an attractive option because they can be provisioned with minimal advance notice. However, the capacity actually available from cost-effective instances is not well understood. This paper describes expanding IceCube's production HTCondor pool with cost-effective GPU instances in preemptible mode, gathered from the three major Cloud providers: Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s and integrating over one EFLOP32 hour worth of science output, for a price tag of about $60k. In this paper, we provide the reasoning behind the Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
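The headline figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it treats the approximate numbers from the abstract as exact, assumes an 8-hour workday (the abstract does not state the exact duration), and uses hypothetical helper names. Under those assumptions, 170 PFLOP32s corresponds to roughly 11.3 fp32 TFLOPS per GPU, about 5.9 hours of sustained running are enough to integrate one EFLOP32 hour, and the price works out to roughly $0.50 per GPU-hour.

```python
# Approximate figures quoted in the abstract.
GPUS = 15_000          # GPUs sustained over the workday
PFLOPS_FP32 = 170.0    # aggregate fp32 throughput, in PFLOP/s
COST_USD = 60_000.0    # approximate total price, in US dollars

# Assumption (not stated in the abstract): a workday lasts 8 hours.
WORKDAY_HOURS = 8.0


def per_gpu_tflops(pflops: float, gpus: int) -> float:
    """Average fp32 TFLOP/s per GPU (1 PFLOP/s = 1000 TFLOP/s)."""
    return pflops * 1000.0 / gpus


def hours_for_one_eflop_hour(pflops: float) -> float:
    """Hours of sustained throughput needed to integrate 1 EFLOP32 hour."""
    return 1000.0 / pflops  # 1 EFLOP/s = 1000 PFLOP/s


def eflop_hours(pflops: float, hours: float) -> float:
    """EFLOP32 hours integrated at a constant throughput over `hours`."""
    return pflops / 1000.0 * hours


def cost_per_gpu_hour(cost: float, gpus: int, hours: float) -> float:
    """Average price per GPU-hour, assuming every GPU ran for `hours`."""
    return cost / (gpus * hours)


if __name__ == "__main__":
    print(f"per-GPU throughput: {per_gpu_tflops(PFLOPS_FP32, GPUS):.1f} TFLOP32/s")
    print(f"time to 1 EFLOP32 hour: {hours_for_one_eflop_hour(PFLOPS_FP32):.1f} h")
    print(f"EFLOP32 hours in the workday: {eflop_hours(PFLOPS_FP32, WORKDAY_HOURS):.2f}")
    print(f"cost per GPU-hour: ${cost_per_gpu_hour(COST_USD, GPUS, WORKDAY_HOURS):.2f}")
```

Because the break-even point is under 6 hours, the "over one EFLOP32 hour" claim holds for any plausible workday length at the quoted sustained throughput; only the per-GPU-hour price depends directly on the assumed duration.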

Award ID(s):: 1841479 1826967 1941481 1841530 1148698 1730158

PAR ID:: 10211900

Date Published:: 2020-07-21

Journal Name:: PEARC '20: Practice and Experience in Advanced Research Computing

Page Range / eLocation ID:: 85 to 90

Sponsoring Org:: National Science Foundation

Publicly accessible full text (accepted manuscript, conference paper):
https://doi.org/10.1145/3311790.3396625