NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Pushing the Cloud Limits in Support of IceCube Science

https://doi.org/10.1109/MIC.2020.3045209

Sfiligoi, Igor; Schultz, David; Wurthwein, Frank; Riedel, Benedikt; Deelman, Ewa (January 2021, IEEE Internet Computing)
null (Ed.)
Full Text Available
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing: Producing a fp32 ExaFLOP hour worth of IceCube simulation data in a single workday

https://doi.org/10.1145/3311790.3396625

Sfiligoi, Igor; Schultz, David; Riedel, Benedikt; Wuerthwein, Frank; Barnet, Steve; Brik, Vladimir (July 2020, PEARC '20: Practice and Experience in Advanced Research Computing)
null (Ed.)
Scientific computing needs are growing dramatically with time and are expanding in science domains that were previously not compute intensive. When compute workflows spike well in excess of the capacity of their local compute resource, capacity should be temporarily provisioned from somewhere else to both meet deadlines and to increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. The available capacity of cost-effective instances is not well understood. This paper presents expanding the IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained for a whole workday about 15k GPUs, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
more » « less
Full Text Available
Running a Pre-exascale, Geographically Distributed, Multi-cloud Scientific Simulation

https://doi.org/10.1007/978-3-030-50743-5_2

Sfiligoi, Igor; Würthwein, Frank; Riedel, Benedikt; Schultz, David (June 2020, ISC High Performance 2020)
Sadayappan, Ponnuswamy; Chamberlain, Bradford L.; Juckeland, Guido; Ltaief, Hatem (Ed.)
As we approach the Exascale era, it is important to verify that the existing frameworks and tools will still work at that scale. Moreover, public Cloud computing has been emerging as a viable solution for both prototyping and urgent computing. Using the elasticity of the Cloud, we have thus put in place a pre-exascale HTCondor setup for running a scientific simulation in the Cloud, with the chosen application being IceCube's photon propagation simulation. I.e. this was not a purely demonstration run, but it was also used to produce valuable and much needed scientific results for the IceCube collaboration. In order to reach the desired scale, we aggregated GPU resources across 8 GPU models from many geographic regions across Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. Using this setup, we reached a peak of over 51k GPUs corresponding to almost 380 PFLOP32s, for a total integrated compute of about 100k GPU hours. In this paper we provide the description of the setup, the problems that were discovered and overcome, as well as a short description of the actual science output of the exercise.
more » « less
Full Text Available
Creating a content delivery network for general science on the internet backbone using XCaches

https://doi.org/10.1051/epjconf/202024504041

Fajardo, Edgar; Weitzel, Derek; Rynge, Mats; Zvada, Marian; Hicks, John; Selmeci, Mat; Lin, Brian; Paschos, Pascal; Bockelman, Brian; Hanushevsky, Andrew; et al (January 2020, EPJ Web of Conferences)
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
A general problem faced by opportunistic users computing on the grid is that delivering cycles is simpler than delivering data to those cycles. In this project XRootD caches are placed on the internet backbone to create a content delivery network. Scientific workflows in the domains of high energy physics, gravitational waves, and others profit from this delivery network to increases CPU efficiency while decreasing network bandwidth use.
more » « less
Full Text Available
IceCube's Long Term Archive Software

https://doi.org/10.1145/3332186.3332196

Meade, P; Riedel, B; Schultz, D (July 2019, Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning) - PEARC '19)

IceCube is a cubic kilometer neutrino detector located at the South Pole. It generates 1 TiB of raw data per day, which must be archived for possible retrieval years or decades later. Other low-level data products are also archived for easy retrieval in the event of a catastrophic data center failure. The Long Term Archive software is IceCube's answer to archiving this data across several computing sites.
more » « less
Full Text Available

Search for: All records