NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Pushing the Cloud Limits in Support of IceCube Science

https://doi.org/10.1109/MIC.2020.3045209

Sfiligoi, Igor; Schultz, David; Wurthwein, Frank; Riedel, Benedikt; Deelman, Ewa (January 2021, IEEE Internet Computing)
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing: Producing a fp32 ExaFLOP hour worth of IceCube simulation data in a single workday

https://doi.org/10.1145/3311790.3396625

Sfiligoi, Igor; Schultz, David; Riedel, Benedikt; Wuerthwein, Frank; Barnet, Steve; Brik, Vladimir (July 2020, PEARC '20: Practice and Experience in Advanced Research Computing)
Scientific computing needs are growing dramatically with time and are expanding in science domains that were previously not compute intensive. When compute workflows spike well in excess of the capacity of their local compute resources, capacity should be temporarily provisioned from elsewhere, both to meet deadlines and to increase scientific output. Public Clouds have become an attractive option because they can be provisioned with minimal advance notice. However, the available capacity of cost-effective instances is not well understood. This paper presents the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode, gathered from the three major Cloud providers: Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s and integrating over one EFLOP32 hour worth of science output, for a price tag of about $60k. In this paper, we provide the reasoning behind Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
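The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes an 8-hour workday (the abstract does not state the duration) and treats 170 PFLOP32s as a sustained aggregate rate; `summarize` is a hypothetical helper for illustration, not code from the paper.

```python
def summarize(gpus, aggregate_pflop32, hours, cost_usd):
    """Derive per-GPU rate, integrated EFLOP32 hours and unit cost.

    Assumption: the aggregate rate is held constant for the whole run.
    """
    per_gpu_tflop32 = aggregate_pflop32 * 1_000 / gpus      # PFLOP32s -> TFLOP32s per GPU
    eflop32_hours = aggregate_pflop32 / 1_000 * hours       # PFLOP32s -> EFLOP32s, times hours
    return {
        "per_gpu_tflop32": per_gpu_tflop32,
        "eflop32_hours": eflop32_hours,
        "usd_per_eflop32_hour": cost_usd / eflop32_hours,
        # Run length needed to integrate exactly one EFLOP32 hour at this rate.
        "hours_for_one_eflop32_hour": 1_000 / aggregate_pflop32,
    }


if __name__ == "__main__":
    # 15k GPUs, ~170 PFLOP32s, ~$60k from the abstract; 8 hours is a hypothetical workday.
    r = summarize(gpus=15_000, aggregate_pflop32=170, hours=8, cost_usd=60_000)
    print(f"average per GPU: {r['per_gpu_tflop32']:.1f} TFLOP32s")
    print(f"integrated: {r['eflop32_hours']:.2f} EFLOP32 hours")
    print(f"cost: ${r['usd_per_eflop32_hour']:,.0f} per EFLOP32 hour")
    print(f"break-even duration: {r['hours_for_one_eflop32_hour']:.1f} hours")
```

Under these assumptions each GPU averages about 11.3 TFLOP32s, and any run longer than about 5.9 hours at 170 PFLOP32s integrates more than one EFLOP32 hour, which is consistent with the abstract's claim.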
Demonstrating 100 Gbps in and out of the public Clouds

https://doi.org/10.1145/3311790.3399612

Sfiligoi, Igor (July 2020, Practice and Experience in Advanced Research Computing (PEARC20))
Characterizing network paths in and out of the clouds

https://doi.org/10.1051/epjconf/202024507059

Sfiligoi, Igor; Graham, John; Wuerthwein, Frank (January 2020, EPJ Web of Conferences)
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
Commercial Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns, too. An important aspect of any scientific computing production campaign is data movement, both incoming and outgoing. While the performance and cost of VMs are relatively well understood, the performance and cost of networking are not. This paper provides a characterization of networking in various regions of Amazon Web Services, Microsoft Azure and Google Cloud Platform, both between Cloud resources and major data transfer nodes (DTNs) in the Pacific Research Platform, including OSG data federation caches in the network backbone, and inside the clouds themselves. The paper contains latency and peak throughput measurements as well as a qualitative analysis of the results. It also includes an analysis of the costs involved with Cloud-based networking.
Creating a content delivery network for general science on the internet backbone using XCaches

https://doi.org/10.1051/epjconf/202024504041

Fajardo, Edgar; Weitzel, Derek; Rynge, Mats; Zvada, Marian; Hicks, John; Selmeci, Mat; Lin, Brian; Paschos, Pascal; Bockelman, Brian; Hanushevsky, Andrew; et al. (January 2020, EPJ Web of Conferences)
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
A general problem faced by opportunistic users computing on the grid is that delivering cycles is simpler than delivering data to those cycles. In this project, XRootD caches are placed on the internet backbone to create a content delivery network. Scientific workflows in high energy physics, gravitational-wave astronomy, and other domains benefit from this delivery network, which increases CPU efficiency while decreasing network bandwidth use.
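The claim that a backbone cache raises CPU efficiency and lowers bandwidth use rests on read-through caching: repeated reads of the same input are served from a nearby copy instead of crossing the wide-area network. The toy model below is a minimal sketch of that idea, not XCache itself; the LRU eviction policy, whole-file granularity, and the names `ReadThroughCache` and `bytes_from_origin` are assumptions for illustration.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy LRU read-through cache standing in for a backbone cache node.

    Hypothetical model: files are fetched whole from the origin on a miss,
    and origin (wide-area) traffic is counted in bytes.
    """

    def __init__(self, origin, capacity_bytes):
        self.origin = origin                # dict path -> bytes, stand-in for the remote origin
        self.capacity = capacity_bytes
        self.store = OrderedDict()          # path -> bytes, least recently used first
        self.used = 0
        self.bytes_from_origin = 0          # traffic between cache and origin
        self.bytes_served = 0               # traffic between cache and worker nodes

    def read(self, path):
        if path in self.store:
            self.store.move_to_end(path)    # hit: mark as most recently used
            data = self.store[path]
        else:
            data = self.origin[path]        # miss: fetch across the wide-area network
            self.bytes_from_origin += len(data)
            self._insert(path, data)
        self.bytes_served += len(data)
        return data

    def _insert(self, path, data):
        if len(data) > self.capacity:
            return                          # too large to cache; serve without storing
        while self.used + len(data) > self.capacity:
            _, evicted = self.store.popitem(last=False)  # evict least recently used
            self.used -= len(evicted)
        self.store[path] = data
        self.used += len(data)


if __name__ == "__main__":
    origin = {"/example/file1": b"x" * 100, "/example/file2": b"y" * 100}
    cache = ReadThroughCache(origin, capacity_bytes=250)
    for _ in range(10):
        cache.read("/example/file1")
        cache.read("/example/file2")
    print("served:", cache.bytes_served, "from origin:", cache.bytes_from_origin)
```

With two 100-byte files read alternately ten times each and 250 bytes of cache space, only the first read of each file reaches the origin (200 of 2000 bytes). Shrink the capacity to 150 bytes and the two files evict each other, so every read goes back to the origin; cache sizing relative to the working set is what determines the saving.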
StashCache: A Distributed Caching Federation for the Open Science Grid

https://doi.org/10.1145/3332186.3332212

Weitzel, Derek; Zvada, Marian; Vukotic, Ilija; Gardner, Rob; Bockelman, Brian; Rynge, Mats; Hernandez, Edgar Fajardo; Lin, Brian; Selmeci, Mátyás (January 2019, Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning) (PEARC '19). ACM, New York, NY, USA, Article 58, 7 pages.)

Data distribution for opportunistic users is challenging, as they own neither the computing resources they are using nor any nearby storage. Users are motivated to use opportunistic computing to expand their data processing capacity, but they require storage and fast networking to distribute data to that processing. Because it requires significant management overhead, resource providers rarely allow opportunistic access to storage. Additionally, to use opportunistic storage at several distributed sites, users must take on the responsibility of maintaining their data. In this paper we present StashCache, a distributed caching federation that enables opportunistic users to utilize nearby opportunistic storage. StashCache consists of four components: data origins, redirectors, caches, and clients. StashCache has been deployed in the Open Science Grid for several years and has been used by many projects. Caches are deployed in geographically distributed locations across the U.S. and Europe. We present the architecture of StashCache, as well as utilization information for the infrastructure. We also present a performance analysis comparing distributed HTTP proxies with StashCache.
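The four StashCache components named in this abstract can be illustrated with a toy model. This is a minimal sketch, not StashCache's implementation: it assumes the redirector resolves a path to an origin by longest namespace prefix, and that the client picks the cache with the shortest great-circle distance to itself. All class names, paths, and coordinates are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Data origin: holds the authoritative copy of one namespace (hypothetical)."""
    files: dict  # path -> bytes


@dataclass
class Redirector:
    """Maps a namespace prefix to the origin that exports it (assumed behavior)."""
    origins: dict  # prefix -> Origin

    def locate(self, path):
        # Longest matching prefix wins, so nested namespaces resolve correctly.
        matches = [p for p in self.origins if path.startswith(p)]
        if not matches:
            raise FileNotFoundError(path)
        return self.origins[max(matches, key=len)]


@dataclass
class Cache:
    """Site cache: serves local copies and asks the redirector on a miss."""
    name: str
    lat: float
    lon: float
    redirector: Redirector
    store: dict = field(default_factory=dict)
    misses: int = 0

    def get(self, path):
        if path not in self.store:
            self.misses += 1
            self.store[path] = self.redirector.locate(path).files[path]
        return self.store[path]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def client_fetch(path, client_lat, client_lon, caches):
    """Client: choose the geographically nearest cache, then read through it."""
    nearest = min(caches, key=lambda c: haversine_km(client_lat, client_lon, c.lat, c.lon))
    return nearest.name, nearest.get(path)


if __name__ == "__main__":
    origin = Origin({"/example/public/data.bin": b"abc"})
    redirector = Redirector({"/example": origin})
    caches = [
        Cache("cache-chicago", 41.88, -87.63, redirector),   # illustrative coordinates
        Cache("cache-amsterdam", 52.37, 4.90, redirector),
    ]
    for _ in range(3):
        name, data = client_fetch("/example/public/data.bin", 40.11, -88.24, caches)
        print(name, data)
    print("origin reads:", caches[0].misses)
```

A client near Champaign, Illinois resolves to the Chicago-area cache; only its first read of a file reaches the origin, and later reads are served from the cache, while the distant cache is never touched.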
