NSF PAR Search | NSF Public Access Repository

Open Science Data Federation - operation and monitoring

https://doi.org/10.1145/3626203.3670557

Andrijauskas, Fabio; Weitzel, Derek; Wuerthwein, Frank (July 2024, ACM)

Full Text Available

Integrating End-to-End Exascale SDN into the LHC Data Distribution Cyberinfrastructure

https://doi.org/10.1145/3491418.3535134

Guiang, Jonathan; Arora, Aashay; Davila, Diego; Graham, John; Mishin, Dima; Sfiligoi, Igor; Wuerthwein, Frank; Lehman, Tom; Yang, Xi; Guok, Chin; et al (July 2022, Proceedings of PEARC '22: Practice and Experience in Advanced Research Computing)

Full Text Available

Characterizing network paths in and out of the clouds

https://doi.org/10.1051/epjconf/202024507059

Sfiligoi, Igor; Graham, John; Wuerthwein, Frank (January 2020, EPJ Web of Conferences)
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
Commercial Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns as well. An important aspect of any scientific computing production campaign is data movement, both incoming and outgoing. While the performance and cost of VMs are relatively well understood, the performance and cost of networking are not. This paper characterizes networking in various regions of Amazon Web Services, Microsoft Azure and Google Cloud Platform, both between Cloud resources and major DTNs in the Pacific Research Platform, including OSG data federation caches in the network backbone, and inside the clouds themselves. The paper contains a qualitative analysis of the results as well as latency and peak throughput measurements. It also includes an analysis of the costs involved in Cloud-based networking.
Full Text Available

Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing: Producing a fp32 ExaFLOP hour worth of IceCube simulation data in a single workday

https://doi.org/10.1145/3311790.3396625

Sfiligoi, Igor; Schultz, David; Riedel, Benedikt; Wuerthwein, Frank; Barnet, Steve; Brik, Vladimir (July 2020, PEARC '20: Practice and Experience in Advanced Research Computing)
Scientific computing needs are growing dramatically with time and are expanding in science domains that were previously not compute intensive. When compute workflows spike well in excess of the capacity of their local compute resource, capacity should be temporarily provisioned from somewhere else, both to meet deadlines and to increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. However, the available capacity of cost-effective instances is not well understood. This paper describes the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
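As a rough sanity check on the figures quoted in this abstract, the Python sketch below derives a few implied quantities. The variable names are illustrative, and every input is one of the abstract's own approximations ("about", "around", "over"), so the results are order-of-magnitude estimates only, not values reported by the paper.

```python
# Approximate figures quoted in the abstract.
gpus = 15_000            # "about 15k GPUs"
rate_pflops = 170        # "around 170 PFLOP32s" sustained
output_eflop_hours = 1   # "over one EFLOP32 hour" (treated as a lower bound)
cost_usd = 60_000        # "about $60k"

# Implied average fp32 throughput per GPU, in TFLOPS (1 PFLOPS = 1000 TFLOPS).
tflops_per_gpu = rate_pflops * 1_000 / gpus

# Hours at the quoted rate needed to integrate one EFLOP32 hour
# (1 EFLOP hour = 1000 PFLOP hours).
hours_for_one_eflop_hour = output_eflop_hours * 1_000 / rate_pflops

# Implied upper bound on cost per PFLOP32 hour of output.
usd_per_pflop_hour = cost_usd / (output_eflop_hours * 1_000)

print(f"{tflops_per_gpu:.1f} TFLOPS per GPU")
print(f"{hours_for_one_eflop_hour:.1f} hours to reach one EFLOP32 hour")
print(f"${usd_per_pflop_hour:.0f} per PFLOP32 hour")
```

Under these assumptions, one EFLOP32 hour is reached in roughly six hours at the quoted rate, which is consistent with the abstract's claim of exceeding that total within a single workday.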
Full Text Available

The Scalable Systems Laboratory: a Platform for Software Innovation for HEP

https://doi.org/10.1051/epjconf/202024505019

Gardner, Robert; Bryant, Lincoln; Neubauer, Mark; Wuerthwein, Frank; Stephen, Judith; Chien, Andrew (January 2020, EPJ Web of Conferences)
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
The Scalable Systems Laboratory (SSL), part of the IRIS-HEP Software Institute, provides Institute participants and HEP software developers generally with a means to transition their R&D from conceptual toys to testbeds to production-scale prototypes. The SSL enables tooling, infrastructure, and services supporting innovation of novel analysis and data architectures, development of software elements and tool-chains, reproducible functional and scalability testing of service components, and foundational systems R&D for accelerated services developed by the Institute. The SSL is constructed with a core team having expertise in scale testing and deployment of services across a wide range of cyberinfrastructure. The core team embeds and partners with other areas in the Institute, and with LHC and other HEP development and operations teams as appropriate, to define investigations and required service deployment patterns. We describe the approach and experiences with early application deployments, including analysis platforms and intelligent data delivery systems.
Full Text Available

Distributed Computing Software and Data Access Patterns in OSG Midscale Collaborations

https://doi.org/10.1051/epjconf/202024503005

Paschos, Pascal; Riedel, Benedikt; Rynge, Mats; Bryant, Lincoln; Stephen, Judith; Gardner, Robert; Fajardo, Edgar; Hicks, John; Wuerthwein, Frank; Clark, James (January 2020, EPJ Web of Conferences)
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
In this paper we showcase the support in Open Science Grid (OSG) of Midscale collaborations, the region of computing and storage scale where multi-institutional researchers collaborate to execute their science workflows on the grid without having dedicated technical support teams of their own. Collaboration Services enables such collaborations to take advantage of the distributed resources of the Open Science Grid by facilitating access to submission hosts and the deployment of their applications, and by supporting their data management requirements. Distributed computing software adopted from large-scale collaborations, such as CVMFS, Rucio, and xCache, lowers the barrier for intermediate-scale research to integrate with existing infrastructure.
Full Text Available

A Roadmap for HEP Software and Computing R&D for the 2020s

https://doi.org/10.1007/s41781-018-0018-8

Albrecht, Johannes; Alves, Antonio Augusto; Amadio, Guilherme; Andronico, Giuseppe; Anh-Ky, Nguyen; Aphecetche, Laurent; Apostolakis, John; Asai, Makoto; Atzori, Luca; Babik, Marian; et al (December 2019, Computing and Software for Big Science)

Full Text Available