Free, publicly-accessible full text available July 8, 2023
One of the most costly factors in providing a global computing infrastructure such as the WLCG is the human effort in deployment, integration, and operation of the distributed services supporting collaborative computing, data sharing and delivery, and analysis of extreme scale datasets. Furthermore, the time required to roll out global software updates, introduce new service components, or prototype novel systems requiring coordinated deployments across multiple facilities is often increased by communication latencies, staff availability, and in many cases expertise required for operations of bespoke services. While the WLCG (and distributed systems implemented throughout HEP) is a global service platform, it lacks the capability and flexibility of a modern platform-as-a-service including continuous integration/continuous delivery (CI/CD) methods, development-operations capabilities (DevOps, where developers assume a more direct role in the actual production infrastructure), and automation. Most importantly, tooling which reduces required training, bespoke service expertise, and the operational effort throughout the infrastructure, most notably at the resource endpoints (sites), is entirely absent in the current model. In this paper, we explore ideas and questions around potential NoOps models in this context: what is realistic given organizational policies and constraints? How should operational responsibility be organized across teams and facilities? What are the technical ...
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
The Scalable Systems Laboratory (SSL), part of the IRIS-HEP Software Institute, provides Institute participants and HEP software developers generally with a means to transition their R&D from conceptual toys to testbeds to production-scale prototypes. The SSL enables tooling, infrastructure, and services supporting innovation of novel analysis and data architectures, development of software elements and tool-chains, reproducible functional and scalability testing of service components, and foundational systems R&D for accelerated services developed by the Institute. The SSL is constructed with a core team having expertise in scale testing and deployment of services across a wide range of cyberinfrastructure. The core team embeds and partners with other areas in the Institute, and with LHC and other HEP development and operations teams as appropriate, to define investigations and required service deployment patterns. We describe the approach and experiences with early application deployments, including analysis platforms and intelligent data delivery systems.
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
In this paper we showcase the support in Open Science Grid (OSG) of Midscale collaborations, the region of computing and storage scale where multi-institutional researchers collaborate to execute their science workflows on the grid without having dedicated technical support teams of their own. Collaboration Services enables such collaborations to take advantage of the distributed resources of the Open Science Grid by facilitating access to submission hosts, deploying their applications, and supporting their data management requirements. Distributed computing software adopted from large-scale collaborations, such as CVMFS, Rucio, and xCache, lowers the barrier for intermediate-scale research to integrate with existing infrastructure.