

Title: Exploitation of network-segregated CPU resources in CMS
CMS is tackling the exploitation of CPU resources at HPC centers where compute nodes do not have network connectivity to the Internet. Pilot agents and payload jobs need to interact with external services from the compute nodes: access to the application software (CernVM-FS) and conditions data (Frontier), management of input and output data files (data management services), and job management (HTCondor). Finding an alternative route to these services is challenging. Seamless integration into the CMS production system without causing any operational overhead is a key goal. The case of the Barcelona Supercomputing Center (BSC), in Spain, is particularly challenging, due to its especially restrictive network setup. We describe in this paper the solutions developed within CMS to overcome these restrictions and integrate this resource into production. Singularity containers with application software releases are built and pre-placed in the HPC facility shared file system, together with conditions data files. HTCondor has been extended to relay communications between running pilot jobs and HTCondor daemons through the HPC shared file system. This operation mode also allows piping input and output data files through the HPC file system. Results, issues encountered during the integration process, and remaining concerns are discussed.
Award ID(s): 2030508, 1836650
NSF-PAR ID: 10296562
Author(s) / Creator(s):
Editor(s): Biscarat, C.; Campana, S.; Hegner, B.; Roiser, S.; Rovelli, C.I.; Stewart, G.A.
Date Published:
Journal Name: EPJ Web of Conferences
Volume: 251
ISSN: 2100-014X
Page Range / eLocation ID: 02020
Format(s): Medium: X
Sponsoring Org: National Science Foundation
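
As a rough illustration of the relay idea described in the abstract above, the sketch below shows how a process on a network-isolated compute node could exchange messages with the outside world purely through a shared file system, with a companion process on a login node that has outbound connectivity forwarding the traffic. All paths, file names, and helper functions are invented for the example; this is not the actual HTCondor or CMS implementation.

```python
# Hypothetical sketch of a shared-file-system relay for nodes with no
# outbound network access. A pilot on the compute node writes request files;
# a relay process on a connected login node forwards them to the external
# service and writes the replies back.
import json
import time
from pathlib import Path

SHARED = Path("/gpfs/shared/relay")   # assumed mailbox directory on the shared FS

def pilot_send(request: dict, job_id: str) -> dict:
    """Runs on the compute node: drop a request file, then poll for the reply."""
    (SHARED / f"{job_id}.req").write_text(json.dumps(request))
    reply_path = SHARED / f"{job_id}.rep"
    while not reply_path.exists():    # no network available: only the file system
        time.sleep(5)
    return json.loads(reply_path.read_text())

def relay_loop(forward):
    """Runs on the login node: forward each request to the outside and reply."""
    while True:
        for req_path in SHARED.glob("*.req"):
            reply = forward(json.loads(req_path.read_text()))  # call external service
            req_path.with_suffix(".rep").write_text(json.dumps(reply))
            req_path.unlink()
        time.sleep(5)
```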
More Like this
  1. Scientific workflows drive most modern large-scale science breakthroughs by allowing scientists to define their computations as a set of jobs executed in a given order based on their data dependencies. Workflow management systems (WMSs) have become key to automating scientific workflows: executing computational jobs and orchestrating data transfers between those jobs running on complex high-performance computing (HPC) platforms. Traditionally, WMSs use files to communicate between jobs: a job writes out files that are read by other jobs. However, HPC machines face a growing gap between their storage and compute capabilities. To address that concern, the scientific community has adopted a new approach called in situ, which bypasses costly parallel filesystem I/O operations with faster in-memory or in-network communications. When using in situ approaches, communication and computation can be interleaved. In this work, we leverage the Decaf in situ dataflow framework to accelerate task-based scientific workflows managed by the Pegasus WMS, by replacing file communications with faster MPI messaging. We propose a new execution engine that uses Decaf to manage communications within a sub-workflow (i.e., a set of jobs) to optimize inter-job communications. We consider two workflows in this study: (i) a synthetic workflow that benchmarks and compares file- and MPI-based communication; and (ii) a realistic bioinformatics workflow that computes mutational overlaps in the human genome. Experiments show that in situ communication can improve the bioinformatics workflow execution time by 22% to 30% compared with file communication. Our results motivate further opportunities and challenges for bridging traditional WMSs with in situ frameworks.
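
To make the file-versus-in-situ contrast concrete, here is a minimal two-rank sketch using mpi4py in place of the Decaf/Pegasus machinery the paper actually uses; the array size and file name are arbitrary.

```python
# Minimal contrast of file-based and in situ (MPI) communication between a
# producer task (rank 0) and a consumer task (rank 1).
# Run with: mpiexec -n 2 python sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
N = 1_000_000

def file_based():
    # Traditional WMS pattern: the producer writes to the (parallel) file
    # system, and the consumer reads the file once the dependency is met.
    if rank == 0:
        np.save("stage_output.npy", np.arange(N, dtype=np.float64))
    comm.Barrier()                    # crude stand-in for WMS dependency handling
    if rank == 1:
        return np.load("stage_output.npy")

def in_situ():
    # In situ pattern: the producer sends data straight to the consumer,
    # bypassing the file system entirely.
    if rank == 0:
        comm.Send(np.arange(N, dtype=np.float64), dest=1, tag=0)
    elif rank == 1:
        buf = np.empty(N, dtype=np.float64)
        comm.Recv(buf, source=0, tag=0)
        return buf

file_based()
in_situ()
```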
  2. HTCondor is a major workload management system used in distributed high-throughput computing (dHTC) environments, e.g., the Open Science Grid. One of the distinguishing features of HTCondor is its native support for data movement, allowing it to operate without a shared filesystem. Coupling data handling and compute scheduling is both convenient for users and allows for significant infrastructure flexibility, but it does introduce some limitations. The default HTCondor data transfer mechanism routes both input and output data through the submission node, making it a potential bottleneck. In this document we show that, by using a node equipped with a 100 Gbps network interface card (NIC), HTCondor can serve data at up to 90 Gbps, which is sufficient for most current use cases, as it would saturate the border network links of most research universities at the time of writing.
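
A quick back-of-the-envelope calculation puts that 90 Gbps figure in context; the per-job numbers below are assumptions for illustration, not measurements from the paper.

```python
# Rough estimate of how many concurrently running jobs a single submit node
# (with a 100 Gbps NIC) can feed before its link saturates. Per-job figures
# are assumed values.
achievable_gbps = 90          # throughput reported as achievable in practice
data_per_job_gb = 2           # assumed input + output volume per job (GB)
job_walltime_s = 3600         # assumed job duration (s)

per_job_demand_gbps = data_per_job_gb * 8 / job_walltime_s   # GB -> Gbit, spread over the run
max_concurrent_jobs = achievable_gbps / per_job_demand_gbps
print(f"per-job demand: {per_job_demand_gbps:.4f} Gbps")
print(f"jobs sustainable before saturating the link: ~{max_concurrent_jobs:,.0f}")
```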
  3. Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Eds.)
    The goal of this work was to design a low-cost computing facility that can support the development of an open source digital pathology corpus containing 1M images [1]. A single image from a clinical-grade digital pathology scanner can range in size from hundreds of megabytes to five gigabytes. A 1M image database requires over a petabyte (PB) of disk space. To do meaningful work in this problem space requires a significant allocation of computing resources. The improvements and expansions to our HPC (high-performance computing) cluster, known as Neuronix [2], required to support working with digital pathology fall into two broad categories: computation and storage. To handle the increased computational burden and increase job throughput, we are using Slurm [3] as our scheduler and resource manager. For storage, we have designed and implemented a multi-layer filesystem architecture to distribute a filesystem across multiple machines. These enhancements, which are entirely based on open source software, have extended the capabilities of our cluster and increased its cost-effectiveness. Slurm has numerous features that allow it to generalize to a number of different scenarios. Among the most notable is its support for GPU (graphics processing unit) scheduling. GPUs can offer a tremendous performance increase in machine learning applications [4], and Slurm's built-in mechanisms for handling them were a key factor in making this choice. Slurm has a general resource (GRES) mechanism that can be used to configure and enable support for resources beyond the ones provided by the traditional HPC scheduler (e.g., memory, wall-clock time), and GPUs are among the GRES types that can be supported by Slurm [5]. In addition to being able to track resources, Slurm strictly enforces resource allocation. This becomes very important as the computational demands of the jobs increase, ensuring that each job has all the resources it needs and does not take resources away from other jobs. It is a common practice among GPU-enabled frameworks to query the CUDA runtime library/drivers and iterate over the list of GPUs, attempting to establish a context on all of them. Slurm is able to affect the hardware discovery process of these jobs, which enables a number of these jobs to run alongside each other, even if the GPUs are in exclusive-process mode. To store large quantities of digital pathology slides, we developed a robust, extensible distributed storage solution. We utilized a number of open source tools to create a single filesystem, which can be mounted by any machine on the network. At the lowest layer of abstraction are the hard drives, which were split into four 60-disk chassis, using 8 TB drives. To support these disks, we have two server units, each equipped with Intel Xeon CPUs and 128 GB of RAM. At the filesystem level, we have implemented a multi-layer solution that (1) connects the disks together into a single filesystem/mountpoint using ZFS (the Zettabyte File System) [6], and (2) connects filesystems on multiple machines together to form a single mountpoint using Gluster [7]. ZFS, initially developed by Sun Microsystems, provides disk-level awareness and a filesystem that takes advantage of that awareness to provide fault tolerance. At the filesystem level, ZFS protects against data corruption and the infamous RAID write-hole bug by implementing a journaling scheme (the ZFS intent log, or ZIL) and copy-on-write functionality.
Each machine (1 controller + 2 disk chassis) has its own separate ZFS filesystem. Gluster, essentially a meta-filesystem, takes each of these and provides the means to connect them together over the network, using distributed (similar to RAID 0, but without striping individual files) and mirrored (similar to RAID 1) configurations [8]. By implementing these improvements, it has been possible to expand the storage and computational power of the Neuronix cluster arbitrarily, scaling horizontally to support the most computationally intensive endeavors. We have greatly improved the scalability of the cluster while maintaining its excellent price/performance ratio [1].
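
The hardware-discovery point above can be pictured with a small sketch: when GPUs are allocated through Slurm's GRES mechanism, a job typically sees only the devices listed in CUDA_VISIBLE_DEVICES, so a framework that tries to grab "all" GPUs is confined to its own allocation. The helper name and fallback behaviour here are assumptions, not part of the paper.

```python
# Sketch of GPU discovery inside a Slurm job: the scheduler restricts the
# devices a job can see (commonly via CUDA_VISIBLE_DEVICES), so enumerating
# "all" GPUs only yields the job's own allocation. Some sites expose device
# UUIDs rather than numeric indices, so tokens are kept as strings.
import os

def allocated_gpus():
    """Identifiers of the GPUs this job has been granted, as exposed by the scheduler."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [tok.strip() for tok in visible.split(",") if tok.strip()]

if __name__ == "__main__":
    gpus = allocated_gpus()
    if gpus:
        print(f"This job is confined to GPU(s): {gpus}")
    else:
        print("No GPU allocation visible (CUDA_VISIBLE_DEVICES empty or unset)")
```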
  4. In a new effort to make our research transparent and reproducible by others, we developed a workflow to run and share computational studies on the Microsoft Azure public cloud. It uses Docker containers to create an image of the application software stack. We also adopt several tools that facilitate creating and managing virtual machines on compute nodes and submitting jobs to these nodes. The configuration files for these tools are part of an expanded "reproducibility package" that includes workflow definitions for cloud computing, input files, and instructions. This facilitates re-creating the cloud environment to re-run the computations under identical conditions. We also show that cloud offerings are now adequate to complete computational fluid dynamics studies with in-house research software that uses parallel computing with GPUs. We share with readers what we have learned from nearly two years of using Azure cloud to enhance transparency and reproducibility in our computational simulations.
  5. Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that directly issue I/O requests to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries are a complex function of the underlying data storage model, user-configurable parameters, and object-level access patterns. As a consequence, I/O optimization is predominantly an ad hoc process performed by application developers, who are often domain scientists with limited desire to delve into the nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries, as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads and optimizations. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory copy cost), as well as the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics). The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases, and hence models that only focus on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts the fastest storage library between HDF5 and Zarr 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.
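
To picture the two storage models being contrasted, the short sketch below writes the same array with h5py (one container file holding all chunks) and with zarr (a directory store with one file per chunk); the paths and chunk sizes are arbitrary examples, meant only to show why a single logical access can map to very different low-level I/O patterns.

```python
# Same logical array, two very different on-disk layouts.
import numpy as np
import h5py
import zarr

arr = np.random.rand(4_000, 4_000)

# HDF5: every object lives inside a single container file.
with h5py.File("example.h5", "w") as f:
    f.create_dataset("data", data=arr, chunks=(1_000, 1_000))

# Zarr: a directory store, with each chunk written as its own small file.
z = zarr.open("example.zarr", mode="w", shape=arr.shape,
              chunks=(1_000, 1_000), dtype=arr.dtype)
z[:] = arr
```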