NSF PAR Search | NSF Public Access Repository


Harnessing HPC resources for CMS jobs using a Virtual Private Network

https://doi.org/10.1051/epjconf/202125102032

Tovar, Benjamin; Bockelman, Brian; Hildreth, Michael; Lannon, Kevin; Thain, Douglas (January 2021, EPJ Web of Conferences)
Biscarat, C.; Campana, S.; Hegner, B.; Roiser, S.; Rovelli, C.I.; Stewart, G.A. (Ed.)
The processing needs for the High Luminosity (HL) upgrade for the LHC require the CMS collaboration to harness the computational power available on non-CMS resources, such as High-Performance Computing centers (HPCs). These sites often limit the external network connectivity of their computational nodes. In this paper we describe a strategy in which all network connections of CMS jobs inside a facility are routed to a single point of external network connectivity using a Virtual Private Network (VPN) server by creating virtual network interfaces in the computational nodes. We show that when the computational nodes and the host running the VPN server have the namespaces capability enabled, the setup can run entirely in user space with no other root permissions required. The VPN server host may be a privileged node inside the facility configured for outside network access, or an external service that the nodes are allowed to contact. When namespaces are not enabled on the client side, the setup falls back to using a SOCKS server instead of virtual network interfaces. We demonstrate the strategy by executing CMS Monte Carlo production requests on opportunistic non-CMS resources at the University of Notre Dame. For these jobs, CVMFS support is tested via fusermount (cvmfsexec) and via the native FUSE module.
Full Text Available
Dynamic Sizing of Continuously Divisible Jobs for Heterogeneous Resources

https://doi.org/10.1109/eScience.2019.00026

Hazekamp, Nicholas; Tovar, Benjamin; Thain, Douglas (September 2019, IEEE International Conference on e-Science)
Many scientific applications operate on large datasets that can be partitioned and operated on concurrently. The existing approaches for concurrent execution generally rely on statically partitioned data. This static partitioning can lock performance in a sub-optimal configuration, leading to higher execution time and an inability to respond to dynamic resources. We present the Continuously Divisible Job abstraction, which allows statically defined applications to have their component tasks dynamically sized in response to system behaviour. The Continuously Divisible Job abstraction defines a simple interface that dictates how work can be recursively divided, executed, and merged. Implementing this abstraction allows scientific applications to leverage dynamic job coordinators for execution. We also propose the Virtual File abstraction, which allows read-only subsets of large files to be treated as separate files. In exploring the Continuously Divisible Job abstraction, two applications were implemented using the Continuously Divisible Job interface: a bioinformatics application and a high-energy physics event analysis. These were tested using an abstract job interface and several job coordinators. Comparing these against a previous static partitioning implementation, we show comparable or better performance without having to make static decisions or implement complex dynamic application handling.
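The divide, execute, and merge interface described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class RangeSumJob, the function run, and the max_size parameter are hypothetical names chosen for illustration, and the toy coordinator simply splits a job until each piece is small enough to run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RangeSumJob:
    """Hypothetical divisible job: sum the integers in [start, stop)."""
    start: int
    stop: int

    def size(self) -> int:
        return self.stop - self.start

    def divide(self) -> List["RangeSumJob"]:
        # Split the work into two halves that can be run independently.
        mid = self.start + self.size() // 2
        return [RangeSumJob(self.start, mid), RangeSumJob(mid, self.stop)]

    def execute(self) -> int:
        # Run this piece of work directly.
        return sum(range(self.start, self.stop))

    @staticmethod
    def merge(results: List[int]) -> int:
        # Combine the partial results of the sub-jobs.
        return sum(results)


def run(job: RangeSumJob, max_size: int) -> int:
    """Toy coordinator: divide recursively until a piece fits max_size."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if job.size() <= max_size:
        return job.execute()
    return job.merge([run(part, max_size) for part in job.divide()])
```

In this sketch the task size is a fixed argument; a dynamic coordinator of the kind the abstract describes would presumably choose it at run time from observed resource behaviour.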
Full Text Available
A Lightweight Model for Right-Sizing Master-Worker Applications

https://doi.org/10.1109/SC.2018.00042

Kremer-Herman, Nathaniel; Tovar, Benjamin; Thain, Douglas (November 2018, International Conference for High Performance Computing, Networking, Storage and Analysis)

Full Text Available
Automatic Dependency Management for Scientific Applications on Clusters

https://doi.org/10.1109/IC2E.2018.00026

Tovar, Benjamin; Hazekamp, Nicholas; Kremer-Herman, Nathaniel; Thain, Douglas (April 2018, IEEE International Conference on Cloud Engineering (IC2E))

Full Text Available
Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows

https://doi.org/10.1109/TPDS.2017.2764897

Hazekamp, Nicholas; Kremer-Herman, Nathaniel; Tovar, Benjamin; Meng, Haiyan; Choudhury, Olivia; Emrich, Scott; Thain, Douglas (October 2017, IEEE Transactions on Parallel and Distributed Systems)

Workflow management systems are widely used to express and execute highly parallel applications. For data-intensive workflows, storage can be the constraining resource: the number of tasks running at once must be artificially limited to not overflow the space available in the filesystem. It is all too easy for a user to dispatch a workflow which consumes all available storage and disrupts all system users. To address these issues, we present a three-tiered approach to workflow storage management: (1) A static analysis algorithm which analyzes the storage needs of a workflow before execution, giving a realistic prediction of success or failure. (2) An online storage management algorithm which accounts for the storage needed by future tasks to avoid deadlock at runtime. (3) A task containment system which limits storage consumption of individual tasks, enabling the strong guarantees of the static analysis and dynamic management algorithms. We demonstrate the application of these techniques on three complex workflows.
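The general idea of limiting concurrency by storage can be shown with a deliberately simplified Python sketch. It is not the paper's algorithm: it assumes each task declares a fixed storage footprint, uses a plain greedy admission rule, and does not model the storage reserved for future tasks. The function names statically_feasible and admit are hypothetical.

```python
from typing import Dict, List


def statically_feasible(footprints: Dict[str, int], capacity: int) -> bool:
    """Pre-execution check: fail early if any single task cannot fit."""
    return all(need <= capacity for need in footprints.values())


def admit(ready: List[str], footprints: Dict[str, int],
          capacity: int, in_use: int = 0) -> List[str]:
    """Greedy online admission: start ready tasks while their storage fits."""
    started = []
    for task in ready:
        need = footprints[task]
        if in_use + need <= capacity:
            started.append(task)
            in_use += need  # reserve the space for the admitted task
    return started
```

For example, with footprints of 40, 50, and 30 units and a capacity of 100, this rule starts the first two tasks and holds back the third until space is released.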
Full Text Available
Scaling up a CMS tier-3 site with campus resources and a 100 Gb/s network connection: what could go wrong?

https://doi.org/10.1088/1742-6596/898/8/082041

Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Anampa, Kenyi Hurtado; Tovar, Benjamin; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas (October 2017, Journal of Physics: Conference Series)

Full Text Available
Opportunistic Computing with Lobster: Lessons Learned from Scaling up to 25k Non-Dedicated Cores

https://doi.org/10.1088/1742-6596/898/5/052036

Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Anampa, Kenyi Hurtado; Yannakopoulos, Anna; Tovar, Benjamin; Donnelly, Patrick; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; et al (October 2017, Journal of Physics: Conference Series)

Full Text Available
