Title: Apptainer-Based Containers for Legacy and Platform Dependent Machine Learning Applications on HPC Systems
Apptainer (formerly known as Singularity) is a secure, portable, and easy-to-use container system that provides a high degree of trust and security. It is widely used across industry and academia and is well suited to bridging the gap between new software technologies and legacy hardware while making optimized use of CPU and memory resources. It runs complex applications on HPC clusters in a simple, reproducible way. In this paper we discuss several implementations of artificial intelligence and machine learning container-based applications running on Pegasus supercomputing nodes using Singularity and Nextflow. This approach reduces the manual configuration work required for Singularity applications and improves existing High-Performance Computing (HPC) and High-Throughput Computing (HTC) workflows, increasing runtime performance by 3X. We also present a comparative evaluation, with analytical results, of running an application as a normal LSF job versus inside a Singularity container, including CPU and GPU utilization and the associated tradeoffs.
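As a rough illustration of the workflow described above, the sketch below shows how a containerized training script might be submitted to an LSF queue through Apptainer/Singularity. The image name, queue, script path, and resource counts are hypothetical placeholders, not the paper's actual Pegasus or Nextflow configuration.

```python
# Minimal sketch (not the paper's pipeline): submit a containerized training
# script to an LSF queue with Apptainer/Singularity. Image, queue, and script
# names below are hypothetical placeholders.
import shlex
import subprocess

def build_bsub_command(image: str, script: str, cores: int = 4, gpus: int = 1,
                       queue: str = "gpu") -> list[str]:
    """Wrap an `apptainer exec` invocation in an LSF `bsub` submission."""
    # --nv passes the host NVIDIA driver and libraries into the container.
    container_cmd = (f"apptainer exec --nv {shlex.quote(image)} "
                     f"python {shlex.quote(script)}")
    return [
        "bsub",
        "-n", str(cores),        # CPU slots
        "-q", queue,             # queue name is site-specific
        "-gpu", f"num={gpus}",   # LSF GPU request syntax (LSF 10.1+)
        "-o", "train.%J.out",    # stdout log; %J expands to the job ID
        container_cmd,
    ]

if __name__ == "__main__":
    cmd = build_bsub_command("pytorch_2.1.sif", "train.py")
    print("Would submit:", " ".join(cmd))
    # On a cluster where LSF is available, uncomment to actually submit:
    # subprocess.run(cmd, check=True)
```

Queue names and GPU request syntax vary between sites, so the `bsub` options shown here would need to be adapted to the target cluster.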
Award ID(s):
2346318
PAR ID:
10581419
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-6248-0
Page Range / eLocation ID:
6083 to 6091
Format(s):
Medium: X
Location:
Washington, DC, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. The HPC community is actively researching and evaluating tools to support execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance as they have significantly better performance compared to full-scale virtualization, support microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, portability of applications, and reproducibility. Singularity is another container solution of interest as it is designed specifically for scientific applications. It is important to conduct performance and feature analysis of container technologies to understand their applicability for each application and target execution environment. This paper presents (1) a performance evaluation of Docker and Singularity on bare metal nodes in the Chameleon cloud, (2) a mechanism by which Docker containers can be mapped to InfiniBand hardware with RDMA communication, and (3) an analysis of mapping elements of parallel workloads to containers for optimal resource management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions on choosing the container technologies and approaches that are suitable for their HPC workloads on cloud infrastructure. Our performance analysis shows that scientific workloads in both Docker and Singularity based containers can achieve near-native performance. Singularity is designed specifically for HPC workloads. However, Docker still has advantages over Singularity for use in clouds, as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resource allocation. Both Docker and Singularity make it possible to directly use the underlying network fabric from the containers for coarse-grained resource allocation.
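A minimal sketch of the kind of comparison described in this abstract: timing the same workload natively and inside Docker or Singularity containers. The toy benchmark and image names are assumptions for illustration; this is not the authors' evaluation harness.

```python
# Illustrative only: compare wall-clock time of a small compute kernel run
# natively versus under Singularity and Docker. "bench.sif" is a hypothetical
# image containing python3; the Docker image is the public python:3.11-slim.
import shutil
import subprocess
import sys
import time

WORKLOAD = "sum(i * i for i in range(10**7))"  # stand-in compute kernel

def timed_run(cmd: list[str]) -> float:
    """Run a command and return its elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

def main() -> None:
    runs = {"native": [sys.executable, "-c", WORKLOAD]}
    if shutil.which("singularity"):
        runs["singularity"] = ["singularity", "exec", "bench.sif",
                               "python3", "-c", WORKLOAD]
    if shutil.which("docker"):
        runs["docker"] = ["docker", "run", "--rm", "python:3.11-slim",
                          "python3", "-c", WORKLOAD]
    for name, cmd in runs.items():
        print(f"{name:12s} {timed_run(cmd):6.2f} s")

if __name__ == "__main__":
    main()
```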
  2. High performance computing (HPC) is undergoing significant changes. The emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneity of hardware devices, combined with workload changes, forces schedulers to consider multiple resources (e.g., burst buffers) beyond CPUs in decision making. In this study, we present a multi-resource scheduling scheme named BBSched that schedules user jobs based not only on their CPU requirements but also on other schedulable resources such as burst buffers. BBSched formulates the scheduling problem as a multi-objective optimization (MOO) problem and rapidly solves it using a multi-objective genetic algorithm. The multiple solutions generated by BBSched enable system managers to explore potential tradeoffs among various resources and thereby obtain better utilization of all of them. Trace-driven simulations with real system workloads demonstrate that BBSched improves scheduling performance by up to 41% compared to existing methods, indicating that explicitly optimizing multiple resources beyond CPUs is essential for HPC scheduling.
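The sketch below illustrates the multi-objective idea in miniature: score candidate job selections on both CPU and burst-buffer utilization and keep the Pareto-optimal options. It is a simplified stand-in, not the BBSched algorithm or its genetic-algorithm solver, and the job sizes and capacities are invented.

```python
# Toy multi-objective selection: enumerate feasible job subsets, score them on
# CPU and burst-buffer utilization, and keep the non-dominated (Pareto) ones.
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Job:
    name: str
    cpus: int
    burst_buffer_gb: int

def utilization(jobs: tuple[Job, ...], cpu_cap: int, bb_cap: int) -> tuple[float, float]:
    """Fraction of each resource a candidate selection would occupy."""
    return (sum(j.cpus for j in jobs) / cpu_cap,
            sum(j.burst_buffer_gb for j in jobs) / bb_cap)

def feasible(jobs, cpu_cap, bb_cap):
    cpu, bb = utilization(jobs, cpu_cap, bb_cap)
    return cpu <= 1.0 and bb <= 1.0

def pareto_front(candidates):
    """Keep selections not dominated on both objectives (higher is better)."""
    front = []
    for sel, score in candidates:
        dominated = any(o[0] >= score[0] and o[1] >= score[1] and o != score
                        for _, o in candidates)
        if not dominated:
            front.append((sel, score))
    return front

if __name__ == "__main__":
    queue = [Job("A", 64, 100), Job("B", 32, 400), Job("C", 96, 50), Job("D", 16, 300)]
    CPU_CAP, BB_CAP = 128, 512
    candidates = [(sel, utilization(sel, CPU_CAP, BB_CAP))
                  for r in range(1, len(queue) + 1)
                  for sel in combinations(queue, r)
                  if feasible(sel, CPU_CAP, BB_CAP)]
    for sel, (cpu, bb) in pareto_front(candidates):
        print([j.name for j in sel], f"CPU {cpu:.0%}  BB {bb:.0%}")
```

Presenting the whole front, rather than a single "best" schedule, is what lets an operator trade CPU utilization against burst-buffer utilization explicitly.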
  3. On large-scale high performance computing (HPC) systems, applications are provisioned with aggregated resources to meet their peak demands for brief periods. This results in resource underutilization because application requirements vary considerably during execution. This problem is particularly pronounced for deep learning applications running on leadership HPC systems with a large pool of burst buffers in the form of flash or non-volatile memory (NVM) devices. In this paper, we examine the I/O patterns of deep neural networks and reveal their critical need to load many small samples randomly for successful training. We have designed a specialized Deep Learning File System (DLFS) that provides a thin set of APIs. In particular, we design the metadata management of DLFS around an in-memory tree-based sample directory and its file services around the user-level SPDK protocol, which can disaggregate the capabilities of NVM Express (NVMe) devices to parallel training tasks. Our experimental results show that DLFS can dramatically improve training throughput for deep neural networks on NVMe over Fabrics compared with the kernel-based Ext4 file system. Furthermore, DLFS achieves efficient user-level storage disaggregation with very little CPU utilization.
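To make the metadata design concrete, here is a small sketch of an in-memory tree that maps samples to (offset, length) records in a packed file and serves random mini-batches, which is the access pattern the abstract highlights. It illustrates the concept only, not the DLFS implementation, and the dataset layout is invented.

```python
# Sketch: an in-memory sample directory (dataset -> class -> sample) that
# resolves samples to (offset, length) records in one packed file, so random
# mini-batch reads avoid per-file metadata lookups.
import random

class SampleDirectory:
    """In-memory tree: dataset -> class -> sample -> (offset, length)."""

    def __init__(self):
        self._tree = {}

    def add(self, dataset: str, label: str, sample: str, offset: int, length: int):
        self._tree.setdefault(dataset, {}).setdefault(label, {})[sample] = (offset, length)

    def random_batch(self, dataset: str, batch_size: int):
        """Pick random (offset, length) records, mimicking shuffled training I/O."""
        records = [rec
                   for samples in self._tree[dataset].values()
                   for rec in samples.values()]
        return random.sample(records, min(batch_size, len(records)))

if __name__ == "__main__":
    d = SampleDirectory()
    offset = 0
    for i in range(1000):                       # fake packed-file layout
        d.add("imagenet-subset", f"class{i % 10}", f"img{i}.jpg", offset, 4096)
        offset += 4096
    print(d.random_batch("imagenet-subset", 4))
```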
  4. High-performance computing (HPC) resources are used in a wide range of scientific and engineering calculations. These resources have high initial and running costs, so their optimal performance is crucial. There are a number of strategies to ensure this optimal state. One of them is continuous performance monitoring, where a set of applications and input parameters are executed regularly to identify performance issues proactively. Some sites hesitate to use such a strategy because it takes CPU cycles away from actual users. The goal of this work is to identify node availability, both size- and time-wise, on busy HPC systems. Such availability spots can be used to tailor test jobs to minimize user impact. Two systems were analyzed: a small one, 118 nodes from the Center for Computational Research at the University at Buffalo, and a large one, 1,160 nodes from the Texas Advanced Computing Center. It was found that for days with 90% utilization and above, there are plenty of opportunities for test jobs. For example, on the small cluster, 8 nodes for 30 minutes are available for an average of 2.3 hours throughout the day; that is, for 9.6% of the day the scheduler has the opportunity to schedule such a job. On the large system, 32 nodes for 30 minutes were available an average of 9.2 hours a day (or 38% of the day). Thus, there is space for test jobs, but it is not evident that the scheduler can benefit from it, and a proper strategy must be used, for example, lowering test job priorities.
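The sketch below shows the core of such an availability analysis: given a per-minute count of idle nodes over a day, count the minutes at which a test job of a given size and duration could have started. The idle-node trace here is synthetic; the study itself used real job logs from the two systems named above.

```python
# Sketch: count the minutes of the day at which a (nodes x minutes) test job
# could have started, given a per-minute idle-node trace. The trace below is
# synthetic and only stands in for real scheduler logs.
import random

def startable_minutes(idle: list[int], nodes: int, minutes: int) -> int:
    """Minutes at which `nodes` idle nodes stay available for `minutes` minutes."""
    count = 0
    for t in range(len(idle) - minutes + 1):
        if all(idle[t + k] >= nodes for k in range(minutes)):
            count += 1
    return count

if __name__ == "__main__":
    random.seed(0)
    # Synthetic trace: a 118-node cluster running near 90% utilization.
    idle_trace = [max(0, int(random.gauss(mu=12, sigma=6))) for _ in range(24 * 60)]
    ok = startable_minutes(idle_trace, nodes=8, minutes=30)
    print(f"8-node / 30-minute test job startable {ok} minutes "
          f"({ok / len(idle_trace):.1%} of the day)")
```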
  5. Large-scale, high-throughput computational science faces an accelerating convergence of software and hardware. Software container-based solutions have become common in cloud-based datacenter environments and are considered promising tools for addressing heterogeneity and portability concerns. However, container solutions reflect a set of assumptions that complicate their adoption by developers and users of scientific workflow applications. Nor are containers a universal solution for deployment in high-performance computing (HPC) environments, which have specialized and vertically integrated scheduling and runtime software stacks. In this paper, we present a container design and deployment approach that uses modular layering to ease the deployment of containers into existing HPC environments. This layered approach allows operating system integrations, support for different communication and performance monitoring libraries, and application code to be defined and interchanged in isolation. We describe the details of our approach, including specifics about container deployment and orchestration for different HPC scheduling systems. We also describe how this layering method can be used to build containers for two separate applications, each deployed on clusters with different batch schedulers, MPI networking support, and performance monitoring requirements. Our experience indicates that the layered approach is a viable strategy for building applications intended to provide similar behavior across widely varying deployment targets.
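As a rough sketch of the layering idea, the snippet below assembles an Apptainer/Singularity definition file from interchangeable fragments (base image, MPI stack, monitoring libraries, application) so each piece can be swapped per target cluster. The fragment contents and package names are placeholders, not the authors' recipes or build tooling.

```python
# Hypothetical layering sketch: build one Apptainer/Singularity definition
# file with a single %post section assembled from per-concern fragments.
HEADER = "Bootstrap: docker\nFrom: rockylinux:9\n"

POST_LAYERS = {
    "mpi":        ["dnf install -y openmpi openmpi-devel"],   # communication layer
    "monitoring": ["dnf install -y papi"],                    # performance monitoring
    "app":        ["pip3 install /opt/myapp"],                # application code
}

def compose(layers: list[str]) -> str:
    """Concatenate the selected layer fragments into one definition file."""
    commands = [cmd for name in layers for cmd in POST_LAYERS[name]]
    post = "\n".join(f"    {cmd}" for cmd in commands)
    return f"{HEADER}\n%post\n{post}\n"

if __name__ == "__main__":
    # One cluster might drop the "mpi" layer and bind-mount the host MPI stack;
    # another might swap "monitoring" for a different profiling library.
    definition = compose(["mpi", "monitoring", "app"])
    with open("myapp.def", "w") as out:
        out.write(definition)
    print(definition)
```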