

Title: Evaluation of Docker Containers for Scientific Workloads in the Cloud
The HPC community is actively researching and evaluating tools to support execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance as they offer significantly better performance than full-scale virtualization, support microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, portability of applications, and reproducibility. Singularity is another container solution of interest, as it is designed specifically for scientific applications. It is important to conduct performance and feature analysis of container technologies to understand their applicability for each application and target execution environment. This paper presents (1) a performance evaluation of Docker and Singularity on bare metal nodes in the Chameleon cloud, (2) a mechanism by which Docker containers can be mapped to InfiniBand hardware with RDMA communication, and (3) an analysis of mapping elements of parallel workloads to containers for optimal resource management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions when choosing container technologies and approaches suitable for their HPC workloads on cloud infrastructure. Our performance analysis shows that scientific workloads in both Docker and Singularity containers can achieve near-native performance. Singularity is designed specifically for HPC workloads; however, Docker still has advantages over Singularity for use in clouds, as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resource allocation. Both Docker and Singularity make it possible to use the underlying network fabric directly from the containers for coarse-grained resource allocation.
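For a concrete starting point, the sketch below (Python driving the Docker CLI) shows one way the "one container per MPI rank with direct fabric access" idea can look in practice: a container launched with the host's InfiniBand devices mapped in so that the rank inside it can use RDMA. This is a minimal, hedged illustration; the image name (mpi-app:latest), device paths, and application command are assumptions, not the paper's exact setup.

    # Illustrative sketch, not the paper's exact tooling: launch one Docker
    # container per MPI rank with the host's InfiniBand devices mapped in.
    import subprocess

    def run_rank_container(rank: int, image: str = "mpi-app:latest") -> None:
        cmd = [
            "docker", "run", "--rm",
            "--net=host",                        # bypass overlay networking; use the fabric directly
            "--device=/dev/infiniband/uverbs0",  # assumed verbs device path
            "--device=/dev/infiniband/rdma_cm",  # RDMA connection manager
            "--ulimit", "memlock=-1",            # RDMA requires pinned (locked) memory
            "--name", f"mpi-rank-{rank}",
            image,
            "./app", "--rank", str(rank),        # hypothetical application entry point
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        run_rank_container(0)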
Award ID(s):
1740263
NSF-PAR ID:
10069272
Date Published:
Journal Name:
Practice and Experience in Advanced Research Computing (PEARC), 2018.
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Large-scale, high-throughput computational science faces an accelerating convergence of software and hardware. Software container-based solutions have become common in cloud-based datacenter environments, and are considered promising tools for addressing heterogeneity and portability concerns. However, container solutions reflect a set of assumptions which complicate their adoption by developers and users of scientific workflow applications. Nor are containers a universal solution for deployment in high-performance computing (HPC) environments which have specialized and vertically integrated scheduling and runtime software stacks. In this paper, we present a container design and deployment approach which uses modular layering to ease the deployment of containers into existing HPC environments. This layered approach allows operating system integrations, support for different communication and performance monitoring libraries, and application code to be defined and interchanged in isolation. We describe in this paper the details of our approach, including specifics about container deployment and orchestration for different HPC scheduling systems. We also describe how this layering method can be used to build containers for two separate applications, each deployed on clusters with different batch schedulers, MPI networking support, and performance monitoring requirements. Our experience indicates that the layered approach is a viable strategy for building applications intended to provide similar behavior across widely varying deployment targets. 
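    As a hedged sketch of this layering idea (not the paper's actual build system), the Python snippet below composes a Dockerfile from interchangeable fragments: an OS/scheduler integration layer, an MPI stack, a monitoring library, and the application. All fragment names and package choices are illustrative assumptions.

        # Hypothetical sketch: build a Dockerfile from swappable layer fragments.
        LAYERS = {
            "os":      "FROM centos:7\nRUN yum install -y slurm-libs\n",   # OS + scheduler integration
            "mpi":     "RUN yum install -y openmpi openmpi-devel\n",       # communication library
            "monitor": "RUN yum install -y papi\n",                        # performance monitoring
            "app":     "COPY app /opt/app\nCMD [\"/opt/app/run.sh\"]\n",   # application layer
        }

        def compose_dockerfile(selection=("os", "mpi", "monitor", "app")) -> str:
            # Swapping one fragment (say, a different MPI) leaves the others
            # untouched, which is the isolation the layered approach aims for.
            return "".join(LAYERS[name] for name in selection)

        if __name__ == "__main__":
            print(compose_dockerfile())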
  2.
The first major goal of this project is to build a state-of-the-art information storage, retrieval, and analysis system that utilizes the latest technology and industry methods. This system is leveraged to accomplish another major goal, supporting modern search and browse capabilities for a large collection of tweets from the Twitter social media platform, web pages, and electronic theses and dissertations (ETDs). The backbone of the information system is a Docker container cluster running with Rancher and Kubernetes. Information retrieval and visualization are accomplished with containers in a pipelined fashion, whether in the cluster or on virtual machines, using Elasticsearch and Kibana, respectively. In addition to traditional searching and browsing, the system supports full-text and metadata searching. Search results include facets as a modern means of browsing among related documents. The system supports text analysis and machine learning to reveal new properties of collection data. These new properties assist in the generation of available facets. Recommendations are also presented with search results based on associations among documents and with logged user activity. The information system is co-designed by five teams of Virginia Tech graduate students, all members of the same computer science class, CS 5604. Although the project is an academic exercise, it is the practice of the teams to work and interact as though they are groups within a company developing a product. The teams on this project include three collection management groups -- Electronic Theses and Dissertations (ETD), Tweets (TWT), and Web-Pages (WP) -- as well as the Front-end (FE) group and the Integration (INT) group to help provide the overarching structure for the application. This submission focuses on the work of the Integration (INT) team, which creates and administers Docker containers for each team in addition to administering the cluster infrastructure. Each container is a customized application environment that is specific to the needs of the corresponding team. Each team will have several of these containers set up in a pipeline formation to allow scaling and extension of the current system. The INT team also contributes to a cross-team effort for exploring the use of Elasticsearch and its internally associated database. The INT team administers the integration of the Ceph data storage system into the CS Department Cloud and provides support for interactions between containers and the Ceph filesystem. During formative stages of development, the INT team also has a role in guiding team evaluations of prospective container components and workflows. The INT team is responsible for the overall project architecture and facilitating the tools and tutorials that assist the other teams in deploying containers in a development environment according to mutual specifications agreed upon with each team. The INT team maintains the status of the Kubernetes cluster, deploying new containers and pods as needed by the collection management teams as they expand their workflows. This team is responsible for utilizing a continuous integration process to update existing containers. During the development stage, the INT team collaborates specifically with the collection management teams to create the pipeline for the ingestion and processing of new collection documents, crossing services between those teams as needed.
The INT team develops a reasoner engine to construct workflows that take an information goal as input; these workflows are then programmatically authored, scheduled, and monitored using Apache Airflow. The INT team is responsible for the flow, management, and logging of system performance data and for making any adjustments necessary based on the analysis of testing results. The INT team has established a GitLab repository for archiving code related to the entire project and has provided the other groups with the documentation needed to deposit their code in the repository. This repository will be expanded using GitLab CI in order to provide continuous integration and testing once it is available. Finally, the INT team will provide a production distribution that includes all embedded Docker containers and sub-embedded Git source code repositories. The INT team will archive this distribution on the Virginia Tech Docker Container Registry and deploy it on the Virginia Tech CS Cloud. The INT-2020 team owes a sincere debt of gratitude to the INT-2019 team. This is a very large undertaking, and the wrangling of all of the products and processes would not have been possible without their guidance, in both direct and written form. We have relied heavily on the foundation they and their predecessors provided for us. We continue their work with systematic improvements, but also want to acknowledge their efforts; without them, our progress to date would not have been possible.
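    To make the faceted search described above concrete, here is a minimal, hedged sketch using the elasticsearch-py client (8.x-style keyword arguments): a full-text query plus a terms aggregation whose buckets can serve as browse facets. The index name and field names are assumptions, not the project's actual schema.

        # Hedged sketch: full-text search with a terms aggregation as facets.
        from elasticsearch import Elasticsearch

        es = Elasticsearch("http://localhost:9200")  # assumed cluster endpoint

        resp = es.search(
            index="tweets",                              # assumed index name
            query={"match": {"text": "solar eclipse"}},  # full-text query
            aggs={"by_hashtag": {"terms": {"field": "hashtags.keyword", "size": 10}}},
        )

        # Each aggregation bucket can become a clickable facet in the front end.
        for bucket in resp["aggregations"]["by_hashtag"]["buckets"]:
            print(bucket["key"], bucket["doc_count"])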
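    The Airflow side of that workflow authoring can be sketched as a small DAG with a linear ingest pipeline, as below. Task names, script paths, and the trigger policy are illustrative assumptions rather than the INT team's actual DAGs.

        # Hedged sketch: a reasoner-produced workflow authored as an Airflow DAG.
        from datetime import datetime

        from airflow import DAG
        from airflow.operators.bash import BashOperator

        with DAG(
            dag_id="etd_ingest_pipeline",   # hypothetical DAG name
            start_date=datetime(2020, 1, 1),
            schedule_interval=None,         # triggered on demand, not by cron
            catchup=False,
        ) as dag:
            fetch = BashOperator(task_id="fetch_documents",
                                 bash_command="python /opt/pipeline/fetch.py")
            extract = BashOperator(task_id="extract_text",
                                   bash_command="python /opt/pipeline/extract.py")
            index = BashOperator(task_id="index_elasticsearch",
                                 bash_command="python /opt/pipeline/index.py")

            fetch >> extract >> index       # linear ingestion pipeline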
  3. Server systems with large amounts of physical memory can benefit from using some of the available memory capacity for in-memory snapshots of ongoing computations. In-memory snapshots are useful for services such as scaling new workload instances, debugging, and scheduling, which do not require snapshot persistence across node crashes/reboots. Since servers increasingly run containerized workloads, using technologies such as Docker, the snapshot and subsequent restore mechanisms would be applied at the granularity of containers. However, CRIU, the current approach to snapshotting/restoring containers, suffers from expensive filesystem write/read operations on image files containing memory pages, which dominate the runtime costs and undercut the potential benefits of manipulating in-memory process state. In this paper, we demonstrate that these overheads can be eliminated by using MVAS -- kernel support for multiple independent virtual address spaces (VAS), designed specifically for machines with large memory capacities. The resulting VAS-CRIU stores application memory as a separate snapshot address space in DRAM and avoids costly file system operations. This accelerates the snapshot/restore of address spaces by two orders of magnitude, resulting in an overall reduction in snapshot time of up to 10× and restore time of up to 9×. We demonstrate the utility of VAS-CRIU for container management services such as fine-grained snapshot generation and container instance scaling.
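    For context, the stock CRIU-based path that VAS-CRIU improves on can be driven through Docker's experimental checkpoint support, as in the hedged Python sketch below. Container and checkpoint names are assumptions; this exercises only standard CRIU (whose memory images go through the filesystem, the very overhead the paper removes) and requires a Docker daemon with experimental features enabled and CRIU installed.

        # Hedged sketch: snapshot/restore a container via Docker's experimental
        # CRIU-based checkpointing (the filesystem-bound baseline, not VAS-CRIU).
        import subprocess

        def snapshot(container: str, ckpt: str = "snap0") -> None:
            # CRIU dumps the container's memory pages to image files on disk.
            subprocess.run(["docker", "checkpoint", "create", container, ckpt],
                           check=True)

        def restore(container: str, ckpt: str = "snap0") -> None:
            # Restores the process tree from the on-disk checkpoint images.
            subprocess.run(["docker", "start", "--checkpoint", ckpt, container],
                           check=True)

        if __name__ == "__main__":
            snapshot("worker1")   # hypothetical container name
            restore("worker1")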
  4. Compute heterogeneity is increasingly gaining prominence in modern datacenters due to the addition of accelerators like GPUs and FPGAs. We observe that datacenter schedulers are agnostic of these emerging accelerators, especially their resource utilization footprints, and are thus not well equipped to dynamically provision them based on application needs. We observe that state-of-the-art datacenter schedulers fail to provide fine-grained resource guarantees for latency-sensitive tasks that are GPU-bound. Specifically for GPUs, this results in resource fragmentation and interference, leading to poor utilization of allocated GPU resources. Furthermore, GPUs exhibit highly linear energy efficiency with respect to utilization, and hence proactive management of these resources is essential to keep operational costs low while ensuring end-to-end Quality of Service (QoS) for user-facing queries. Towards addressing the GPU orchestration problem, we build Knots, a GPU-aware resource orchestration layer, and integrate it with the Kubernetes container orchestrator to build Kube-Knots. Kube-Knots can dynamically harvest spare compute cycles through dynamic container orchestration, enabling co-location of latency-critical and batch workloads while improving overall resource utilization. We design and evaluate two GPU-based scheduling techniques to schedule datacenter-scale workloads through Kube-Knots on a ten-node GPU cluster. Our proposed Correlation Based Prediction (CBP) and Peak Prediction (PP) schemes together improve both average and 99th-percentile cluster-wide GPU utilization by up to 80% for HPC workloads. In addition, CBP+PP improves the average job completion time (JCT) of deep learning workloads by up to 36% when compared to state-of-the-art schedulers. This leads to 33% cluster-wide energy savings on average for three different workloads compared to state-of-the-art GPU-agnostic schedulers. Further, the proposed PP scheduler guarantees end-to-end QoS for latency-critical queries by reducing QoS violations by up to 53% when compared to state-of-the-art GPU schedulers.
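    The peak-prediction idea can be illustrated with a toy sketch: estimate an upcoming GPU utilization peak from a sliding window of samples and admit a batch job for co-location only if the predicted peak leaves QoS headroom. This is illustrative logic under stated assumptions, not Kube-Knots' actual CBP/PP implementation.

        # Toy sketch of peak-prediction-driven co-location (illustrative only).
        from collections import deque

        class PeakPredictor:
            def __init__(self, window: int = 12, headroom: float = 0.2):
                self.samples = deque(maxlen=window)  # recent GPU utilization, 0..1
                self.headroom = headroom             # reserved for latency-critical QoS

            def observe(self, util: float) -> None:
                self.samples.append(util)

            def predicted_peak(self) -> float:
                # Pessimistic estimate: the maximum over the recent window.
                return max(self.samples, default=1.0)

            def can_colocate(self, batch_demand: float) -> bool:
                # Admit the batch job only if peak + demand fits under the QoS cap.
                return self.predicted_peak() + batch_demand <= 1.0 - self.headroom

        pp = PeakPredictor()
        for u in (0.35, 0.42, 0.38, 0.50):
            pp.observe(u)
        print(pp.can_colocate(0.25))  # True: 0.50 + 0.25 <= 0.80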
  5.
    Edge and fog computing encompass a variety of technologies that are poised to enable new applications across the Internet that support data capture, storage, processing, and communication across the networking continuum. These environments pose new challenges to the design and implementation of networks, as membership can be dynamic and devices are heterogeneous, widely distributed geographically, and in proximity to end-users, as is the case with mobile and Internet-of-Things (IoT) devices. We present a demonstration of EdgeVPN.io (Evio for short), an open-source, programmable, software-defined network that addresses challenges in the deployment of virtual networks spanning distributed edge and cloud resources, in particular highlighting its use in support of the Kubernetes container orchestration middleware. The demo highlights a deployment of unmodified Kubernetes middleware across a virtual cluster comprising virtual machines deployed both in cloud providers and in distinct networks at the edge, where all nodes are assigned private IP addresses and are subject to different NAT (Network Address Translation) middleboxes, connected through an Evio virtual network. The demo includes an overview of the configuration of Kubernetes and Evio nodes and the deployment of Docker-based container pods, highlighting the seamless connectivity for TCP/IP applications deployed on the pods.
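    The "seamless connectivity for TCP/IP applications" shown in the demo can be probed with something as simple as the hedged sketch below: an echo test from one pod toward another pod's address, which works unchanged because the Evio overlay hides the NAT middleboxes. The peer address, port, and the presence of an echo server on the far side are hypothetical.

        # Hedged sketch: pod-to-pod TCP echo probe over the overlay network.
        import socket

        def echo_once(host: str, port: int = 7777, msg: bytes = b"ping") -> bytes:
            # From inside one pod, connect to a peer pod; the overlay makes this
            # look like an ordinary flat TCP/IP network despite the NATs beneath.
            with socket.create_connection((host, port), timeout=5) as s:
                s.sendall(msg)
                return s.recv(len(msg))

        if __name__ == "__main__":
            print(echo_once("10.244.1.23"))  # hypothetical peer pod IP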