Container systems (e.g., Docker) provide a well-defined, lightweight, and versatile foundation to streamline the process of tool deployment, to provide a consistent and repeatable experimental interface, and to leverage data centers in the global cloud infrastructure as measurement vantage points. However, the virtual network devices commonly used to connect containers to the Internet are known to impose latency overheads which distort the values reported by measurement tools running inside containers. In this study, we develop a tool called MACE to measure and remove the latency overhead of virtual network devices as used by Docker containers. A key insight of MACE is the fact that container functions all execute in the same kernel. Based on this insight, MACE is implemented as a Linux kernel module using the trace event subsystem to measure latency along the network stack code path. Using CloudLab, we evaluate MACE by comparing the ping measurements emitted from a slim-ping container to the ones emitted using the same tool running in the bare metal machine under varying traffic loads. Our evaluation shows that the MACE-adjusted RTT measurements are within 20 µs of the bare metal ping RTTs on average while incurring less than 25 µs RTT perturbation. We also compare RTT perturbation incurred by MACE with perturbation incurred by the built-in ftrace kernel tracing system and find that MACE incurs less perturbation.
more »
« less
This content will become publicly available on February 17, 2026
AlphaBoot: accelerated container cold start using SmartNICs
Scalability and flexibility of modern cloud application can be mainly attributed to virtual machines (VMs) and containers, where virtual machines are isolated operating systems that run on a hypervisor while containers are lightweight isolated processes that share the Host OS kernel. To achieve the scalability and flexibility required for modern cloud applications, each bare-metal server in the data center often houses multiple virtual machines, each of which runs multiple containers and multiple containerized applications that often share the same set of libraries and code, often referred to as images. However, while container frameworks are optimized for sharing images within a single VM, sharing images across multiple VMs, even if the VMs are within the same bare-metal server, is nearly non-existent due to the nature of VM isolation, leading to repetitive downloads, causing redundant added network traffic and latency. This work aims to resolve this problem by utilizing SmartNICs, which are specialized network hardware that provide hardware acceleration and offload capabilities for networking tasks, to optimize image retrieval and sharing between containers across multiple VMs on the same server. The method proposed in this work shows promise in cutting down container cold start time by up to 92%, reducing network traffic by 99.9%. Furthermore, the result is even more promising as the performance benefit is directly proportional to the number of VMs in a server that concurrently seek the same image, which guarantees increased efficiency as bare metal machine specifications improve.
more »
« less
- Award ID(s):
- 2245352
- PAR ID:
- 10639533
- Publisher / Repository:
- Frontiers
- Date Published:
- Journal Name:
- Frontiers in High Performance Computing
- Volume:
- 3
- ISSN:
- 2813-7337
- Subject(s) / Keyword(s):
- SmartNICs, cloud computing, containers, data center networks, virtual machines
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
In cloud-native environments, containers are often deployed within lightweight virtual machines (VMs) to ensure strong security isolation and privacy protection. With the growing demand for customized cloud services, third-party vendors are turning to infrastructure-as-a-service (IaaS) cloud providers to build their own cloud-native platforms, necessitating the need to run a VM or a guest that hosts containers inside another VM instance leased from an IaaS cloud. State-of-the-art nested virtualization in the x86 architecture relies heavily on the host hypervisor to expose hardware virtualization support to the guest hypervisor, not only complicating cloud management but also raising concerns about an increased attack surface at the host hypervisor. This paper presents the design and implementation of PVM, a high-performance guest hypervisor for KVM that is transparent to the host hypervisor and assumes no hardware virtualization support. PVM leverages two key designs: 1) a minimal shared memory region between the guest and guest hypervisor to facilitate state transition between different privilege levels and 2) an efficient shadow page table design to reduce the cost of memory virtualization. PVM has been adopted by a major IaaS cloud provider for hosting tens of thousands of secure containers on a daily basis. Our experiments demonstrate that PVM significantly outperforms current nested virtualization in KVM for memory virtualization, particularly for concurrent workloads, while maintaining comparable performance in CPU and I/O virtualization.more » « less
-
We consider a distributed server system consisting of a large number of servers, each with limited capacity on multiple resources (CPU, memory, etc.). Jobs with different rewards arrive over time and require certain amounts of resources for the duration of their service. When a job arrives, the system must decide whether to admit it or reject it, and if admitted, in which server to schedule it. The objective is to maximize the expected total reward received by the system. This problem is motivated by control of cloud computing clusters, in which jobs are requests for virtual machines (VMs) or containers that reserve resources for various services, and rewards represent service priority of requests or price paid per time unit of service. We study this problem in an asymptotic regime where the number of servers and jobs’ arrival rates scale by a factor L, as L becomes large. We propose a resource reservation policy that asymptotically achieves at least 1/2, and under certain monotone property on jobs’ rewards and resources, at least [Formula: see text] of the optimal expected reward. The policy automatically scales the number of VM slots for each job type as the demand changes and decides in which servers the slots should be created in advance, without the knowledge of traffic rates.more » « less
-
We study the problem of scheduling VMs (Virtual Machines) in a distributed server platform, motivated by cloud computing applications. The VMs arrive dynamically over time to the system, and require a certain amount of resources (e.g. memory, CPU, etc) for the duration of their service. To avoid costly preemptions, we consider non-preemptive scheduling: Each VM has to be assigned to a server which has enough residual capacity to accommodate it, and once a VM is assigned to a server, its service cannot be disrupted (preempted). Prior approaches to this problem either have high complexity, require synchronization among the servers, or yield queue sizes/delays which are excessively large. We propose a non-preemptive scheduling algorithm that resolves these issues. In general, given an approximation algorithm to Knapsack with approximation ratio r , our scheduling algorithm can provide rβ fraction of the throughput region for β < r. In the special case of a greedy approximation algorithm to Knapsack, we further show that this condition can be relaxed to β<1. The parameters β and r can be tuned to provide a tradeoff between achievable throughput, delay, and computational complexity of the scheduling algorithm. Finally extensive simulation results using both synthetic and real traffic traces are presented to verify the performance of our algorithm.more » « less
-
The HPC community is actively researching and evaluating tools to support execution of scientific applications in cloud-based environ- ments. Among the various technologies, containers have recently gained importance as they have significantly better performance compared to full-scale virtualization, support for microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, portability of applications, and reproducibility. Singularity is another container solution that is of interest as it is designed specifically for scientific applications. It is important to conduct performance and feature analysis of the container technologies to understand their applicability for each application and target execution environment. This paper presents a (1) performance evaluation of Docker and Singularity on bare metal nodes in the Chameleon cloud (2) mecha- nism by which Docker containers can be mapped with InfiniBand hardware with RDMA communication and (3) analysis of mapping elements of parallel workloads to the containers for optimal re- source management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions on choosing the container tech- nologies and approaches that are suitable for their HPC workloads on cloud infrastructure. Our performance analysis shows that sci- entific workloads for both Docker and Singularity based containers can achieve near-native performance. Singularity is designed specifically for HPC workloads. However, Docker still has advantages over Singularity for use in clouds as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resources allocation. Both Docker and Singularity make it possible to directly use the underlying network fabric from the containers for coarse- grained resource allocation.more » « less
An official website of the United States government
