Title: Can we containerize internet measurements?
Container systems (e.g., Docker) provide a well-defined, lightweight, and versatile foundation for streamlining tool deployment, providing a consistent and repeatable experimental interface, and leveraging data centers in the global cloud infrastructure as measurement vantage points. However, the virtual network devices commonly used to connect containers to the Internet are known to impose latency overheads that distort the values reported by measurement tools running inside containers. In this study, we develop a tool called MACE to measure and remove the latency overhead of virtual network devices as used by Docker containers. A key insight behind MACE is that all container functions execute in the same host kernel. Based on this insight, MACE is implemented as a Linux kernel module that uses the trace event subsystem to measure latency along the network stack code path. Using CloudLab, we evaluate MACE by comparing the ping measurements emitted from a slim-ping container to those emitted by the same tool running on the bare-metal machine under varying traffic loads. Our evaluation shows that the MACE-adjusted RTT measurements are within 20 µs of the bare-metal ping RTTs on average while incurring less than 25 µs of RTT perturbation. We also compare the RTT perturbation incurred by MACE with that incurred by the built-in ftrace kernel tracing system and find that MACE incurs less perturbation.
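The correction MACE applies can be pictured with a small back-of-the-envelope sketch. The numbers and the function interface below are hypothetical; the actual tool obtains the per-packet overheads from its kernel module via trace events rather than from user-supplied constants.

    # Minimal sketch of the RTT-correction idea behind MACE (hypothetical
    # values and interface; MACE itself measures these latencies in-kernel
    # along the network stack code path).

    def adjusted_rtt(container_rtt_us, egress_overhead_us, ingress_overhead_us):
        """Remove virtual network device overhead from a container-measured RTT.

        container_rtt_us    -- RTT reported by ping inside the container
        egress_overhead_us  -- extra time on the veth/bridge path when sending
        ingress_overhead_us -- extra time on the veth/bridge path when receiving
        """
        return container_rtt_us - (egress_overhead_us + ingress_overhead_us)

    if __name__ == "__main__":
        # e.g. a container ping reports 512 us, and the measured per-packet
        # overheads are 18 us outbound and 22 us inbound:
        print(adjusted_rtt(512.0, 18.0, 22.0))  # -> 472.0 us, closer to bare metal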
Award ID(s):
1850297
PAR ID:
10166353
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of ACM/IRTF/ISOC Applied Networking Research Workshop (ANRW'19) co-located with IETF 105, Montreal, Canada, July 2019.
Page Range / eLocation ID:
52 to 58
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The scalability and flexibility of modern cloud applications can be attributed mainly to virtual machines (VMs) and containers: virtual machines are isolated operating systems that run on a hypervisor, while containers are lightweight isolated processes that share the host OS kernel. To achieve the scalability and flexibility required for modern cloud applications, each bare-metal server in the data center often houses multiple virtual machines, each of which runs multiple containers and containerized applications that share the same sets of libraries and code, referred to as images. However, while container frameworks are optimized for sharing images within a single VM, sharing images across multiple VMs, even VMs on the same bare-metal server, is nearly non-existent due to the nature of VM isolation. This leads to repetitive downloads, redundant network traffic, and added latency. This work aims to resolve this problem by using SmartNICs, specialized network hardware that provides hardware acceleration and offload capabilities for networking tasks, to optimize image retrieval and sharing between containers across multiple VMs on the same server. The proposed method shows promise in cutting container cold-start time by up to 92% and reducing network traffic by 99.9%. The result is even more promising because the performance benefit is directly proportional to the number of VMs on a server that concurrently seek the same image, which guarantees increased efficiency as bare-metal machine specifications improve.
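As a rough illustration of the underlying idea of sharing image layers across VM boundaries, the sketch below models a node-local, digest-keyed layer cache that is consulted before contacting the remote registry. The cache path, function names, and the callback are assumptions for illustration; the system described above offloads this logic to the SmartNIC rather than implementing it in host software.

    # Hypothetical sketch of a node-local, content-addressed image layer cache.
    # Registries already address layers by digest (e.g. sha256:...), which is
    # what makes cross-VM sharing possible; this only illustrates the lookup flow.
    import os

    NODE_CACHE = "/var/lib/nic-layer-cache"   # assumed node-local shared store

    def fetch_layer(digest, pull_from_registry):
        """Return a local path for layer `digest`, downloading only on a cache miss."""
        cached = os.path.join(NODE_CACHE, digest)
        if not os.path.exists(cached):
            # Miss: one download populates the cache for every VM/container
            # on this server that later requests the same digest.
            os.makedirs(NODE_CACHE, exist_ok=True)
            pull_from_registry(digest, cached)
        return cached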
  2. The HPC community is actively researching and evaluating tools to support execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance as they offer significantly better performance than full-scale virtualization, support microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, portability of applications, and reproducibility. Singularity is another container solution of interest, as it is designed specifically for scientific applications. It is important to conduct performance and feature analyses of container technologies to understand their applicability to each application and target execution environment. This paper presents (1) a performance evaluation of Docker and Singularity on bare-metal nodes in the Chameleon cloud, (2) a mechanism by which Docker containers can be mapped to InfiniBand hardware with RDMA communication, and (3) an analysis of mapping elements of parallel workloads to containers for optimal resource management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions about the container technologies and approaches that are suitable for their HPC workloads on cloud infrastructure. Our performance analysis shows that scientific workloads in both Docker- and Singularity-based containers can achieve near-native performance. Singularity is designed specifically for HPC workloads. However, Docker still has advantages over Singularity for use in clouds, as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resource allocation. Both Docker and Singularity make it possible to use the underlying network fabric directly from the containers for coarse-grained resource allocation.
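For a concrete flavor of what mapping Docker containers to InfiniBand hardware involves, the sketch below builds a docker run invocation that passes the host's IB device files into one container per MPI rank. The image name, container naming scheme, and rank count are assumptions for illustration; the paper's exact configuration may differ.

    # Hypothetical helper launching one container per MPI rank with the host's
    # InfiniBand devices mapped in, the usual prerequisite for RDMA from inside
    # a container. Image name and paths are assumptions.
    import subprocess

    def run_rank(rank, image="hpc-app:latest"):
        cmd = [
            "docker", "run", "-d",
            "--name", f"mpi-rank-{rank}",
            "--device=/dev/infiniband",   # expose IB verbs devices to the container
            "--cap-add=IPC_LOCK",         # allow pinning memory for RDMA registration
            "--ulimit", "memlock=-1:-1",  # unlimited locked memory for large buffers
            image,
        ]
        subprocess.run(cmd, check=True)

    for r in range(4):
        run_rank(r)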
  3. Container networking, which provides connectivity among containers on multiple hosts, is crucial to building and scaling container-based microservices. While overlay networks are widely adopted in production systems, they cause significant performance degradation in both throughput and latency compared to physical networks. This paper seeks to understand the bottlenecks of in-kernel networking when running container overlay networks. Through profiling and code analysis, we find that a prolonged data path, due to packet transformation in overlay networks, is the culprit of performance loss. Furthermore, existing scaling techniques in the Linux network stack are ineffective for parallelizing the prolonged data path of a single network flow. We propose FALCON, a fast and balanced container networking approach to scale the packet processing pipeline in overlay networks. FALCON pipelines software interrupts associated with different network devices of a single flow on multiple cores, thereby preventing execution serialization of excessive software interrupts from overloading a single core. FALCON further supports multiple network flows by effectively multiplexing and balancing software interrupts of different flows among available cores. We have developed a prototype of FALCON in Linux. Our evaluation with both micro-benchmarks and real-world applications demonstrates the effectiveness of FALCON, with significantly improved performance (by 300% for web serving) and reduced tail latency (by 53% for data caching).
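For context on the "existing scaling techniques" that this work finds insufficient for a single flow, the sketch below shows the standard Linux Receive Packet Steering (RPS) knob, which spreads receive-side softirq work across cores by writing a CPU mask into sysfs. Because RPS hashes per flow, a single flow still lands on one core, which is the limitation FALCON's softirq pipelining targets. The device name and CPU mask are placeholders.

    # Sketch of enabling RPS on a (virtual) network device, the stock Linux
    # mechanism for spreading receive-side softirq processing across cores.
    # Device name and CPU mask below are placeholders; requires root.

    def enable_rps(dev="veth0", cpu_mask="f"):
        path = f"/sys/class/net/{dev}/queues/rx-0/rps_cpus"
        with open(path, "w") as f:
            f.write(cpu_mask)   # e.g. "f" = CPUs 0-3

    if __name__ == "__main__":
        enable_rps("veth0", "f")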
  4. Introduction: Java Multi-Version Execution (JMVX) is a tool for performing Multi-Version Execution (MVX) and Record Replay (RR) in Java. Most tools for MVX and RR observe the behavior of a program at a low level, e.g., by looking at system calls. Unfortunately, this approach fails for high-level language virtual machines due to benign divergences (differences in behavior that accomplish the same result) introduced by the virtual machine, particularly by garbage collection and just-in-time compilation. In other words, the management of the virtual machine creates differing sequences of system calls that lead existing tools to believe a program has diverged when, in practice, the application running on top of the VM has not. JMVX takes a different approach, opting instead to add MVX and RR logic into the bytecode of compiled programs running in the VM to avoid benign divergences related to VM management.

    This artifact is a Docker image that will create a container holding our source code, compiled system, and experiments with JMVX. The image allows you to run the experiments we used to address the research questions from the paper (Section 4). This artifact is designed to show:
    [Supported] JMVX performs MVX for Java
    [Supported] JMVX performs RR for Java
    [Supported] JMVX is performant

    In the "Step by Step" section, we will point out how to run experiments to generate data supporting these claims. The third claim is supported; however, it may not be easily reproducible. For the paper we measured performance on bare metal rather than in a Docker container. When testing the containerized artifact on a MacBook (Sonoma v14.5), JMVX ran slower than expected. Similarly, see the section on "Differences From Experiment" for properties of the artifact that were altered (and could affect runtime results). Thanks for taking the time to explore our artifact.

    Hardware Requirements:
    - x86 machine running Linux, preferably Ubuntu 22.04 (Jammy)
    - 120 GB of storage
    - About 10 GB of RAM to spare
    - 2+ cores

    Getting Started Guide: this section is broken into two parts, setting up the Docker container and running a quick experiment to test that everything is working.

    Container Setup:
    1. Download the container image (DOI 10.5281/zenodo.12637140).
    2. If using Docker Desktop, increase the size of the virtual disk to 120 GB. In the GUI, go to Settings > Resources > Virtual Disk (should be a slider). From the terminal, modify the `diskSizeMiB` field in Docker's `settings.json` and restart Docker. Linux location: ~/.docker/desktop/settings.json. Mac location: ~/Library/Group Containers/group.com.docker/settings.json.
    3. Install with: docker load -i java-mvx-image.tar.gz (this process can take 30 minutes to 1 hour).
    4. Start the container via: docker run --name jmvx -it --shm-size="10g" java-mvx. The `--shm-size` parameter is important, as JMVX will crash the JVM if not enough space is available (detected via a SIGBUS error).

    Quick Start: the container starts you off in an environment with JMVX already prepared, i.e., JMVX has been built and the instrumentation is done. The script test-quick.sh will test all of JMVX's features for DaCapo's avrora benchmark. The script has comments explaining each command and should take about 10 minutes to run.

    The script starts by running our system call tracer tool. This phase of the script will create the directory /java-mvx/artifact/trace, which will contain:
    - natives-avrora.log -- (serialized) map of methods that resulted in system calls to the stack traces that generated them. /java-mvx/artifact/scripts/tracer/analyze2.sh is used to analyze this log and generate the other files in this directory.
    - table.txt -- a table showing how many unique stack traces led to the invocation of a native method that called a system call.
    - recommended.txt -- a list of methods JMVX recommends to instrument for the benchmark.
    - dump.txt -- a textual dump of the last 8 methods from every stack trace logged. This allows us to reduce the number of methods we need to instrument by choosing a wrapper that can handle multiple system calls. `FileSystemProvider.checkAccess` is an example of this.

    JMVX will recommend functions to instrument; these are included in recommended.txt. If you inspect the file, you'll see some simple candidates for instrumentation, e.g., available, open, and read from FileInputStream. The instrumentation code for FileInputStream can be found in /java-mvx/src/main/java/edu/uic/cs/jmvx/bytecode/FileInputStreamClassVisitor.java. The recommendations work in many cases, but for some, e.g., FileDescriptor.closeAll, we chose a different method (e.g., FileInputStream.close) by manually inspecting dump.txt.

    After tracing, runtime data is gathered, starting with measuring the overhead caused by instrumentation. The script then moves on to gathering data on MVX, and finally RR. The raw output of the benchmark runs for these phases is saved in /java-mvx/artifact/data/quick. Tables showing the benchmark's runtime performance will be placed in /java-mvx/artifact/tables/quick. That directory will contain:
    - instr.txt -- measures the overhead of instrumentation.
    - mvx.txt -- performance for multi-version execution mode.
    - rec.txt -- performance for recording.
    - rep.txt -- performance for replaying.

    This script captures data for research claims 1-3, albeit for a single benchmark and with a single iteration. Note that data is also captured for the benchmark's memory usage, but the txt tables only display runtime data. For more, see readme.pdf or readme.md.
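The container setup steps above can be consolidated into a small driver script; this is only a convenience wrapper around the documented commands (the quick-start script is still run interactively from the shell that `docker run -it` drops you into).

    # Convenience wrapper around the documented artifact setup commands.
    # It re-issues the docker commands given above; test-quick.sh is then run
    # from the interactive shell inside the container.
    import subprocess

    def setup():
        # Load the image from the Zenodo archive (can take 30-60 minutes).
        subprocess.run(["docker", "load", "-i", "java-mvx-image.tar.gz"], check=True)
        # Start the container; --shm-size matters or the JVM may die with SIGBUS.
        subprocess.run(
            ["docker", "run", "--name", "jmvx", "-it", "--shm-size=10g", "java-mvx"],
            check=True,
        )

    if __name__ == "__main__":
        setup()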
  5. Recent work has shown that lightweight virtualization like Docker containers can be used in HPC to package applications with their runtime environments. In many respects, applications in containers perform similarly to native applications. Other work has shown that containers can have adverse effects on the latency variation of communications with the enclosed application. This latency variation may have an impact on the performance of some HPC workloads, especially those dependent on synchronization between processes. In this work, we measure the latency characteristics of messages between Docker containers, and then compare those measurements to the performance of real-world applications. Our specific goals are to: measure the changes in mean and variation of latency with Docker containers, study how this affects the synchronization time of MPI processes, and measure the impact of these factors on real-world applications such as the NAS Parallel Benchmark (NPB).
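A measurement of the kind described here (mean and variation of message latency between containers) can be sketched with a simple MPI ping-pong. This is a generic illustration using mpi4py, not the authors' benchmark harness, and the message size and iteration count are arbitrary choices.

    # Generic MPI ping-pong latency probe (not the paper's harness): rank 0 sends
    # a small message to rank 1 and waits for the echo, recording half the round
    # trip. Run with e.g.: mpirun -np 2 python pingpong.py (one process per container).
    from mpi4py import MPI
    import statistics

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    N, msg = 10000, bytearray(8)
    samples = []

    for _ in range(N):
        if rank == 0:
            t0 = MPI.Wtime()
            comm.Send(msg, dest=1, tag=0)
            comm.Recv(msg, source=1, tag=0)
            samples.append((MPI.Wtime() - t0) / 2 * 1e6)  # one-way latency in us
        else:
            comm.Recv(msg, source=0, tag=0)
            comm.Send(msg, dest=0, tag=0)

    if rank == 0:
        print(f"mean={statistics.mean(samples):.2f} us  "
              f"stdev={statistics.stdev(samples):.2f} us")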