Empirical performance measurements of computer systems almost always exhibit variability and anomalies. Run-to-run and server-to-server variations are common for CPU, memory, disk, and network performance characteristics. In our previous work, we focused on taming performance variability for memory, disk, and network and established an interactive analysis service at https://confirm.fyi/ to help users of the CloudLab testbed better plan and conduct their experiments. In this paper, we describe our analysis of CPU variability based on over 1.3M performance measurements from nearly 1,800 servers and present our initial findings. The focus of this work is on capturing hardware variability, which can make repeatable experiments more difficult and can impact conclusions; it is thus important for systems researchers to understand. (We note that, though we do not study it in this work, multi-tenancy and resource sharing in the cloud can exacerbate the problem.) Variability also inevitably impacts the performance and operation of middleware and high-level applications, contributing to straggler problems in many domains, including HPC, Big Data, and Machine Learning, and on many types of cyberinfrastructure. We analyze data from CloudLab servers allocated in an exclusive fashion, with no virtualization. While our analysis focuses on a testbed that aims to promote reproducible research, we believe our approach and findings can be of value to people who manage, analyze, and utilize shared computing resources in supercomputers, clouds, and datacenters.
In Datacenter Performance, The Only Constant Is Change
All computing infrastructure suffers from performance variability, be it bare-metal or virtualized. This phenomenon originates from many sources: some transient, such as noisy neighbors, and others more permanent but sudden, such as changes or wear in hardware, changes in the underlying hypervisor stack, or even undocumented interactions between the policies of the computing resource provider and the active workloads. Thus, performance measurements obtained on clouds, HPC facilities, and, more generally, datacenter environments are almost guaranteed to exhibit performance regimes that evolve over time, which leads to undesirable nonstationarities in application performance. In this paper, we present our analysis of performance of the bare-metal hardware available on the CloudLab testbed where we focus on quantifying the evolving performance regimes using changepoint detection. We describe our findings, backed by a dataset with nearly 6.9M benchmark results collected from over 1600 machines over a period of 2 years and 9 months. These findings yield a comprehensive characterization of real-world performance variability patterns in one computing facility, a methodology for studying such patterns on other infrastructures, and contribute to a better understanding of performance variability in general.
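As an illustration only of what changepoint detection means for a series of benchmark results, the following minimal Python sketch finds the single split that best separates a series into two constant-mean regimes. The abstract does not state which changepoint algorithm the authors use, so this least-squares split is an assumed stand-in; the function name `best_single_changepoint`, the `min_segment` parameter, and the sample values are hypothetical.

```python
from statistics import fmean


def best_single_changepoint(values, min_segment=2):
    """Find the split index that best divides `values` into two
    constant-mean segments (illustrative least-squares criterion).

    Returns (index, gain): `index` is the first position of the second
    regime, and `gain` is the reduction in squared error relative to a
    single-mean fit. Returns (None, 0.0) if no split reduces the error.
    """
    n = len(values)
    if n < 2 * min_segment:
        raise ValueError("series too short for the requested segment size")

    def sse(segment):
        # Sum of squared deviations from the segment mean.
        mean = fmean(segment)
        return sum((x - mean) ** 2 for x in segment)

    total = sse(values)
    best_index, best_gain = None, 0.0
    for k in range(min_segment, n - min_segment + 1):
        gain = total - (sse(values[:k]) + sse(values[k:]))
        if gain > best_gain:
            best_index, best_gain = k, gain
    return best_index, best_gain


# Made-up benchmark scores with a visible drop after the fifth run.
series = [100.0, 101.0, 99.0, 100.0, 100.5, 90.0, 89.5, 90.5, 91.0, 90.0]
index, gain = best_single_changepoint(series)
print(index, round(gain, 2))
```

A real analysis would need to handle multiple changepoints, noise, and a significance threshold for `gain`; this sketch only shows the core idea of locating a shift between two performance regimes.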
- Award ID(s): 1743363
- PAR ID: 10197400
- Date Published:
- Journal Name: Proceedings of the Twentieth IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
In recent years, networked airborne computing (NAC) has emerged as a promising paradigm because it can leverage the collaborative capabilities of unmanned aerial vehicles (UAVs) for distributed computing tasks. Despite the burgeoning interest in NAC and UAV-based computing, many existing studies depend on over-simplified simulations for performance evaluation. This reliance has led to a gap in our understanding of NAC’s true potential and challenges. To fill this gap, this paper presents a comprehensive approach: the creation of a realistic simulator and a novel hardware testbed. The simulator, developed using ROS and Gazebo, emulates networked UAVs, focusing on resource-sharing and distributed computing capabilities. This tool offers a cost-effective, scalable, and adaptable environment, making it ideal for preliminary investigations across a myriad of real-world scenarios. In parallel, our hardware testbed comprises multiple quadrotors, each equipped with a Pixhawk control unit, a Raspberry Pi computing module, a real-time kinematic (RTK) positioning system, and multiple communication units. Through extensive simulations and hardware tests, we delve into the key determinants of NAC performance, such as computation task size, number of UAVs, communication quality, and UAV mobility. Our findings not only underscore the inherent challenges in optimizing NAC performance but also provide pivotal insights for future enhancements. These insights encompass refining the simulator, reducing computation overheads, and equipping the hardware testbed with cutting-edge communication devices.
The scalability and flexibility of modern cloud applications can be mainly attributed to virtual machines (VMs) and containers, where virtual machines are isolated operating systems that run on a hypervisor while containers are lightweight isolated processes that share the host OS kernel. To achieve the scalability and flexibility required for modern cloud applications, each bare-metal server in the data center often houses multiple virtual machines, each of which runs multiple containers and multiple containerized applications that often share the same set of libraries and code, often referred to as images. However, while container frameworks are optimized for sharing images within a single VM, sharing images across multiple VMs, even if the VMs are within the same bare-metal server, is nearly non-existent due to the nature of VM isolation, leading to repetitive downloads that add redundant network traffic and latency. This work aims to resolve this problem by utilizing SmartNICs, which are specialized network hardware that provide hardware acceleration and offload capabilities for networking tasks, to optimize image retrieval and sharing between containers across multiple VMs on the same server. The method proposed in this work shows promise in cutting down container cold start time by up to 92% and reducing network traffic by 99.9%. Furthermore, the result is even more promising as the performance benefit is directly proportional to the number of VMs in a server that concurrently seek the same image, which guarantees increased efficiency as bare-metal machine specifications improve.
The integration of onboard computing capabilities with unmanned aerial vehicles (UAVs) has gained significant attention in recent years as part of mobile computing paradigms such as mobile edge computing (MEC), fog computing, and mobile cloud computing. To enhance the performance of airborne computing, networked airborne computing (NAC) aims to interconnect UAVs through direct flight-to-flight links, with UAVs sharing resources with each other. However, despite the growing interest in NAC and UAV-based computing, existing studies rely heavily on numerical simulations for performance evaluation and lack realistic simulators and hardware testbeds. To fill this gap, this paper presents the development of two NAC platforms: a realistic simulator based on ROS and Gazebo, and a hardware testbed with multiple UAVs communicating and sharing computing resources. Through simulation and real flight tests with two computation applications, we evaluate the platforms and examine the impact of mobility on NAC performance. Our findings offer valuable insights into NAC and provide guidance for future advancements.
Compared to traditional hardware development methodologies, High-Level Synthesis (HLS) offers a faster time-to-market and lower design cost at the expense of implementation efficiency. Although Software/Hardware Codesign has been used in many areas, its usability for benchmarking of candidates in cryptographic competitions has been largely unexplored. This paper provides a comparison of the HLS- and RTL-based design methodologies when applied to the hardware design of the Number Theoretic Transform (NTT) – a core arithmetic function of lattice-based Post-Quantum Cryptography (PQC). As a next step, we apply a Software/Hardware Codesign approach to the implementation of three PQC schemes based on NTT. Then, we integrate our HLS implementation into the Xilinx SDSoC environment. We demonstrate that the overhead of SDSoC compared to the traditional Bare Metal approach is acceptable. This paper also shows that an HLS implementation obtained by modeling a block diagram is typically much better than an implementation obtained by using design space exploration. We conclude that the HLS/SDSoC and RTL/Bare Metal approaches generate comparable results.