skip to main content


Title: FT-VMP: Fault-Tolerant Virtual Machine Placement in Cloud Data Centers
Virtual machine (VM) replication is an effective technique in cloud data centers to achieve fault-tolerance, load-balance, and quick-responsiveness to user requests. In this paper we study a new fault-tolerant VM placement problem referred to as FT-VMP. Given that different VM has different fault-tolerance requirement (i.e., different VM requires different number of replica copies) and compatibility requirement (i.e., some VMs and their replicas cannot be placed into some physical machines (PMs) due to software or platform incompatibility), FT-VMP studies how to place VM replica copies inside cloud data centers in order to minimize the number of PMs storing VM replicas, under the constraints that i) for fault-tolerant purpose, replica copies of the same VM cannot be placed inside the same PM and ii) each PM has a limited amount of storage capacity. We first prove that FT-VMP is NP-hard. We then design an integer linear programming (ILP)-based algorithm to solve it optimally. As ILP takes time to compute thus is not suitable for large scale cloud data centers, we design a suite of efficient and scalable heuristic fault-tolerant VM placement algorithms. We show that a) ILP-based algorithm outperforms the state-of-the-art VM replica placement in a wide range of network dynamics and b) that all our fault-tolerant VM placement algorithms are able to turn off significant number of PMs to save energy in cloud data centers. In particular, we show that our algorithms can consolidate (i.e., turn off) around 100 PMs in a small data center of 256 PMs and 700 PMs in a large data center of 1028PMs.  more » « less
Award ID(s):
1911191
NSF-PAR ID:
10174515
Author(s) / Creator(s):
;
Date Published:
Journal Name:
International Conference on Computer Communications and Networks (ICCCN 2020)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Fault-tolerant virtual machine (VM) placement refers to the process of placing multiple copies of the same VM cloud application inside cloud data centers. The challenge is how to place required number of VM replicas while minimizing the number of physical machines (PMs) that store them, in order to save energy consumption of cloud data centers. We refer to it as fault-tolerant VM placement problem. In our previous work, we have proposed a greedy algorithm to solve this problem. In this paper, we compare it with an existing research that is based on well-known Welsh Powell Graph-Coloring Algorithm to place items into bins while considering the conflicts between items and items and items and bins. Via extensive simulations, we show that our greedy algorithm can turn off 40-50% more PMs than existing work and can place upto four times as many VM replicas as existing work, achieving much stronger fault-tolerance with less energy consumption. We also compare both algorithms with the optimal integer lin- ear programming (ILP)-based algorithm, which serves as the benchmark of the comparison. 
    more » « less
  2. null (Ed.)
    Replicated state machines are linearizable, fault-tolerant groups of replicas that are coordinated using a consensus algorithm. Copilot replication is the first 1-slowdown-tolerant consensus protocol: it delivers normal latency despite the slowdown of any 1 replica. Copilot uses two distinguished replicas—the pilot and copilot—to proactively add redundancy to all stages of processing a client’s command. Copilot uses dependencies and deduplication to resolve potentially differing orderings proposed by the pilots. To avoid dependencies leading to either pilot being able to slow down the group, Copilot uses fast takeovers that allow a fast pilot to complete the ongoing work of a slow pilot. Copilot includes two optimizations—ping-pong batching and null dependency elimination—that improve its performance when there are 0 and 1 slow pilots respectively. Our evaluation of Copilot shows its performance is lower but competitive with Multi-Paxos and EPaxos when no replicas are slow. When a replica is slow, Copilot is the only protocol that avoids high latencies. 
    more » « less
  3. null (Ed.)
    Replicated state machines are linearizable, fault-tolerant groups of replicas that are coordinated using a consensus algorithm. Copilot replication is the first 1-slowdown-tolerant consensus protocol: it delivers normal latency despite the slowdown of any 1 replica. Copilot uses two distinguished replicas—the pilot and copilot—to proactively add redundancy to all stages of processing a client’s command. Copilot uses dependencies and deduplication to resolve potentially differing orderings proposed by the pilots. To avoid dependencies leading to either pilot being able to slow down the group, Copilot uses fast takeovers that allow a fast pilot to complete the ongoing work of a slow pilot. Copilot includes two optimizations—ping-pong batching and null dependency elimination—that improve its performance when there are 0 and 1 slow pilots respectively. Our evaluation of Copilot shows its performance is lower but competitive with Multi-Paxos and EPaxos when no replicas are slow. When a replica is slow, Copilot is the only protocol that avoids high latencies. 
    more » « less
  4. null (Ed.)
    Abstract Lipid structures affect membrane biophysical properties such as thickness, stability, permeability, curvature, fluidity, asymmetry, and interdigitation, contributing to membrane function. Sphingolipids are abundant in plant endomembranes and plasma membranes (PMs) and comprise four classes: ceramides, hydroxyceramides, glucosylceramides, and glycosylinositolphosphoceramides (GIPCs). They constitute an array of chemical structures whose distribution in plant membranes is unknown. With the aim of describing the hydrophobic portion of sphingolipids, 18 preparations from microsomal (MIC), vacuolar (VM), PM, and detergent-resistant membranes (DRM) were isolated from Arabidopsis (Arabidopsis thaliana) leaves. Sphingolipid species, encompassing pairing of long-chain bases and fatty acids, were identified and quantified in these membranes. Sphingolipid concentrations were compared using univariate and multivariate analysis to assess sphingolipid diversity, abundance, and predominance across membranes. The four sphingolipid classes were present at different levels in each membrane: VM was enriched in glucosylceramides, hydroxyceramides, and GIPCs; PM in GIPCs, in agreement with their key role in signal recognition and sensing; and DRM in GIPCs, as reported by their function in nanodomain formation. While a total of 84 sphingolipid species was identified in MIC, VM, PM, and DRM, only 34 were selectively distributed in the four membrane types. Conversely, every membrane contained a different number of predominant species (11 in VM, 6 in PM, and 17 in DRM). This study reveals that MIC, VM, PM, and DRM contain the same set of sphingolipid species but every membrane source contains its own specific assortment based on the proportion of sphingolipid classes and on the predominance of individual species. 
    more » « less
  5. We introduce Canal, a programmable, topic-based, publish/subscribe system that is designed for multi-tier cloud deployments (e.g. edge-cloud, multi-cloud, IoT-cloud, etc.). Canal implements a triggered computational (i.e. “serverless”) programming model and provides developers with a uniform and portable programming interface. To achieve scalability and reliability, Canal combines the use of a distributed hash table (DHT) and replica consensus protocol to distribute and replicate functions, state, and data. Canal also decouples replica placement from the DHT topology to allow developers to optimize function placement for different objectives. We evaluate Canal using a real-world multi-tier IoT deployment and we use Canal to compare placement strategies, end-to-end performance, and failure recovery using both benchmarks and a real-world IoT-edge application. Our results show that Canal is able to achieve both low latency and reliability in this setting. 
    more » « less