skip to main content


Title: Flexible VM Provisioning for Time-Sensitive Applications with Multiple Execution Options
Several recent studies have investigated the virtual machine (VM) provisioning problem for requests with time constraints (deadlines) in cloud systems. These studies typically assumed that a request is associated with a single execution time when running on VMs with a given resource demand. In this paper, we consider modern applications that are normally implemented with generic frameworks that allow them to execute with various numbers of threads on VMs with different resource demands. For such applications, it is possible for the users to specify multiple execution options (MEOs) for a request where each execution option is represented by a certain number of VMs with some resources to run the application and its corresponding execution time. We investigate the problem of virtual machine provisioning for such time-sensitive requests with MEOs in resource-constrained clouds. By incorporating the MEOs of requests, we propose several novel and flexible VM provisioning schemes that carefully balance resource usage efficiency, input workloads and request deadlines with the objective of achieving higher resource utilization and system benefits. We evaluated the proposed MEO-aware schemes on various workloads with both benchmark requests and synthetic requests. The results show that our MEO-aware algorithms outperform the state-of-the-art schemes that consider only a single execution option of requests by serving up to 38% more requests and achieving up to 27% more benefits.  more » « less
Award ID(s):
1618310
NSF-PAR ID:
10064359
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE International Conference on Cloud Computing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Several recent studies have investigated the virtual machine (VM) provisioning problem for requests with time constraints (deadlines) in cloud systems. These studies typically assumed that a request is associated with a single execution time when running on VMs with a given resource demand. In this paper, we consider modern applications that are normally implemented with generic frameworks that allow them to execute with various numbers of threads on VMs with different resource demands. For such applications, it is possible for the users to specify multiple execution options (MEOs) for a request where each execution option is represented by a certain number of VMs with some resources to run the application and its corresponding execution time. We investigate the problem of virtual machine provisioning for such time-sensitive requests with MEOs in resource-constrained clouds. By incorporating the MEOs of requests, we propose several novel and flexible VM provisioning schemes that carefully balance resource usage efficiency, input workloads and request deadlines with the objective of achieving higher resource utilization and system benefits. We evaluated the proposed MEO-aware schemes on various workloads with both benchmark requests and synthetic requests. The results show that our MEO-aware algorithms outperform the state-of-the-art schemes that consider only a single execution option of requests by serving up to 38% more requests and achieving up to 27% more benefits. 
    more » « less
  2. Traditionally, HPC workloads have been deployed in bare-metal clusters; but the advances in virtualization have led the pathway for these workloads to be deployed in virtualized clusters. However, HPC cluster administrators/providers still face challenges in terms of resource elasticity and virtual machine (VM) provisioning at large-scale, due to the lack of coordination between a traditional HPC scheduler and the VM hypervisor (resource management layer). This lack of interaction leads to low cluster utilization and job completion throughput. Furthermore, the VM provisioning delays directly impact the overall performance of jobs in the cluster. Hence, there is a need for effectively provisioning virtualized HPC clusters, which can best-utilize the physical hardware with minimal provisioning overheads.Towards this, we propose Multiverse, a VM provisioning framework, which can dynamically spawn VMs for incoming jobs in a virtualized HPC cluster, by integrating the HPC scheduler along with VM resource manager. We have implemented this framework on the Slurm scheduler along with the vSphere VM resource manager. In order to reduce the VM provisioning overheads, we use instant cloning which shares both the disk and memory with the parent VM, when compared to full VM cloning which has to boot-up a new VM from scratch. Measurements with real-world HPC workloads demonstrate that, instant cloning is 2.5× faster than full cloning in terms of VM provisioning time. Further, it improves resource utilization by up to 40%, and cluster throughput by up to 1.5×, when compared to full clone for bursty job arrival scenarios. 
    more » « less
  3. Predictive VM (Virtual Machine) auto-scaling is a promising technique to optimize cloud applications’ operating costs and performance. Understanding the job arrival rate is crucial for accurately predicting future changes in cloud workloads and proactively provisioning and de-provisioning VMs for hosting the applications. However, developing a model that accurately predicts cloud workload changes is extremely challenging due to the dynamic nature of cloud workloads. Long- Short-Term-Memory (LSTM) models have been developed for cloud workload prediction. Unfortunately, the state-of-the-art LSTM model leverages recurrences to predict, which naturally adds complexity and increases the inference overhead as input sequences grow longer. To develop a cloud workload prediction model with high accuracy and low inference overhead, this work presents a novel time-series forecasting model called WGAN-gp Transformer, inspired by the Transformer network and improved Wasserstein-GANs. The proposed method adopts a Transformer network as a generator and a multi-layer perceptron as a critic. The extensive evaluations with real-world workload traces show WGAN- gp Transformer achieves 5× faster inference time with up to 5.1% higher prediction accuracy against the state-of-the-art. We also apply WGAN-gp Transformer to auto-scaling mechanisms on Google cloud platforms, and the WGAN-gp Transformer-based auto-scaling mechanism outperforms the LSTM-based mechanism by significantly reducing VM over-provisioning and under-provisioning rates. 
    more » « less
  4. The increased use of micro-services to build web applications has spurred the rapid growth of Function-as-a-Service (FaaS) or serverless computing platforms. While FaaS simplifies provisioning and scaling for application developers, it introduces new challenges in resource management that need to be handled by the cloud provider. Our analysis of popular serverless workloads indicates that schedulers need to handle functions that are very short-lived, have unpredictable arrival patterns, and require expensive setup of sandboxes. The challenge of running a large number of such functions in a multi-tenant cluster makes existing scheduling frameworks unsuitable. We present Archipelago, a platform that enables low latency request execution in a multi-tenant serverless setting. Archipelago views each application as a DAG of functions, and every DAG in associated with a latency deadline. Archipelago achieves its per-DAG request latency goals by: (1) partitioning a given cluster into a number of smaller worker pools, and associating each pool with a semi-global scheduler (SGS), (2) using a latency-aware scheduler within each SGS along with proactive sandbox allocation to reduce overheads, and (3) using a load balancing layer to route requests for different DAGs to the appropriate SGS, and automatically scale the number of SGSs per DAG. Our testbed results show that Archipelago meets the latency deadline for more than 99% of realistic application request workloads, and reduces tail latencies by up to 36X compared to state-of-the-art serverless platforms. 
    more » « less
  5. We consider a distributed server system consisting of a large number of servers, each with limited capacity on multiple resources (CPU, memory, etc.). Jobs with different rewards arrive over time and require certain amounts of resources for the duration of their service. When a job arrives, the system must decide whether to admit it or reject it, and if admitted, in which server to schedule it. The objective is to maximize the expected total reward received by the system. This problem is motivated by control of cloud computing clusters, in which jobs are requests for virtual machines (VMs) or containers that reserve resources for various services, and rewards represent service priority of requests or price paid per time unit of service. We study this problem in an asymptotic regime where the number of servers and jobs’ arrival rates scale by a factor L, as L becomes large. We propose a resource reservation policy that asymptotically achieves at least 1/2, and under certain monotone property on jobs’ rewards and resources, at least [Formula: see text] of the optimal expected reward. The policy automatically scales the number of VM slots for each job type as the demand changes and decides in which servers the slots should be created in advance, without the knowledge of traffic rates. 
    more » « less