skip to main content


Title: Hedge Your Bets: Optimizing Long-term Cloud Costs by Mixing VM Purchasing Options
Cloud platforms offer the same VMs under many purchasing options that specify different costs and time commitments, such as on-demand, reserved, sustained-use, scheduled reserve, transient, and spot block. In general, the stronger the commitment, i.e., longer and less flexible, the lower the price. However, longer and less flexible time commitments can increase cloud costs for users if future workloads cannot utilize the VMs they committed to buying. Large cloud customers often find it challenging to choose the right mix of purchasing options to reduce their long-term costs, while retaining the ability to adjust capacity up and down in response to workload variations.To address the problem, we design policies to optimize long-term cloud costs by selecting a mix of VM purchasing options based on short- and long-term expectations of workload utilization. We consider a batch trace spanning 4 years from a large shared cluster for a major state University system that includes 14k cores and 60 million job submissions, and evaluate how these jobs could be judiciously executed using cloud servers using our approach. Our results show that our policies incur a cost within 41% of an optimistic optimal offline approach, and 50% less than solely using on-demand VMs.  more » « less
Award ID(s):
1908536 1802523
NSF-PAR ID:
10192800
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2020 IEEE International Conference on Cloud Engineering (IC2E)
Page Range / eLocation ID:
105 to 115
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Cloud users can significantly reduce their cost (by up to 60%) by reserving virtual machines (VMs) for long periods (1 or 3 years) rather than acquiring them on demand. Unfortunately, reserving VMs exposes users to demand risk that can increase cost if their expected future demand does not materialize. Since accurately forecasting demand over long periods is challenging, users often limit their use of reserved VMs. To mitigate demand risk, Amazon operates a Reserved Instance Marketplace (RIM) where users may publicly list the remaining time on their VM reservations for sale at a price they set. The RIM enables users to limit demand risk by either selling VM reservations if their demand changes, or purchasing variable- and shorter-term VM reservations that better match their demand forecast horizon. Clearly, the RIM’s potential to mitigate demand risk is a function of its price characteristics. However, to the best of our knowledge, historical RIM prices have neither been made publicly available nor analyzed. To address the problem, we have been monitoring and archiving RIM prices for 1.75 years across all 69 availability zones and 22 regions in Amazon’s Elastic Compute Cloud (EC2). This paper provides a first look at this data and its implications for cost-effectively provisioning cloud infrastructure. 
    more » « less
  2. Cloud users can significantly reduce their cost (by up to 60%) by reserving virtual machines (VMs) for long periods (1 or 3 years) rather than acquiring them on demand. Unfortunately, reserving VMs exposes users to demand risk that can increase cost if their expected future demand does not materialize. Since accurately forecasting demand over long periods is challenging, users often limit their use of reserved VMs. To mitigate demand risk, Amazon operates a Reserved Instance Marketplace (RIM) where users may publicly list the remaining time on their VM reservations for sale at a price they set. The RIM enables users to limit demand risk by either selling VM reservations if their demand changes, or purchasing variable- and shorter-term VM reservations that better match their demand forecast horizon. Clearly, the RIM's potential to mitigate demand risk is a function of its price characteristics. However, to the best of our knowledge, historical RIM prices have neither been made publicly available nor analyzed. To address the problem, we have been monitoring and archiving RIM prices for 1.75 years across all 69 availability zones and 22 regions in Amazon's Elastic Compute Cloud (EC2). This paper provides a first look at this data and its implications for cost-effectively provisioning cloud infrastructure. 
    more » « less
  3. Predictive VM (Virtual Machine) auto-scaling is a promising technique to optimize cloud applications’ operating costs and performance. Understanding the job arrival rate is crucial for accurately predicting future changes in cloud workloads and proactively provisioning and de-provisioning VMs for hosting the applications. However, developing a model that accurately predicts cloud workload changes is extremely challenging due to the dynamic nature of cloud workloads. Long- Short-Term-Memory (LSTM) models have been developed for cloud workload prediction. Unfortunately, the state-of-the-art LSTM model leverages recurrences to predict, which naturally adds complexity and increases the inference overhead as input sequences grow longer. To develop a cloud workload prediction model with high accuracy and low inference overhead, this work presents a novel time-series forecasting model called WGAN-gp Transformer, inspired by the Transformer network and improved Wasserstein-GANs. The proposed method adopts a Transformer network as a generator and a multi-layer perceptron as a critic. The extensive evaluations with real-world workload traces show WGAN- gp Transformer achieves 5× faster inference time with up to 5.1% higher prediction accuracy against the state-of-the-art. We also apply WGAN-gp Transformer to auto-scaling mechanisms on Google cloud platforms, and the WGAN-gp Transformer-based auto-scaling mechanism outperforms the LSTM-based mechanism by significantly reducing VM over-provisioning and under-provisioning rates. 
    more » « less
  4. Amazon introduced spot instances in December 2009, enabling “customers to bid on unused Amazon EC2 capacity and run those instances for as long as their bid exceeds the current Spot Price.” Amazon’s real-time computational spot market was novel in multiple respects. For example, it was the first (and to date only) large-scale public implementation of market-based resource allocation based on dynamic pricing after decades of research, and it provided users with useful information, control knobs, and options for optimizing the cost of running cloud applications. Spot instances also introduced the concept of transient cloud servers derived from variable idle capacity that cloud platforms could revoke at any time. Transient servers have since become central to efficient resource management of modern clusters and clouds. As a result, Amazon’s spot market was the motivation for substantial research over the past decade. Yet, in November 2017, Amazon effectively ended its real-time spot market by announcing that users no longer needed to place bids and that spot prices will “...adjust more gradually, based on longer-term trends in supply and demand.” The changes made spot instances more similar to the fixed-price transient servers offered by other cloud platforms. Unfortunately, while these changes made spot instances less complex, they eliminated many benefits to sophisticated users in optimizing their applications. This paper provides a retrospective on Amazon’s real-time spot market, including its advantages and disadvantages for allocating transient servers compared to current fixed-price approaches. We also discuss some fundamental problems with Amazon’s spot market, which we identified in prior work (from 2016), that predicted its eventual end. We then discuss potential options for allocating transient servers that combine the advantages of Amazon’s real-time spot market, while also addressing the problems that likely led to its elimination. 
    more » « less
  5. Several recent studies have investigated the virtual machine (VM) provisioning problem for requests with time constraints (deadlines) in cloud systems. These studies typically assumed that a request is associated with a single execution time when running on VMs with a given resource demand. In this paper, we consider modern applications that are normally implemented with generic frameworks that allow them to execute with various numbers of threads on VMs with different resource demands. For such applications, it is possible for the users to specify multiple execution options (MEOs) for a request where each execution option is represented by a certain number of VMs with some resources to run the application and its corresponding execution time. We investigate the problem of virtual machine provisioning for such time-sensitive requests with MEOs in resource-constrained clouds. By incorporating the MEOs of requests, we propose several novel and flexible VM provisioning schemes that carefully balance resource usage efficiency, input workloads and request deadlines with the objective of achieving higher resource utilization and system benefits. We evaluated the proposed MEO-aware schemes on various workloads with both benchmark requests and synthetic requests. The results show that our MEO-aware algorithms outperform the state-of-the-art schemes that consider only a single execution option of requests by serving up to 38% more requests and achieving up to 27% more benefits. 
    more » « less