skip to main content

Title: CEDULE+: Resource Management for Burstable Cloud Instances Using Predictive Analytics
Nearly all principal cloud providers now provide burstable instances in their offerings. The main attraction of this type of instance is that it can boost its performance for a limited time to cope with workload variations. Although burstable instances are widely adopted, it is not clear how to efficiently manage them to avoid waste of resources. In this paper, we use predictive data analytics to optimize the management of burstable instances. We design CEDULE+, a data-driven framework that enables efficient resource management for burstable cloud instances by analyzing the system workload and latency data. CEDULE+ selects the most profitable instance type to process incoming requests and controls CPU, I/O, and network usage to minimize the resource waste without violating Service Level Objectives (SLOs). CEDULE+ uses lightweight profiling and quantile regression to build a data-driven prediction model that estimates system performance for all combinations of instance type, resource type, and system workload. CEDULE+ is evaluated on Amazon EC2, and its efficiency and high accuracy are assessed through real-case scenarios. CEDULE+ predicts application latency with errors less than 10%, extends the maximum performance period of a burstable instance up to 2.4 times, and decreases deployment costs by more than 50%.
Authors:
; ; ;
Award ID(s):
1838022 1838024 1756013
Publication Date:
NSF-PAR ID:
10206150
Journal Name:
IEEE Transactions on Network and Service Management
Page Range or eLocation-ID:
1 to 1
ISSN:
2373-7379
Sponsoring Org:
National Science Foundation
More Like this
  1. During the past few years, all leading cloud providers introduced burstable instances that can sprint their performance for a limited period to address sudden workload variations. Despite the availability of burstable instances, there is no clear understanding of how to minimize the waste of resources by regulating their burst capacity to the workload requirements. This is especially true when it comes to non-CPU-intensive applications. In this paper, we investigate how to limit network and I/O usage to optimize the efficiency of the bursting process. We also study which resource shall be controlled to benefit both cloud providers and end-users. We design MRburst (Multi-Resource burstable performance scheduler) to automatically limit multiple resources (i.e., network, I/O, and CPU) and make the application comply with a user-defined service level objective (SLO) while minimizing wasted resources. MRburst is evaluated on Amazon EC2 using two multi-resource applications: an FTP server and a Ceph system. Experimental results show that MRburst outperforms state-of-the-art approaches by allowing instances to speed up their performance for up to 2.4 times longer period while meeting SLO.
  2. Cloud deployments now increasingly exploit Field-Programmable Gate Array (FPGA) accelerators as part of virtual instances. While cloud FPGAs are still essentially single-tenant, the growing demand for efficient hardware acceleration paves the way to FPGA multi-tenancy. It then becomes necessary to explore architectures, design flows, and resource management features that aim at exposing multi-tenant FPGAs to the cloud users. In this article, we discuss a hardware/software architecture that supports provisioning space-shared FPGAs in Kernel-based Virtual Machine (KVM) clouds. The proposed hardware/software architecture introduces an FPGA organization that improves hardware consolidation and support hardware elasticity with minimal data movement overhead. It also relies on VirtIO to decrease communication latency between hardware and software domains. Prototyping the proposed architecture with a Virtex UltraScale+ FPGA demonstrated near specification maximum frequency for on-chip data movement and high throughput in virtual instance access to hardware accelerators. We demonstrate similar performance compared to single-tenant deployment while increasing FPGA utilization, which is one of the goals of virtualization. Overall, our FPGA design achieved about 2× higher maximum frequency than the state of the art and a bandwidth reaching up to 28 Gbps on 32-bit data width.
  3. In this paper, we present design, implementation and evaluation of a control framework, EXTRA (EXperience-driven conTRol frAmework), for scheduling in general-purpose Distributed Stream Data Processing Systems (DSDPSs). Our design is novel due to the following reasons. First, EXTRA enables a DSDPS to dynamically change the number of threads on the fly according to system states and demands. Most existing methods, however, use a fixed number of threads to carry workload (for each processing unit of an application), which is specified by a user in advance and does not change during runtime. So our design introduces a whole new dimension for control in DSDPSs, which has a great potential to significantly improve system flexibility and efficiency, but makes the scheduling problem much harder. Second, EXTRA leverages an experience/data driven model-free approach for dynamic control using the emerging Deep Reinforcement Learning (DRL), which enables a DSDPS to learn the best way to control itself from its own experience just as a human learns a skill (such as driving and swimming) without any accurate and mathematically solvable model. We implemented it based on a widely-used DSDPS, Apache Storm, and evaluated its performance with three representative Stream Data Processing (SDP) applications: continuous queries, wordmore »count (stream version) and log stream processing. Particularly, we performed experiments under realistic settings (where multiple application instances are mixed up together), rather than a simplified setting (where experiments are conducted only on a single application instance) used in most related works. Extensive experimental results show: 1) Compared to Storm’s default scheduler and the state-of-the-art model-based method, EXTRA substantially reduces average end-to-end tuple processing time by 39.6% and 21.6% respectively on average. 2) EXTRA does lead to more flexible and efficient stream data processing by enabling the use of a variable number of threads. 3) EXTRA is robust in a highly dynamic environment with significant workload change.« less
  4. With the rapid development of the Internet of Things (IoT), computational workloads are gradually moving toward the internet edge for low latency. Due to significant workload fluctuations, edge data centers built in distributed locations suffer from resource underutilization and requires capacity underprovisioning to avoid wasting capital investment. The workload fluctuations, however, also make edge data centers more suitable for battery-assisted power management to counter the performance impact due to underprovisioning. In particular, the workload fluctuations allow the battery to be frequently recharged and made available for temporary capacity boosts. But, using batteries can overload the data center cooling system which is designed with a matching capacity of the power system. In this paper, we design a novel power management solution, DeepPM, that exploits the UPS battery and cold air inside the edge data center as energy storage to boost the performance. DeepPM uses deep reinforcement learning (DRL) to learn the data center thermal behavior online in a model-free manner and uses it on-the-fly to determine power allocation for optimum latency performance without overheating the data center. Our evaluation shows that DeepPM can improve latency performance by more than 50% compared to a power capping baseline while the server inlet temperaturemore »remains within safe operating limits (e.g., 32°C).« less
  5. Serverless computing platforms simplify development, deployment, and automated management of modular software functions. However, existing serverless platforms typically assume an over-provisioned cloud, making them a poor fit for Edge Computing environments where resources are scarce. In this paper we propose a redesigned serverless platform that comprehensively tackles the key challenges for serverless functions in a resource constrained Edge Cloud. Our Mu platform cleanly integrates the core resource management components of a serverless platform: autoscaling, load balancing, and placement. Each worker node in Mu transparently propagates metrics such as service rate and queue length in response headers, feeding this information to the load balancing system so that it can better route requests, and to our autoscaler to anticipate workload fluctuations and proactively meet SLOs. Data from the Autoscaler is then used by the placement engine to account for heterogeneity and fairness across competing functions, ensuring overall resource efficiency, and minimizing resource fragmentation. We implement our design as a set of extensions to the Knative serverless platform and demonstrate its improvements in terms of resource efficiency, fairness, and response time. Evaluating Mu, shows that it improves fairness by more than 2x over the default Kubernetes placement engine, improves 99th percentile response timesmore »by 62% through better load balancing, reduces SLO violations and resource consumption by pro-active and precise autoscaling. Mu reduces the average number of pods required by more than ~15% for a set of real Azure workloads.« less