The proliferation of innovative mobile services such
as augmented reality, networked gaming, and autonomous driving
has spurred a growing need for low-latency access to computing
resources that cannot be met solely by existing centralized
cloud systems. Mobile Edge Computing (MEC) is expected to
be an effective solution to meet the demand for low-latency
services by enabling the execution of computing tasks at the
network-periphery, in proximity to end-users. While a number
of recent studies have addressed the problem of determining
the execution of service tasks and the routing of user requests
to corresponding edge servers, the focus has primarily been
on the efficient utilization of computing resources, neglecting
the fact that non-trivial amounts of data need to be stored to
enable service execution, and that many emerging services exhibit
asymmetric bandwidth requirements. To fill this gap, we study
the joint optimization of service placement and request routing
in MEC-enabled multi-cell networks with multidimensional
(storage-computation-communication) constraints. We show that
this problem generalizes several problems in literature and
propose an algorithm that achieves close-to-optimal performance
using randomized rounding. Evaluation results demonstrate that
our approach can effectively utilize the available resources to
maximize the number of requests served by low-latency edge
cloud servers.
more »
« less
Dynamic Resource Allocation in the Cloud with Near-Optimal Efficiency
Cloud computing has motivated renewed interest in resource allocation problems with new consumption models. A common goal is to share a resource, such as CPU or I/O bandwidth, among distinct users with different demand patterns as well as different quality of service requirements. To ensure these service requirements, cloud offerings often come with a service level agreement (SLA) between the provider and the users. A SLA specifies the amount of a resource a user is entitled to utilize. In many cloud settings, providers would like to operate resources at high utilization while simultaneously respecting individual SLAs. There is typically a trade-off between these two objectives; for example, utilization can be increased by shifting away resources from idle users to “scavenger” workload, but with the risk of the former then becoming active again. We study this fundamental tradeoff by formulating a resource allocation model that captures basic properties of cloud computing systems, including SLAs, highly limited feedback about the state of the system, and variable and unpredictable input sequences. Our main result is a simple and practical algorithm that achieves near-optimal performance on the above two objectives. First, we guarantee nearly optimal utilization of the resource even if compared with the omniscient offline dynamic optimum. Second, we simultaneously satisfy all individual SLAs up to a small error. The main algorithmic tool is a multiplicative weight update algorithm and a primal-dual argument to obtain its guarantees. We also provide numerical validation on real data to demonstrate the performance of our algorithm in practical applications.
more »
« less
- Award ID(s):
- 1717947
- PAR ID:
- 10303945
- Date Published:
- Journal Name:
- Operations Research
- ISSN:
- 0030-364X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Network Function Virtualization (NFV) platforms consume significant energy, introducing high operational costs in edge and data centers. This paper presents a novel framework called GreenNFV that optimizes resource usage for network function chains using deep reinforcement learning. GreenNFV optimizes resource parameters such as CPU sharing ratio, CPU frequency scaling, last-level cache (LLC) allocation, DMA buffer size, and packet batch size. GreenNFV learns the resource scheduling model from the benchmark experiments and takes Service Level Agreements (SLAs) into account to optimize resource usage models based on the different throughput and energy consumption requirements. Our evaluation shows that GreenNFV models achieve high transfer throughput and low energy consumption while satisfying various SLA constraints. Specifically, GreenNFV with Throughput SLA can achieve 4.4× higher throughput and 1.5× better energy efficiency over the baseline settings, whereas GreenNFV with Energy SLA can achieve 3× higher throughput while reducing energy consumption by 50%.more » « less
-
Network slicing is a promising technology that allows mobile network operators to efficiently serve various emerging use cases in 5G. It is challenging to optimize the utilization of network infrastructures while guaranteeing the performance of network slices according to service level agreements (SLAs). To solve this problem, we propose SafeSlicing that introduces a new constraint-aware deep reinforcement learning (CaDRL) algorithm to learn the optimal resource orchestration policy within two steps, i.e., offline training in a simulated environment and online learning with the real network system. On optimizing the resource orchestration, we incorporate the constraints on the statistical performance of slices in the reward function using Lagrangian multipliers and solve the Lagrangian relaxed problem via a policy network. To satisfy the constraints on the system capacity, we design a constraint network to map the latent actions generated from the policy network to the orchestration actions such that the total resources allocated to network slices do not exceed the system capacity. We prototype SafeSlicing on an end-to-end testbed developed by using OpenAirInterface LTE, OpenDayLight-based SDN, and CUDA GPU computing platform. The experimental results show that SafeSlicing reduces more than 20% resource usage while meeting SLAs of network slices as compared with other solutions.more » « less
-
Traditional systems for allocating finite cluster resources among competing jobs have either aimed at providing fairness, relied on users to specify their resource requirements, or have estimated these requirements via surrogate metrics (e.g. CPU utilization). These approaches do not account for a job’s real world performance (e.g. P95 latency). Existing performance-aware systems use offline profiled data and/or are designed for specific allocation objectives. In this work, we argue that resource allocation systems should directly account for real-world performance and the varied allocation objectives of users. In this pursuit, we build Cilantro. At the core of Cilantro is an online learning mechanism which forms feedback loops with the jobs to estimate the resource to performance mappings and load shifts. This relieves users from the onerous task of job profiling and collects reliable real-time feedback. This is then used to achieve a variety of user-specified scheduling objectives. Cilantro handles the uncertainty in the learned models by adapting the underlying policy to work with confidence bounds. We demonstrate this in two settings. First, in a multi-tenant 1000 CPU cluster with 20 independent jobs, three of Cilantro’s policies outperform 9 other baselines on three different performance-aware scheduling objectives, improving user utilities by up to 1.2 − 3.7x. Second, in a microservices setting, where 160 CPUs must be distributed between 19 inter-dependent microservices, Cilantro outperforms 3 other baselines, reducing the end-to-end P99 latency to x0.57 the next best baseline.more » « less
-
Cloud computing has become an emerging trend for the software industry with the requirement of large infrastructure and resources. The future success of cloud computing depends on the effectiveness of instantiation of the infrastructure and utilization of available resources. Load Balancing ensures the fulfillment of these conditions to improve the cloud environment for the users. Load Balancing dynamically distributes the workload among the nodes in such a way that no single resource is either overwhelmed with tasks or underutilized. In this paper we propose a threshold based load balancing algorithm to ensure the equal distribution of the workload among the nodes. The main objective of the algorithms is to stop the VMs in the cloud being overloaded with tasks or being idle for lack allocation of tasks, when there are active tasks. We have simulated our proposed algorithm in the Cloudanalyst simulator with real world data scenarios. Simulation results shows that our proposed threshold based algorithm can provide a better response time for the task/requests and data processing time for the datacenters compared to the existing algorithms such as First Come First Serve (FCFS), Round Robin(RR) and Equally Spread Current Execution Load Balancing algorithm(ESCELB).more » « less