Resource flexing is the notion of allocating resources on-demand as workload changes.
This is a key advantage of Virtualized Network Functions (VNFs) over their
non-virtualized counterparts. However, it is difficult to balance timeliness and
resource efficiency when making resource flexing decisions, due to unpredictable workloads and complex VNF processing logic.
In this work, we propose an Elastic resource flexing system for Network functions
VIrtualization (ENVI) that leverages a combination of VNF-level features and
infrastructure-level features to construct a neural-network-based scaling decision
engine for generating timely scaling decisions. To adapt to dynamic workloads,
we design a window-based rewinding mechanism to update the neural network with
emerging workload patterns and make accurate decisions in real time. Our experimental
results for real VNFs (the IDS Suricata and the caching proxy Squid), using workloads generated
from real-world traces, show that ENVI provisions significantly fewer resources (up to 26%
fewer) than commonly used rule-based scaling policies, without violating service level
objectives.
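To make the idea concrete, the following is a minimal sketch (not ENVI's implementation) of a window-based scaling decision engine: a small neural network is periodically retrained, i.e. "rewound", over a sliding window of recent samples that combine VNF-level and infrastructure-level features. The window size, feature layout, and label encoding are illustrative assumptions.

```python
# Minimal sketch of a window-based, neural-network scaling decision engine.
# Assumptions: features arrive as 1-D numeric arrays; labels encode
# 0 = no-op, 1 = scale out, 2 = scale in; WINDOW is an arbitrary choice.
from collections import deque

import numpy as np
from sklearn.neural_network import MLPClassifier

WINDOW = 500  # number of recent labeled samples kept for rewinding (assumed)

class ScalingDecisionEngine:
    def __init__(self):
        self.window = deque(maxlen=WINDOW)   # sliding window of (features, label)
        self.model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500)
        self.trained = False

    def observe(self, vnf_feats, infra_feats, label):
        """Record a labeled sample combining VNF- and infrastructure-level features."""
        self.window.append((np.concatenate([vnf_feats, infra_feats]), label))

    def rewind(self):
        """Retrain on the current window so the model tracks emerging workload patterns."""
        if len(self.window) < 50:
            return  # wait until enough history is available
        # (assumes the window contains more than one decision class)
        X = np.vstack([f for f, _ in self.window])
        y = np.array([lab for _, lab in self.window])
        self.model.fit(X, y)
        self.trained = True

    def decide(self, vnf_feats, infra_feats):
        """Return a scaling decision (0 = no-op, 1 = scale out, 2 = scale in)."""
        if not self.trained:
            return 0  # fall back to no-op until enough history is seen
        x = np.concatenate([vnf_feats, infra_feats]).reshape(1, -1)
        return int(self.model.predict(x)[0])
```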
RLDRM: Closed Loop Dynamic Cache Allocation with Deep Reinforcement Learning for Network Function Virtualization
Network function virtualization (NFV) technology attracts tremendous interest from the telecommunication industry and data center operators, as it allows service providers to assign resources for Virtual Network Functions (VNFs) on demand, achieving better flexibility, programmability, and scalability. To improve server utilization, one popular practice is to deploy best-effort (BE) workloads along with high-priority (HP) VNFs when the HP VNFs' resource usage is detected to be low. The key challenge of this deployment scheme is to dynamically balance the service level objective (SLO) and the total cost of ownership (TCO) to optimize data center efficiency under inherently fluctuating workloads. With the recent advancement in deep reinforcement learning, we conjecture that it has the potential to solve this challenge by adaptively adjusting resource allocation to reach improved performance and higher server utilization. In this paper, we present RLDRM, a closed-loop automation system that dynamically adjusts Last Level Cache allocation between HP VNFs and BE workloads using deep reinforcement learning. The results demonstrate improved server utilization while maintaining the required SLO for the HP VNFs.
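As a rough illustration of the closed loop described above (not RLDRM's implementation, which uses deep reinforcement learning), the sketch below uses tabular Q-learning to re-partition last-level-cache ways between an HP VNF and BE workloads. The telemetry and cache-control hooks (read_hp_latency, read_be_throughput, set_llc_ways), the reward shaping, and the constants are assumed placeholders for real monitors and hardware cache-allocation controls.

```python
# Closed-loop sketch: an agent shifts LLC ways between the HP partition and
# BE workloads, rewarded for BE throughput only while the HP VNF meets its SLO.
import random
from collections import defaultdict

TOTAL_WAYS = 11            # LLC ways available for partitioning (assumed)
SLO_LATENCY_US = 200.0     # HP VNF latency SLO (assumed)

def reward(hp_latency_us, be_throughput):
    # Reward BE throughput only while the HP VNF stays within its SLO.
    return be_throughput if hp_latency_us <= SLO_LATENCY_US else -100.0

class CacheAgent:
    ACTIONS = (-1, 0, +1)  # take a way from / keep / give a way to the HP partition

    def __init__(self, eps=0.1, alpha=0.3, gamma=0.9):
        self.q = defaultdict(float)
        self.eps, self.alpha, self.gamma = eps, alpha, gamma

    def act(self, state):
        if random.random() < self.eps:
            return random.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, s, a, r, s2):
        best_next = max(self.q[(s2, a2)] for a2 in self.ACTIONS)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

def control_loop(agent, read_hp_latency, read_be_throughput, set_llc_ways, steps=1000):
    hp_ways = TOTAL_WAYS // 2
    state = hp_ways
    for _ in range(steps):
        action = agent.act(state)
        hp_ways = min(max(hp_ways + action, 1), TOTAL_WAYS - 1)
        set_llc_ways(hp_ways, TOTAL_WAYS - hp_ways)   # apply the new partition
        r = reward(read_hp_latency(), read_be_throughput())
        agent.learn(state, action, r, hp_ways)
        state = hp_ways
```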
- Award ID(s): 1730628
- PAR ID: 10221240
- Date Published:
- Journal Name: 2020 6th IEEE International Conference on Network Softwarization (NetSoft)
- Page Range / eLocation ID: 335–343
- Sponsoring Org: National Science Foundation
More Like this
-
It's not a Sprint, it's a Marathon: Stretching Multi-resource Burstable Performance in Public Clouds
During the past few years, all leading cloud providers introduced burstable instances that can sprint their performance for a limited period to address sudden workload variations. Despite the availability of burstable instances, there is no clear understanding of how to minimize the waste of resources by regulating their burst capacity to the workload requirements. This is especially true when it comes to non-CPU-intensive applications. In this paper, we investigate how to limit network and I/O usage to optimize the efficiency of the bursting process. We also study which resources should be controlled to benefit both cloud providers and end users. We design MRburst (Multi-Resource burstable performance scheduler) to automatically limit multiple resources (i.e., network, I/O, and CPU) and make the application comply with a user-defined service level objective (SLO) while minimizing wasted resources. MRburst is evaluated on Amazon EC2 using two multi-resource applications: an FTP server and a Ceph system. Experimental results show that MRburst outperforms state-of-the-art approaches by allowing instances to speed up their performance for a period up to 2.4 times longer while meeting the SLO.
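The control idea, reclaiming burst capacity the workload does not need while keeping it within its SLO, can be sketched as a simple feedback loop (not MRburst's algorithm): tighten per-resource caps while the SLO is met and relax them when it is violated. The measurement and enforcement hooks (measure_slo_attainment, apply_caps), the SLO target, and the step size are assumptions.

```python
# Feedback-loop sketch: shrink the most generous resource cap while the SLO
# holds, and grow the tightest cap when it does not, so burst capacity is not
# wasted on resources the workload does not need.
SLO_TARGET = 0.99        # fraction of requests that must meet the SLO (assumed)
STEP = 0.05              # fractional cap adjustment per control interval (assumed)

def adjust_caps(caps, measure_slo_attainment, apply_caps, rounds=100):
    """caps: dict like {"cpu": 1.0, "net": 1.0, "io": 1.0}; 1.0 means uncapped."""
    for _ in range(rounds):
        attainment = measure_slo_attainment()
        if attainment >= SLO_TARGET:
            # SLO is met: reclaim capacity from the most generous cap.
            r = max(caps, key=caps.get)
            caps[r] = max(caps[r] - STEP, 0.1)
        else:
            # SLO is violated: give capacity back to the tightest cap.
            r = min(caps, key=caps.get)
            caps[r] = min(caps[r] + STEP, 1.0)
        apply_caps(caps)   # enforce the new caps (e.g., via cgroups/tc in practice)
    return caps
```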
-
In this paper, we consider the challenges that arise from the need to scale virtualized network functions (VNFs) at 100 Gbps line speed and beyond. Traditional VNF designs are monolithic in state management and scheduling: they internally maintain all states and the operations associated with them. Without proper design considerations, such designs suffer from limitations when scaling to 100 Gbps link speed and beyond: inefficient cache utilization due to contention caused by frequent control-plane activity, computational/memory-intensive tasks taking up CPU time, and shared states forcing synchronization among the cores. We address these limitations by arguing for the need to granularly decompose a VNF into data/control components that are co-located within a server but can be independently scaled among the cores. To realize this approach, we design a "serverless" programming framework with novel abstractions that optimize the data components that must process packets at line speed, reduce contention on data states, and enable run-time scheduling of different components for improved resource utilization. The abstractions, combined with the runtime system that we design, help NFV developers focus on the logic and correctness of VNF programming without worrying about how VNFs may be scaled in or out. We evaluate our platform by comparing it with monolithic approaches using different workloads and by analyzing the advantages of this separation for scalability, performance determinism, and feature velocity.
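A toy sketch of the decomposition argued for above, assuming a simple flow-tracking VNF split into a per-core data component (fast path, core-local state only) and a control component (slow path, scheduled independently); this is illustrative only and is not the paper's framework or API.

```python
# Data/control decomposition sketch: the fast path touches only core-local
# state and hands slow-path work to a separate control component via a queue,
# so the two can be scaled and scheduled independently.
import queue
import threading

class DataComponent:
    """Fast path: per-core packet processing over core-local state only."""
    def __init__(self, control_q):
        self.local_flows = {}          # core-local state, no cross-core locking
        self.control_q = control_q     # slow-path work is handed off, not done inline

    def process(self, pkt):
        key = (pkt["src"], pkt["dst"])
        if key not in self.local_flows:
            self.local_flows[key] = {"pkts": 0}
            self.control_q.put(("new_flow", key))   # defer control-plane work
        self.local_flows[key]["pkts"] += 1
        return pkt                                   # forward the packet

class ControlComponent(threading.Thread):
    """Slow path: handles control-plane events off the packet-processing path."""
    def __init__(self, control_q):
        super().__init__(daemon=True)
        self.control_q = control_q
        self.global_view = {}

    def run(self):
        while True:
            event, key = self.control_q.get()
            if event == "new_flow":
                self.global_view[key] = {"policy": "allow"}   # e.g., policy lookup

if __name__ == "__main__":
    q = queue.Queue()
    ControlComponent(q).start()
    data = DataComponent(q)
    data.process({"src": "10.0.0.1", "dst": "10.0.0.2"})
```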
-
The salient pay-per-use nature of serverless computing has driven its continuous penetration as an alternative computing paradigm for various workloads. Yet, challenges arise and remain open when shifting machine learning workloads to the serverless environment. Specifically, the restriction on deployment size over serverless platforms, combined with the complexity of neural network models, makes it difficult to deploy large models in a single serverless function. In this paper, we aim to fully exploit the advantages of the serverless computing paradigm for machine learning workloads, mitigating management effort and overall cost while meeting the response-time Service Level Objective (SLO). We design and implement AMPS-Inf, an autonomous framework customized for model inferencing in serverless computing. Driven by cost efficiency and timely response, our proposed AMPS-Inf automatically generates the optimal execution and resource provisioning plans for inference workloads. The core of AMPS-Inf relies on the formulation and solution of a Mixed-Integer Quadratic Programming problem for model partitioning and resource provisioning, with the objective of minimizing cost without violating the response-time SLO. We deploy AMPS-Inf on the AWS Lambda platform, evaluate it with state-of-the-art pre-trained models in Keras including ResNet50, Inception-V3, and Xception, and compare it with Amazon SageMaker and three baselines. Experimental results demonstrate that AMPS-Inf achieves up to 98% cost saving without degrading response-time performance.
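The optimization AMPS-Inf is described as solving, choosing a model partitioning and a per-partition memory size that minimize serverless cost while total latency stays within the response-time SLO, can be sketched as below. The paper formulates it as a Mixed-Integer Quadratic Program; this sketch simply enumerates candidate plans, and the price constant, SLO value, and latency-profile callback are assumed placeholders rather than measured profiles.

```python
# Enumeration sketch of the partitioning/provisioning trade-off: pick the
# cheapest (cut point, memory size) plan whose end-to-end latency fits the SLO.
from itertools import product

PRICE_PER_GB_SECOND = 0.0000166667   # illustrative per-GB-second price (assumed)
SLO_SECONDS = 1.0                    # response-time SLO (assumed)

def plan_cost(partition_latencies, memory_gb):
    # Serverless cost grows with both execution duration and the memory size.
    return sum(lat * memory_gb * PRICE_PER_GB_SECOND for lat in partition_latencies)

def choose_plan(latency_profile, cut_points, memory_options):
    """latency_profile(cut, mem) -> list of per-partition latencies for that plan."""
    best = None
    for cut, mem in product(cut_points, memory_options):
        latencies = latency_profile(cut, mem)
        if sum(latencies) > SLO_SECONDS:
            continue                                  # plan violates the SLO
        cost = plan_cost(latencies, mem)
        if best is None or cost < best[0]:
            best = (cost, cut, mem)
    return best   # (cost, chosen cut point, chosen memory size), or None if infeasible
```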
-
A primary design objective for Data-intensive User-facing (DU) services for cloud and edge computing is to maximize query throughput while meeting query tail latency Service Level Objectives (SLOs) for individual queries. Unfortunately, the existing solutions fall short of achieving this design objective, which, we argue, is largely attributed to the fact that they fail to take the query fanout explicitly into account. In this paper, we propose TailGuard, based on a Tail-latency-SLO-and-Fanout-aware Earliest-Deadline-First Queuing policy (TF-EDFQ) for task queuing at the individual task servers that query tasks are fanned out to. With the task queuing deadline for each task derived from both the query tail latency SLO and the query fanout, TailGuard takes an important first step towards achieving the design objective. TailGuard is evaluated by simulation against First-In-First-Out (FIFO) task queuing, task PRIority Queuing (PRIQ), and Tail-latency-SLO-aware EDFQ (T-EDFQ) policies, driven by three types of applications in the Tailbench benchmark suite. The results demonstrate that TailGuard can improve resource utilization by up to 80%, while meeting the targeted tail latency SLOs, as compared with the other three policies. TailGuard is also implemented and tested in a highly heterogeneous Sensing-as-a-Service (SaS) testbed for a data sensing service, with test results in line with the simulation results.
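A minimal sketch of a tail-latency-SLO-and-fanout-aware EDF task queue in the spirit of TF-EDFQ: each task's deadline is tightened as its query's fanout grows, since the query completes only when its slowest task does, and the task server serves tasks in earliest-deadline-first order. The specific deadline formula below is an assumption for illustration, not the paper's.

```python
# Fanout-aware EDF task queue sketch: deadlines shrink with query fanout,
# and tasks are dequeued in earliest-deadline-first order.
import heapq
import itertools
import math

def task_deadline(arrival, slo, fanout):
    # Assumed deadline rule: shrink the per-task budget as fanout grows so the
    # max over fanout-many sibling tasks still lands within the query SLO.
    return arrival + slo / (1.0 + math.log(max(fanout, 1)))

class TFEDFQueue:
    """Earliest-deadline-first task queue at one task server."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker for equal deadlines

    def enqueue(self, task, arrival, slo, fanout):
        heapq.heappush(self._heap, (task_deadline(arrival, slo, fanout),
                                    next(self._seq), task))

    def dequeue(self):
        # Serve the task with the earliest deadline, if any.
        return heapq.heappop(self._heap)[2] if self._heap else None
```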