Ensuring high scalability (elastic scale-out and consolidation) and high availability (failure resiliency) is critical to encouraging adoption of software-based network functions (NFs). In recent years, two paradigms have evolved for how NFs manage their state: stateful (state is coupled with the NF instance) and stateless (state is externalized to a datastore). These two paradigms present unique challenges and opportunities for ensuring high scalability and high availability of NFs and NF chains. In this work, we assess the impact of each paradigm on ensuring the correctness of NF state, including the implications of non-determinism in packet processing, and carefully analyze the benefits and disadvantages of the two state management paradigms. We leverage OpenNetVM and the Redis in-memory datastore to implement both state management paradigms and empirically compare the two. Although the stateless paradigm is desirable for elastic scaling, our experimental results show that, even at line-rate packet processing (10 Gbps), stateful NFs can achieve chain-level failover across servers in a LAN while incurring less than 10% performance overhead. The state-of-the-art stateless counterparts incur severe throughput penalties: we observe 30-85% overhead on normal processing, depending on the mode of state updates to the externalized datastore.
Contention-Aware Performance Prediction For Virtualized Network Functions
At the core of Network Functions Virtualization lie Network Functions (NFs) that run co-resident on the same server, contend over its hardware resources and, thus, might suffer from reduced performance relative to running alone on the same hardware. Therefore, to efficiently manage resources and meet performance SLAs, NFV orchestrators need mechanisms to predict contention-induced performance degradation. In this work, we find that prior performance prediction frameworks suffer from poor accuracy on modern architectures and NFs because they treat memory as a monolithic whole. In addition, we show that, in practice, there exist multiple components of the memory subsystem that can separately induce contention. By precisely characterizing (1) the pressure each NF applies on the server's shared hardware resources (contentiousness) and (2) how susceptible each NF is to performance drop due to competing contentiousness (sensitivity), we develop SLOMO, a multivariable performance prediction framework for Network Functions. We show that relative to prior work SLOMO reduces prediction error by 2-5x and enables 6-14% more efficient cluster utilization. SLOMO's codebase can be found at https://github.com/cmu-snap/SLOMO.
- Award ID(s): 1700521
- PAR ID: 10180451
- Date Published:
- Journal Name: SIGCOMM '20: Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication
- Page Range / eLocation ID: 270 to 282
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
REINFORCE: achieving efficient failure resiliency for network function virtualization based services
Ensuring high availability (HA) for software-based networks is a critical design feature that will help the adoption of software-based network functions (NFs) in production networks. It is important for NFs to avoid outages and maintain mission-critical operations. However, HA support for NFs on the critical data path can result in unacceptable performance degradation. We present REINFORCE, an integrated framework to support efficient resiliency for NFs and NF service chains. REINFORCE includes timely failure detection and consistent failover mechanisms. REINFORCE replicates state to standby NFs (local and remote) while enforcing correctness. It minimizes the number of state transfers by exploiting the concept of external synchrony, and leverages opportunistic batching and multi-buffering to optimize performance. Experimental results show that, even at line-rate packet processing (10 Gbps), REINFORCE achieves chain-level failover across servers in a LAN (or within the same node) within 10 ms (100 μs), incurring less than 10% (1%) performance overhead, and adds average latency of only ~400 μs (5 μs), with a worst-case latency of less than 1 ms (10 μs).
-
Elastic scaling is a central promise of NFV but has been hard to realize in practice. The difficulty arises because most Network Functions (NFs) are stateful and this state needs to be shared across NF instances. Implementing state sharing while meeting the throughput and latency requirements placed on NFs is challenging and, to date, no solution exists that meets NFV's performance goals for the full spectrum of NFs. S6 is a new framework that supports elastic scaling of NFs without compromising performance. Its design builds on the insight that a distributed shared state abstraction is well-suited to the NFV context. We organize state as a distributed shared object (DSO) space and extend the DSO concept with techniques designed to meet the need for elasticity and high performance in NFV workloads. S6 simplifies development: NF writers program with no awareness of how state is distributed and shared. Instead, S6 transparently migrates state and handles accesses to shared state. In our evaluation, compared to recent solutions for dynamic scaling of NFs, S6 improves performance by 100x during scaling events [25], and by 2-5x under normal operation.
-
5G technology transitions the cellular network core from specialized hardware into software-based cloud-native network functions (NFs). As part of this change, the 3GPP defines an access control policy to protect NFs from one another and from third-party network applications. A manual review of this policy by the 3GPP identified an over-privilege flaw that exposes cryptographic keys to all NFs. Unfortunately, such a manual review is difficult due to ambiguous documentation. In this paper, we use static program analysis to extract NF functionality from four 5G core implementations and compare that functionality to what is permissible by the 3GPP policy. We discover two previously unknown instances of over-privilege that can lead to denial of service and the extraction of sensitive data. We have reported our findings to the GSMA, who has confirmed the significance of these policy flaws.
-
Hardware memory disaggregation is an emerging trend in datacenters that provides access to remote memory as part of a shared pool or unused memory on machines across the network. Memory disaggregation aims to improve memory utilization and scale memory-intensive applications. Current state-of-the-art prototypes have shown that hardware disaggregated memory is a reality at the rack-scale. However, the memory utilization benefits of memory disaggregation can only be fully realized at larger scales enabled by a datacenter-wide network. Introduction of a datacenter network results in new performance and reliability failures that may manifest as higher network latency. Additionally, sharing of the network introduces new points of contention between multiple applications. In this work, we characterize the impact of variable network latency and contention in an open-source hardware disaggregated memory prototype, ThymesisFlow. To support our characterization, we have developed a delay injection framework that introduces delays in remote memory access to emulate network latency. Based on the characterization results, we develop insights into how reliability and resource allocation mechanisms should evolve to support hardware memory disaggregation beyond rack-scale in datacenters.

