skip to main content

This content will become publicly available on July 6, 2023

Title: Evolving to 6G: Improving the Cellular Core to lower control and data plane latency
With the commercialization and deployment of 5G, efforts are beginning to explore the design of the next generation of cellular networks, called 6G. New and constantly evolving use cases continue to place performance demands, especially for low latency communications, as these are still challenges for the 3GPP-specified 5G design, and will have to be met by the 6G design. Therefore, it is helpful to re-examine several aspects of the current cellular network’s design and implementation.Based on our understanding of the 5G cellular network specifications, we explore different implementation options for a dis-aggregated 5G core and their performance implications. To improve the data plane performance, we consider advanced packet classification mechanisms to support fast packet processing in the User Plane Function (UPF), to improve the poor performance and scalability of the current design based on linked lists. Importantly, we implement the UPF function on a SmartNIC for forwarding and tunneling. The SmartNIC provides the fastpath for device traffic, while more complex functions of buffering and processing flows that suffer a miss on the SmartNIC P4 tables are processed by the host-based UPF. Compared to an efficient DPDK-based host UPF, the SmartNIC UPF increases the throughput for 64 Byte packets by almost more » 2×. Furthermore, we lower the packet forwarding latency by 3.75× by using the SmartNIC. In addition, we propose a novel context-level QoS mechanism that dynamically updates the Packet Detection Rule priority and resource allocation of a flow based on the user context. By combining our innovations, we can achieve low latency and high throughput that will help us evolve to the next generation 6G cellular networks. « less
; ; ;
Award ID(s):
Publication Date:
Journal Name:
1st International Conference on 6G Networking (6GNet), 2022
Page Range or eLocation-ID:
1 to 8
Sponsoring Org:
National Science Foundation
More Like this
  1. Cellular network control procedures (e.g., mobility, idle-active transition to conserve energy) directly influence data plane behavior, impacting user-experienced delay. Recognizing this control-data plane interdependence, L25GC re-architects the 5G Core (5GC) network, and its processing, to reduce latency of control plane operations and their impact on the data plane. Exploiting shared memory, L25GC eliminates message serialization and HTTP processing overheads, while being 3GPP-standards compliant. We improve data plane processing by factoring the functions to avoid control-data plane interference, and using scalable, flow-level packet classifiers for forwarding-rule lookups. Utilizing buffers at the 5GC, L25GC implements paging, and an intelligent handover scheme avoiding 3GPP's hairpin routing, and data loss caused by limited buffering at 5G base stations, reduces delay and unnecessary message processing. L25GC's integrated failure resiliency transparently recovers from failures of 5GC software network functions and hardware much faster than 3GPP's reattach recovery procedure. L25GC is built based on free5GC, an open-source kernel-based 5GC implementation. L25GC reduces event completion time by ~50% for several control plane events and improves data packet latency (due to improved control plane communication) by ~2×, during paging and handover events, compared to free5GC. L25GC's design is general, although current implementation supports a limited number of user sessions.
  2. Despite advances in network security, attacks targeting mission critical systems and applications remain a significant problem for network and datacenter providers. Existing telemetry platforms detect volumetric attacks at terabit scales using approximation techniques and coarse grain analysis. However, the prevalence of low and slow attacks that require very little bandwidth, makes flow-state tracking critical to overall attack mitigation. Traffic queries deployed on network switches are often limited by hardware constraints, preventing them from carrying out flow tracking features required to detect stealthy attacks. Such attacks can go undetected in the midst of high traffic volumes. We design SmartWatch, a novel flow state tracking and flow logging system at line rate, using SmartNICs to optimize performance and simultaneously detect a number of stealthy attacks. SmartWatch leverages advances in switch based network telemetry platforms to process the bulk of the traffic and only forward suspicious traffic subsets to the SmartNIC. The programmable network switches perform coarse-grained traffic analysis while the SmartNIC conducts the finer-grained analysis which involves additional processing of the packet as a 'bump-in-the-wire'. A control loop between the SmartNIC and programmable switch tunes the queries performed in the switch to direct the most appropriate traffic subset to the SmartNIC. SmartWatch'smore »cooperative monitoring approach yields 2.39 times better detection rate compared to existing platforms deployed on programmable switches. SmartWatch can detect covert timing channels and perform website fingerprinting more efficiently compared to standalone programmable switch solutions, relieving switch memory and control-plane processor resources. Compared to host-based approaches, SmartWatch can reduce the packet processing latency by 72.32%.« less
  3. An optical circuit-switched network core has the potential to overcome the inherent challenges of a conventional electrical packet-switched core of today's compute clusters. As optical circuit switches (OCS) directly handle the photon beams without any optical-electrical-optical (O/E/O) conversion and packet processing, OCS-based network cores have the following desirable properties: a) agnostic to data-rate, b) negligible/zero power consumption, c) no need of transceivers, d) negligible forwarding latency, and e) no need for frequent upgrade. Unfortunately, OCS can only provide point-to-point (unicast) circuits. They do not have built-in support for one-to-many (multicast) communication, yet multicast is fundamental to a plethora of data-intensive applications running on compute clusters nowadays. In this paper, we propose Shufflecast, a novel optical network architecture for next-generation compute clusters that can support high-performance multicast satisfying all the properties of an OCS-based network core. Shufflecast leverages small fanout, inexpensive, passive optical splitters to connect the Top-of-rack (ToR) switch ports, ensuring data-rate agnostic, low-power, physical-layer multicast. We thoroughly analyze Shufflecast's highly scalable data plane, light-weight control plane, and graceful failure handling. Further, we implement a complete prototype of Shufflecast in our testbed and extensively evaluate the network. Shufflecast is more power-efficient than the state-of-the-art multicast mechanisms. Also, Shufflecast is moremore »cost-efficient than a conventional packet-switched network. By adding Shufflecast alongside an OCS-based unicast network, an all-optical network core with the aforementioned desirable properties supporting both unicast and multicast can be realized.« less
  4. Overcoming the conventional trade-off between throughput and bit error rate (BER) performance, versus computational complexity is a long-term challenge for uplink Multiple-Input Multiple-Output (MIMO) detection in base station design for the cellular 5G New Radio roadmap, as well as in next generation wireless local area networks. In this work, we present ParaMax, a MIMO detector architecture that for the first time brings to bear physics-inspired parallel tempering algorithmic techniques [28, 50, 67] on this class of problems. ParaMax can achieve near optimal maximum-likelihood (ML) throughput performance in the Large MIMO regime, Massive MIMO systems where the base station has additional RF chains, to approach the number of base station antennas, in order to support even more parallel spatial streams. ParaMax is able to achieve a near ML-BER performance up to 160 × 160 and 80 × 80 Large MIMO for low-order modulations such as BPSK and QPSK, respectively, only requiring less than tens of processing elements. With respect to Massive MIMO systems, in 12 × 24 MIMO with 16-QAM at SNR 16 dB, ParaMax achieves 330 Mbits/s near-optimal system throughput with 4--8 processing elements per subcarrier, which is approximately 1.4× throughput than linear detector-based Massive MIMO systems.
  5. Advanced high-speed network cards have made packet processing in host operating systems a major performance bottleneck. The kernel network stack gives rise to various sources of overheads that limit the throughput and lengthen the per-packet processing latency. The problem is further exacerbated for short-lived, latency-sensitive network flows such as control packets, online gaming, database requests, etc. — in a highly utilized system, especially in virtualized (containerized) cloud environments, short flows can experience excessively long in-kernel queuing delays. As a consequence, recent research works propose to bypass the kernel network stack to enable lightweight, custom userspace network stacks for improved performance, but at a heavy cost of compatibility and security. In this paper, we take a different approach: We first analyze various sources of inefficiencies in the kernel network stack and propose ways to mitigate them without compromising systems compatibility, security, or flexibility. Further, we propose PRISM, a novel mechanism in the kernel network stack to differentiate incoming packets based on their performance requirements and streamline the processing stages of multi-stage packet processing pipelines (e.g., in container overlay networks). Our evaluation demonstrates that PRISM can significantly improve the latency of high-priority flows in container overly networks in the presence of heavymore »low-priority background traffic.« less