skip to main content


Search for: All records

Creators/Authors contains: "Qi, Shixiong"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Cloud-native microservice applications use different communication paradigms to network microservices, including both synchronous and asynchronous I/O for exchanging data. Existing solutions depend on kernel-based networking, incurring significant overheads. The interdependence between microservices for these applications involves considerable communication, including contention between multiple concurrent flows or user sessions. In this paper, we design X-IO, a high-performance unified I/O interface that is built on top of shared memory processing with lock-free producer/consumer rings, eliminating kernel networking overheads and contention. X-IO offers a feature-rich interface. X-IO’s zero-copy interface supports building provides truly zero-copy data transfers between microservices, achieving high performance. X-IO also provides a POSIX-like socket interface using HTTP/REST API to achieve seamless porting of microservices to X-IO, without any change to the application code. X-IO supports concurrent connections for microservices that require distinct user sessions operating in parallel. Our preliminary experimental results show that X-IO’s zero-copy interfaces achieve 2.8x-4.1x performance improvement compared to kernel-based interfaces. Its socket interfaces outperform kernel TCP sockets and achieve performance close to UNIX-domain sockets. The HTTP/REST APIs in X-IO perform 1.4 x-2.3 x better than kernel-based alternatives with concurrent connections. 
    more » « less
  2. Serverless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. 'Cold-start' latency is another deterrent. We describe SPRIGHT, a lightweight, high-performance, responsive serverless framework. SPRIGHT exploits shared memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. SPRIGHT extensively leverages event-driven processing with the extended Berkeley Packet Filter (eBPF). We creatively use eBPF's socket message mechanism to support shared memory processing, with overheads being strictly load-proportional. Compared to constantly-running, polling-based DPDK, SPRIGHT achieves the same dataplane performance with 10× less CPU usage under realistic workloads. Additionally, eBPF benefits SPRIGHT, by replacing heavyweight serverless components, allowing us to keep functions 'warm' with negligible penalty. Our preliminary experimental results show that SPRIGHT achieves an order of magnitude improvement in throughput and latency compared to Knative, while substantially reducing CPU usage, and obviates the need for 'cold-start'. 
    more » « less
  3. With the commercialization and deployment of 5G, efforts are beginning to explore the design of the next generation of cellular networks, called 6G. New and constantly evolving use cases continue to place performance demands, especially for low latency communications, as these are still challenges for the 3GPP-specified 5G design, and will have to be met by the 6G design. Therefore, it is helpful to re-examine several aspects of the current cellular network’s design and implementation.Based on our understanding of the 5G cellular network specifications, we explore different implementation options for a dis-aggregated 5G core and their performance implications. To improve the data plane performance, we consider advanced packet classification mechanisms to support fast packet processing in the User Plane Function (UPF), to improve the poor performance and scalability of the current design based on linked lists. Importantly, we implement the UPF function on a SmartNIC for forwarding and tunneling. The SmartNIC provides the fastpath for device traffic, while more complex functions of buffering and processing flows that suffer a miss on the SmartNIC P4 tables are processed by the host-based UPF. Compared to an efficient DPDK-based host UPF, the SmartNIC UPF increases the throughput for 64 Byte packets by almost 2×. Furthermore, we lower the packet forwarding latency by 3.75× by using the SmartNIC. In addition, we propose a novel context-level QoS mechanism that dynamically updates the Packet Detection Rule priority and resource allocation of a flow based on the user context. By combining our innovations, we can achieve low latency and high throughput that will help us evolve to the next generation 6G cellular networks. 
    more » « less
  4. Traditional network resident functions (e.g., firewalls, network address translation) and middleboxes (caches, load balancers) have moved from purpose-built appliances to software-based components. However, L2/L3 network functions (NFs) are being implemented on Network Function Virtualization (NFV) platforms that extensively exploit kernel-bypass technology. They often use DPDK for zero-copy delivery and high performance. On the other hand, L4/L7 middleboxes, which usually require full network protocol stack support, take advantage of a full-fledged kernel-based system with a greater emphasis on functionality. Thus, L2/L3 NFs and middleboxes continue to be handled by distinct platforms on different nodes.This paper proposes MiddleNet that seeks to overcome this dichotomy by developing a unified network resident function framework that supports L2/L3 NFs and L4/L7 middleboxes. MiddleNet supports function chains that are essential in both NFV and middlebox environments. MiddleNet uses DPDK for zero-copy packet delivery without interrupt-based processing, to enable the ‘bump-in-the-wire’ L2/L3 processing performance required of NFV. To support L4/L7 middlebox functionality, MiddleNet utilizes a consolidated, kernel-based protocol stack processing, avoiding a dedicated protocol stack for each function. MiddleNet fully exploits the event-driven capabilities provided by the extended Berkeley Packet Filter (eBPF) and seamlessly integrates it with shared memory for high-performance communication in L4/L7 middlebox function chains. The overheads for MiddleNet are strictly load-proportional, without needing the dedicated CPU cores of DPDK-based approaches. MiddleNet supports flow-dependent packet processing by leveraging Single Root I/O Virtualization (SR-IOV) to dynamically select packet processing needed (Layer 2 to Layer 7). Our experimental results show that MiddleNet can achieve high performance in such a unified environment. 
    more » « less
  5. Cellular network control procedures (e.g., mobility, idle-active transition to conserve energy) directly influence data plane behavior, impacting user-experienced delay. Recognizing this control-data plane interdependence, L25GC re-architects the 5G Core (5GC) network, and its processing, to reduce latency of control plane operations and their impact on the data plane. Exploiting shared memory, L25GC eliminates message serialization and HTTP processing overheads, while being 3GPP-standards compliant. We improve data plane processing by factoring the functions to avoid control-data plane interference, and using scalable, flow-level packet classifiers for forwarding-rule lookups. Utilizing buffers at the 5GC, L25GC implements paging, and an intelligent handover scheme avoiding 3GPP's hairpin routing, and data loss caused by limited buffering at 5G base stations, reduces delay and unnecessary message processing. L25GC's integrated failure resiliency transparently recovers from failures of 5GC software network functions and hardware much faster than 3GPP's reattach recovery procedure. L25GC is built based on free5GC, an open-source kernel-based 5GC implementation. L25GC reduces event completion time by ~50% for several control plane events and improves data packet latency (due to improved control plane communication) by ~2×, during paging and handover events, compared to free5GC. L25GC's design is general, although current implementation supports a limited number of user sessions. 
    more » « less
  6. Edge cloud solutions that bring the cloud closer to the sensors can be very useful to meet the low latency requirements of many Internet-of-Things (IoT) applications. However, IoT traffic can also be intermittent, so running applications constantly can be wasteful. Therefore, having a serverless edge cloud that is responsive and provides low-latency features is a very attractive option for a resource and cost-efficient IoT application environment.In this paper, we discuss the key components needed to support IoT traffic in the serverless edge cloud and identify the critical challenges that make it difficult to directly use existing serverless solutions such as Knative, for IoT applications. These include overhead from heavyweight components for managing the overall system and software adaptors for communication protocol translation used in off-the-shelf serverless platforms that are designed for large-scale centralized clouds. The latency imposed by ‘cold start’ is a further deterrent.To address these challenges we redesign several components of the Knative serverless framework. We use a streamlined protocol adaptor to leverage the MQTT IoT protocol in our serverless framework for IoT event processing. We also create a novel, event-driven proxy based on the extended Berkeley Packet Filter (eBPF), to replace the regular heavyweight Knative queue proxy. Our preliminary experimental results show that the event-driven proxy is a suitable replacement for the queue proxy in an IoT serverless environment and results in lower CPU usage and a higher request throughput. 
    more » « less
  7. null (Ed.)
    This paper focuses on the need for emerging domains such as serverless and in-network computing, where applications are often hosted on virtualized compute instances (e.g., containers and unikernels), to have applications startup as quickly as possible. We provide a qualitative and quantitative analysis of containers and unikernels with regard to the startup time. We analyze these in-depth and identify the key components and their impact under scale on the startup latency. We study how startup time scales as we launch multiple instances concurrently. We study the contribution of popular Container Networking Interfaces (CNIs), to the startup time. 
    more » « less
  8. Serverless computing platforms simplify development, deployment, and automated management of modular software functions. However, existing serverless platforms typically assume an over-provisioned cloud, making them a poor fit for Edge Computing environments where resources are scarce. In this paper we propose a redesigned serverless platform that comprehensively tackles the key challenges for serverless functions in a resource constrained Edge Cloud. Our Mu platform cleanly integrates the core resource management components of a serverless platform: autoscaling, load balancing, and placement. Each worker node in Mu transparently propagates metrics such as service rate and queue length in response headers, feeding this information to the load balancing system so that it can better route requests, and to our autoscaler to anticipate workload fluctuations and proactively meet SLOs. Data from the Autoscaler is then used by the placement engine to account for heterogeneity and fairness across competing functions, ensuring overall resource efficiency, and minimizing resource fragmentation. We implement our design as a set of extensions to the Knative serverless platform and demonstrate its improvements in terms of resource efficiency, fairness, and response time. Evaluating Mu, shows that it improves fairness by more than 2x over the default Kubernetes placement engine, improves 99th percentile response times by 62% through better load balancing, reduces SLO violations and resource consumption by pro-active and precise autoscaling. Mu reduces the average number of pods required by more than ~15% for a set of real Azure workloads. 
    more » « less
  9. null (Ed.)