NSF PAR Search | NSF Public Access Repository

SURE: Secure Unikernels Make Serverless Computing Rapid and Efficient

https://doi.org/10.1145/3698038.3698558

Parola, Federico; Qi, Shixiong; Narappa, Anvaya B; Ramakrishnan, K K; Risso, Fulvio (November 2024, ACM SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing)

Free, publicly-accessible full text available November 20, 2025
D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs

https://doi.org/10.1109/TCC.2024.3476210

Dhakal, Aditya; Kulkarni, Sameer G; Ramakrishnan, K K (October 2024, IEEE Transactions on Cloud Computing)

Full Text Available
Z-Stack: A High-Performance DPDK-Based Zero-Copy TCP/IP Protocol Stack

https://doi.org/10.1109/LANMAN61958.2024.10621881

Narappa, Anvaya B; Parola, Federico; Qi, Shixiong; Ramakrishnan, K K (July 2024, IEEE)

Data centers require high-performance and efficient networking for fast and reliable communication between applications. TCP/IP-based networking still plays a dominant role in data center networking to support a wide range of Layer-4 and Layer-7 applications, such as middleboxes and cloud-based microservices. However, traditional kernel-based TCP/IP stacks face performance challenges due to overheads such as context switching, interrupts, and copying. We present Z-stack, a high-performance userspace TCP/IP stack with a zero-copy design. Utilizing DPDK's Poll Mode Driver, Z-stack bypasses the kernel and moves packets between the NIC and the protocol stack in userspace, eliminating the overhead associated with kernel-based processing. Z-stack employs polling-based packet processing that improves performance under high loads, and eliminates receive livelocks compared to interrupt-driven packet processing. With its zero-copy socket design, Z-stack eliminates copies when moving data between the user application and the protocol stack, which further minimizes latency and improves throughput. In addition, Z-stack seamlessly integrates with shared memory processing within the node, eliminating duplicate protocol processing and serialization/deserialization overheads for intra-node communication. Z-stack uses F-stack as the starting point which integrates the proven TCP/IP stack from FreeBSD, providing a versatile solution for a variety of cloud use cases and improving performance of data center networking.
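
The zero-copy idea in this abstract can be illustrated with a minimal sketch. This is not Z-stack's API: `packet_pool`, `HEADER_LEN`, and `stack_receive` are hypothetical names, and a Python `bytearray` merely stands in for a packet buffer. The point is only that the application receives a view into the stack's buffer rather than a private copy.

```python
# Hypothetical stand-in for a preallocated packet buffer owned by the stack.
packet_pool = bytearray(2048)
HEADER_LEN = 40  # assumed fixed header size, chosen only for this sketch


def stack_receive(payload: bytes) -> memoryview:
    """Place a payload after the header area and return a view of it (no copy)."""
    end = HEADER_LEN + len(payload)
    packet_pool[HEADER_LEN:end] = payload
    return memoryview(packet_pool)[HEADER_LEN:end]


view = stack_receive(b"hello")   # zero-copy path: shares memory with packet_pool
copied = bytes(view)             # copy-based path, kept for comparison

# A later in-place write to the buffer is visible through the view,
# but not through the earlier copy.
packet_pool[HEADER_LEN] = ord("J")
print(view.tobytes(), copied)
```

The view and the buffer alias the same memory, which is what removes the copy; the cost is that the application must not hold the view after the buffer is recycled.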
Full Text Available
SPRIGHT: High-Performance eBPF-Based Event-Driven, Shared-Memory Processing for Serverless Computing

https://doi.org/10.1109/TNET.2024.3366561

Qi, Shixiong; Monis, Leslie; Zeng, Ziteng; Wang, Ian-Chin; Ramakrishnan, K K (June 2024, IEEE/ACM Transactions on Networking)

Serverless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. ‘Cold-start’ latency is another deterrent. We describe SPRIGHT, a lightweight, high-performance, responsive serverless framework. SPRIGHT exploits shared memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. SPRIGHT extensively leverages event-driven processing with the extended Berkeley Packet Filter (eBPF). We creatively use eBPF’s socket message mechanism to support shared memory processing, with overheads being strictly load-proportional. Compared to constantly-running, polling-based DPDK, SPRIGHT achieves the same dataplane performance with 10× less CPU usage under realistic workloads. Additionally, eBPF benefits SPRIGHT, by replacing heavyweight serverless components, allowing us to keep functions ‘warm’ with negligible penalty. Our preliminary experimental results show that SPRIGHT achieves an order of magnitude improvement in throughput and latency compared to Knative, while substantially reducing CPU usage, and obviates the need for ‘cold-start’.
Full Text Available
LIFL: A Lightweight, Event-driven Serverless Platform for Federated Learning

Qi, Shixiong; Ramakrishnan, K K; Lee, Myungjin (May 2024, Proceedings of Machine Learning and Systems 6 (MLSys 2024) Conference)

Federated Learning (FL) typically involves a large-scale, distributed system with individual user devices/servers training models locally and then aggregating their model updates on a trusted central server. Existing systems for FL often use an always-on server for model aggregation, which can be inefficient in terms of resource utilization. They also may be inelastic in their resource management. This is particularly exacerbated when aggregating model updates at scale in a highly dynamic environment with varying numbers of heterogeneous user devices/servers. We present LIFL, a lightweight and elastic serverless cloud platform with fine-grained resource management for efficient FL aggregation at scale. LIFL is enhanced by a streamlined, event-driven serverless design that eliminates the individual, heavyweight message broker and replaces inefficient container-based sidecars with lightweight eBPF-based proxies. We leverage shared memory processing to achieve high-performance communication for hierarchical aggregation, which is commonly adopted to speed up FL aggregation at scale. We further introduce the locality-aware placement in LIFL to maximize the benefits of shared memory processing. LIFL precisely scales and carefully reuses the resources for hierarchical aggregation to achieve the highest degree of parallelism, while minimizing aggregation time and resource consumption. Our preliminary experimental results show that LIFL achieves significant improvement in resource efficiency and aggregation speed for supporting FL at scale, compared to existing serverful and serverless FL systems.
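
Hierarchical aggregation, as mentioned in this abstract, can be sketched as follows. The abstract does not state the aggregation rule, so the sample-weighted average (FedAvg-style) used here is an assumption, and `aggregate`, `hierarchical_aggregate`, and `fan_in` are hypothetical names. The sketch shows that aggregating in groups, level by level, gives the same result as a single flat aggregation when each intermediate result carries its total sample count.

```python
from typing import List, Tuple

# (model weights, number of training samples behind those weights)
Update = Tuple[List[float], int]


def aggregate(updates: List[Update]) -> Update:
    """Sample-weighted average of model updates (assumed FedAvg-style rule)."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    avg = [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
    return avg, total


def hierarchical_aggregate(updates: List[Update], fan_in: int) -> Update:
    """Aggregate in groups of `fan_in`, level by level, until one update remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = updates
    while len(level) > 1:
        level = [aggregate(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]


updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30),
           ([5.0, 6.0], 20), ([7.0, 8.0], 40)]
flat, total = aggregate(updates)
tree, tree_total = hierarchical_aggregate(updates, fan_in=2)
print(flat, tree)
```

Each group in a level is independent of the others, so the groups can be aggregated in parallel.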
Full Text Available
L26GC: Evolving the Low-Latency Core for Future Cellular Networks

https://doi.org/10.1109/MIC.2024.3376655

Qi, Shixiong; Ramakrishnan, K K; Chen, Jyh-Cheng (March 2024, IEEE Internet Computing)

Full Text Available
MiddleNet: A Unified, High-Performance NFV and Middlebox Framework With eBPF and DPDK

https://doi.org/10.1109/TNSM.2023.3256891

Qi, Shixiong; Zeng, Ziteng; Monis, Leslie; Ramakrishnan, K. K. (December 2023, IEEE Transactions on Network and Service Management)
L25GC+: An Improved, 3GPP-compliant 5G Core for Low-latency Control Plane Operations

https://doi.org/10.1109/CloudNet59005.2023.10490024

Liu, Yu-Sheng; Qi, Shixiong; Lin, Po-Yi; Tsai, Han-Sing; Ramakrishnan, K K; Chen, Jyh-Cheng (November 2023, 2023 IEEE 12th International Conference on Cloud Networking (CloudNet))

While 5G offers fast access networks and a high-performance data plane, the control plane in 5G core (5GC) still presents challenges due to inefficiencies in handling control plane operations (including session establishment, handovers and idle-to-active state-transitions) of 5G User Equipment (UE). The Service-based Interface (SBI) used for communication between 5G control plane functions introduces substantial overheads that impact latency. Typical 5GCs are supported in the cloud on containers, to support the disaggregated Control and User Plane Separation (CUPS) framework of 3GPP. L25GC is a state-of-the-art 5G control plane design utilizing shared memory processing to reduce the control plane latency. However, L25GC has limitations in supporting multiple user sessions and has programming language incompatibilities with 5GC implementations, e.g., free5GC, using modern languages such as GoLang. To address these challenges, we develop L25GC+, a significant enhancement to L25GC. L25GC+ re-designs the shared-memory-based networking stack to support synchronous I/O between control plane functions. L25GC+ distinguishes different user sessions and maintains strict 3GPP compliance. L25GC+ also offers seamless integration with existing 5GC microservice implementations through equivalent SBI APIs, reducing code refactoring and porting efforts. By leveraging shared memory I/O and overcoming L25GC’s limitations, L25GC+ provides an improved solution to optimize the 5G control plane, enhancing latency, scalability, and overall user experience. We demonstrate the improved performance of L25GC+ on a 5G testbed with commercial basestations and multiple UEs.
Full Text Available
X-IO: A High-performance Unified I/O Interface using Lock-free Shared Memory Processing

https://doi.org/10.1109/NetSoft57336.2023.10175428

Qi, Shixiong; Tsai, Han-Sing; Liu, Yu-Sheng; Ramakrishnan, K. K.; Chen, Jyh-Cheng (June 2023, 2023 IEEE 9th International Conference on Network Softwarization (NetSoft))

Cloud-native microservice applications use different communication paradigms to network microservices, including both synchronous and asynchronous I/O for exchanging data. Existing solutions depend on kernel-based networking, incurring significant overheads. The interdependence between microservices for these applications involves considerable communication, including contention between multiple concurrent flows or user sessions. In this paper, we design X-IO, a high-performance unified I/O interface that is built on top of shared memory processing with lock-free producer/consumer rings, eliminating kernel networking overheads and contention. X-IO offers a feature-rich interface. X-IO’s zero-copy interface provides truly zero-copy data transfers between microservices, achieving high performance. X-IO also provides a POSIX-like socket interface using HTTP/REST API to achieve seamless porting of microservices to X-IO, without any change to the application code. X-IO supports concurrent connections for microservices that require distinct user sessions operating in parallel. Our preliminary experimental results show that X-IO’s zero-copy interfaces achieve 2.8x-4.1x performance improvement compared to kernel-based interfaces. Its socket interfaces outperform kernel TCP sockets and achieve performance close to UNIX-domain sockets. The HTTP/REST APIs in X-IO perform 1.4x-2.3x better than kernel-based alternatives with concurrent connections.
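
The index logic behind a producer/consumer ring can be sketched as below. This is not X-IO's implementation: `SpscRing` and its methods are hypothetical, and the sketch assumes exactly one producer and one consumer. Python has no atomics, so this shows only the bookkeeping; a real shared-memory ring would store the indices as atomic variables with appropriate memory ordering.

```python
from typing import Any, List, Tuple


class SpscRing:
    """Fixed-size single-producer/single-consumer ring (illustrative sketch)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 2 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two >= 2")
        self._slots: List[Any] = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def try_enqueue(self, item: Any) -> bool:
        """Producer side: returns False instead of blocking when the ring is full."""
        if self._tail - self._head == len(self._slots):
            return False
        self._slots[self._tail & self._mask] = item
        self._tail += 1  # publish the slot only after it has been written
        return True

    def try_dequeue(self) -> Tuple[bool, Any]:
        """Consumer side: returns (False, None) when the ring is empty."""
        if self._head == self._tail:
            return False, None
        index = self._head & self._mask
        item = self._slots[index]
        self._slots[index] = None
        self._head += 1  # release the slot back to the producer
        return True, item


ring = SpscRing(4)
accepted = [ring.try_enqueue(i) for i in range(5)]  # fifth attempt finds the ring full
first = ring.try_dequeue()
print(accepted, first)
```

Because each index has a single writer, neither side needs a lock: the producer only reads `_head` and the consumer only reads `_tail`.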
Synergy: A SmartNIC Accelerated 5G Dataplane and Monitor for Mobility Prediction

https://doi.org/10.1109/ICNP55882.2022.9940261

Panda, Sourav; Ramakrishnan, K. K.; Bhuyan, Laxmi N. (October 2022, 2022 IEEE 30th International Conference on Network Protocols (ICNP))

The 5G user plane function (UPF) is a critical inter-connection point between the data network and cellular network infrastructure. It governs the packet processing performance of the 5G core network. UPFs also need to be flexible to support several key control plane operations. Existing UPFs typically run on general-purpose CPUs, but have limited performance because of the overheads of host-based forwarding. We design Synergy, a novel 5G UPF running on SmartNICs that provides high throughput and low latency. It also supports monitoring functionality to gather critical data on user sessions for the prediction and optimization of handovers during user mobility. The SmartNIC UPF efficiently buffers data packets during handover and paging events by using a two-level flow-state access mechanism. This enables maintaining flow-state for a very large number of flows, thus providing very low latency for control and data planes and high throughput packet forwarding. Mobility prediction can reduce the handover delay by pre-populating state in the UPF and other core NFs. Synergy performs handover predictions based on an existing recurrent neural network model. Synergy's mobility predictor helps us achieve 2.32× lower average handover latency. Buffering in the SmartNIC, rather than the host, during paging and handover events reduces packet loss rate by at least 2.04×. Compared to previous approaches to building programmable switch-based UPFs, Synergy speeds up control plane operations such as handovers because of the low P4-programming latency leveraging tight coupling between SmartNIC and host.
Full Text Available
