

Title: Parallelizing packet processing in container overlay networks
Container networking, which provides connectivity among containers on multiple hosts, is crucial to building and scaling container-based microservices. While overlay networks are widely adopted in production systems, they cause significant performance degradation in both throughput and latency compared to physical networks. This paper seeks to understand the bottlenecks of in-kernel networking when running container overlay networks. Through profiling and code analysis, we find that a prolonged data path, due to packet transformation in overlay networks, is the culprit of performance loss. Furthermore, existing scaling techniques in the Linux network stack are ineffective for parallelizing the prolonged data path of a single network flow. We propose FALCON, a fast and balanced container networking approach to scale the packet processing pipeline in overlay networks. FALCON pipelines software interrupts associated with different network devices of a single flow on multiple cores, thereby preventing execution serialization of excessive software interrupts from overloading a single core. FALCON further supports multiple network flows by effectively multiplexing and balancing software interrupts of different flows among available cores. We have developed a prototype of FALCON in Linux. Our evaluation with both micro-benchmarks and real-world applications demonstrates the effectiveness of FALCON, with significantly improved performance (by 300% for web serving) and reduced tail latency (by 53% for data caching).
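FALCON's pipelining is implemented inside the kernel, but a rough flavor of the idea can be conveyed with the stock Receive Packet Steering (RPS) knobs, which steer a device's backlog (softirq) processing to a chosen set of cores. The sketch below assigns disjoint CPU masks to the devices along a typical overlay receive path; the device names (eth0, vxlan0, veth-host) are assumptions for a Docker overlay host, and this sysfs-only approximation is not FALCON's actual mechanism.

```python
#!/usr/bin/env python3
"""Approximate softirq pipelining across overlay devices with stock RPS.

Each device along the overlay receive path (physical NIC -> VXLAN device
-> container veth) gets a disjoint CPU mask, so the softirq work of
successive stages runs on different cores. Device names are assumptions.
"""
import pathlib

# Hypothetical stage -> CPU-mask assignment (hex bitmask, CPU0 = 0x1).
RPS_PLAN = {
    "eth0":      "1",   # NIC-level processing on CPU 0
    "vxlan0":    "2",   # decapsulation stage on CPU 1
    "veth-host": "4",   # container-side delivery on CPU 2
}

def set_rps(dev: str, mask: str) -> None:
    """Write the RPS CPU mask for every RX queue of a device."""
    for q in pathlib.Path(f"/sys/class/net/{dev}/queues").glob("rx-*"):
        (q / "rps_cpus").write_text(mask)
        print(f"{dev}/{q.name}: rps_cpus <- {mask}")

if __name__ == "__main__":
    for dev, mask in RPS_PLAN.items():
        try:
            set_rps(dev, mask)           # requires root and existing devices
        except OSError as e:
            print(f"skipping {dev}: {e}")
```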
Award ID(s):
1909877
NSF-PAR ID:
10297232
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
EuroSys 2021
Page Range / eLocation ID:
261 to 276
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Overlay networks serve as the de facto network virtualization technique for providing connectivity among distributed containers. Despite the flexibility in building customized private container networks, overlay networks incur significant performance loss compared to physical networks (i.e., the native network). The culprit lies in the inclusion of multiple network processing stages in overlay networks, which prolongs the network processing path and overloads CPU cores. In this paper, we propose mFlow, a novel packet steering approach to parallelize the in-kernel data path of network flows. mFlow exploits packet-level parallelism in the kernel network stack by splitting the packets of the same flow into multiple micro-flows, which can be processed in parallel on multiple cores. mFlow devises new, generic mechanisms for flow splitting while preserving in-order packet delivery with little overhead. Our evaluation with both micro-benchmarks and real-world applications demonstrates the effectiveness of mFlow, with significantly improved performance, e.g., by 81% in TCP throughput and 139% in UDP compared to vanilla overlay networks. mFlow even achieved higher TCP throughput than the native network (e.g., 29.8 vs. 26.6 Gbps).
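mFlow's flow splitting happens in the kernel receive path; the userspace analogue below only illustrates the packet-level parallelism it exploits: packets of a single flow are sprayed round-robin across several worker threads (the "micro-flows"), and a sequence-number reorder buffer restores in-order delivery before handing data to the application. All names and the stand-in "processing" step are illustrative, not mFlow's implementation.

```python
import heapq
import queue
import threading

NUM_WORKERS = 4

def worker(inq: queue.Queue, outq: queue.Queue) -> None:
    """Simulate per-core protocol processing of one micro-flow."""
    while True:
        item = inq.get()
        if item is None:
            break
        seq, payload = item
        outq.put((seq, payload.upper()))   # stand-in for real packet work

def run(packets):
    inqs = [queue.Queue() for _ in range(NUM_WORKERS)]
    outq = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q, outq)) for q in inqs]
    for t in threads:
        t.start()

    # Split one flow into micro-flows: spray packets round-robin.
    for seq, pkt in enumerate(packets):
        inqs[seq % NUM_WORKERS].put((seq, pkt))
    for q in inqs:
        q.put(None)

    # Reorder buffer: release packets to the application in sequence.
    delivered, pending, next_seq = [], [], 0
    for _ in range(len(packets)):
        heapq.heappush(pending, outq.get())
        while pending and pending[0][0] == next_seq:
            delivered.append(heapq.heappop(pending)[1])
            next_seq += 1
    for t in threads:
        t.join()
    return delivered

if __name__ == "__main__":
    print(run([f"pkt{i}" for i in range(10)]))   # printed in original order
```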
  2. Advanced high-speed network cards have made packet processing in host operating systems a major performance bottleneck. The kernel network stack gives rise to various sources of overhead that limit throughput and lengthen per-packet processing latency. The problem is further exacerbated for short-lived, latency-sensitive network flows such as control packets, online gaming, database requests, etc.: in a highly utilized system, especially in virtualized (containerized) cloud environments, short flows can experience excessively long in-kernel queuing delays. As a consequence, recent research proposes to bypass the kernel network stack to enable lightweight, custom userspace network stacks for improved performance, but at a heavy cost in compatibility and security. In this paper, we take a different approach: we first analyze various sources of inefficiency in the kernel network stack and propose ways to mitigate them without compromising system compatibility, security, or flexibility. Further, we propose PRISM, a novel mechanism in the kernel network stack to differentiate incoming packets based on their performance requirements and streamline the processing stages of multi-stage packet processing pipelines (e.g., in container overlay networks). Our evaluation demonstrates that PRISM can significantly improve the latency of high-priority flows in container overlay networks in the presence of heavy low-priority background traffic.
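PRISM's differentiation is performed inside the kernel stack; the sketch below is only a userspace illustration of the classification idea, in which packets matched by an assumed rule (a small set of latency-sensitive destination ports) jump ahead of bulk traffic in a shared backlog. The rule and packet format are assumptions for illustration.

```python
import heapq
import itertools

# Assumed classification rule: a few latency-sensitive destination ports
# (DNS, Redis, memcached) are treated as high priority.
HIGH_PRIORITY_PORTS = {53, 6379, 11211}

_counter = itertools.count()
_backlog = []   # min-heap of (priority, arrival order, packet)

def enqueue(packet: dict) -> None:
    """Classify a packet and place it in the shared backlog."""
    prio = 0 if packet["dst_port"] in HIGH_PRIORITY_PORTS else 1
    heapq.heappush(_backlog, (prio, next(_counter), packet))

def dequeue():
    """Hand the next packet to the (simulated) protocol stack."""
    return heapq.heappop(_backlog)[2] if _backlog else None

if __name__ == "__main__":
    for p in [{"dst_port": 80, "id": "bulk-1"},
              {"dst_port": 6379, "id": "cache-req"},
              {"dst_port": 80, "id": "bulk-2"}]:
        enqueue(p)
    while (pkt := dequeue()) is not None:
        print("processing", pkt["id"])   # cache-req is served before bulk traffic
```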
  3. Wide Area Measurement Systems (WAMS) use an underlying communication network to collect and analyze data from devices in the power grid, with the aim of improving grid operations. For WAMS to be effective, the communication network needs to support low packet latency and low packet losses. Internet Protocol (IP), the pervasive technology used in today’s communication networks, uses loop-free best paths for data forwarding, which increases the load on these paths and causes delays and losses in delivery. Information-Centric Networking (ICN), a new networking paradigm designed to enable data-centric information sharing, natively supports the concurrent use of multiple transmission interfaces, in-network caching, and per-packet security, and can provide better application support. In this paper, we present an ICN-based network architecture for wide area smart grid communications. We demonstrate through simulations that it achieves low latency and 100% data delivery even during network congestion by leveraging multiple available paths, thus significantly improving communication resiliency in comparison to an IP-based approach. The architecture can be used immediately on today’s Internet as an overlay.
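The resiliency claim (that leveraging multiple available paths sustains delivery under loss) can be illustrated with a back-of-the-envelope calculation: if each of k independent paths drops a packet with probability p, a copy forwarded on every path is lost only with probability p^k. The simulation below uses illustrative numbers, not figures from the paper, and says nothing about the ICN forwarding machinery itself.

```python
import random

def delivery_rate(loss_prob: float, num_paths: int, trials: int = 100_000) -> float:
    """Fraction of packets delivered when a copy is sent on every path."""
    delivered = sum(
        any(random.random() > loss_prob for _ in range(num_paths))
        for _ in range(trials)
    )
    return delivered / trials

if __name__ == "__main__":
    # Analytically the delivery rate is 1 - loss_prob ** k.
    for k in (1, 2, 3):
        print(f"{k} path(s): ~{delivery_rate(0.05, k):.4f} delivered")
```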
  4. P4’s data-plane programmability allows for highly customizable and programmable packet processing, enabling rapid innovation in network applications such as virtualization, security, load balancing, and traffic engineering. Researchers extensively use Mininet, a popular network emulator, integrated with BMv2 for fast and flexible prototyping of these P4-based applications; however, because BMv2’s throughput and latency fall well short of a production-grade software switch such as Open vSwitch, an accurate and scalable emulation testbed is crucial. In this paper, we develop a lightweight virtual time system and integrate it into Mininet with BMv2 to enhance fidelity and scalability. By scaling the time of interactions between containers and the underlying physical machine by a time dilation factor (TDF), we can trade time for system resources, making the emulated P4 network appear faster from the viewpoint of the switch/host processes in the containers. Our experimental results show that the testbed can accurately emulate much larger networks under high load, scaled by a factor of TDF, with extremely low system overhead.
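The core relation behind time dilation is simple: from the viewpoint of a container's switch/host processes, elapsed virtual time is real elapsed time divided by the TDF, so work that takes 4 microseconds of real CPU time looks like 1 microsecond when TDF = 4. The sketch below shows only this bookkeeping; the actual system intercepts the containers' time-related interactions with the host, which is not reproduced here.

```python
import time

class VirtualClock:
    """Report elapsed time dilated by a time dilation factor (TDF)."""

    def __init__(self, tdf: float):
        self.tdf = tdf
        self.start = time.monotonic()

    def now(self) -> float:
        """Virtual seconds elapsed: real elapsed time divided by TDF."""
        return (time.monotonic() - self.start) / self.tdf

if __name__ == "__main__":
    clock = VirtualClock(tdf=4.0)
    time.sleep(1.0)                                   # 1 s of wall-clock work...
    print(f"virtual elapsed: {clock.now():.2f} s")    # ...appears as ~0.25 s
```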
  5. Existing campus network infrastructure is not designed to effectively handle the transmission of big data sets. Performance degradation in these networks is often caused by middleboxes -- appliances that enforce campus-wide policies by deeply inspecting all traffic going through the network (including big data transfers). We are developing a Software-Defined Networking (SDN) solution for our campus network that grants privilege to science flows by dynamically calculating routes that bypass certain middleboxes, thus avoiding the bottlenecks they create. Using the global network information provided by an SDN controller, we are developing graph database approaches to compute custom paths that not only bypass middleboxes to meet certain requirements (e.g., latency, bandwidth, hop count) but also insert rules that modify packets hop by hop to create the illusion of standard routing/forwarding despite the fact that packets are being rerouted. In some cases, additional functionality needs to be added to the path using network function virtualization (NFV) techniques (e.g., NAT). To ensure that path computations run on an up-to-date snapshot of the topology, we introduce a versioning mechanism that allows for lazy topology updates that occur only when "important" network changes take place and are requested by big data flows.
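The path computation described above amounts to a constrained shortest-path query: find a route between the big-data endpoints on the topology with the middlebox nodes excluded. The example below runs a plain Dijkstra over an in-memory adjacency map with an illustrative campus topology; the production system keeps the topology in a graph database and also installs the hop-by-hop packet-modification rules, neither of which is shown. All node names and link weights are assumptions.

```python
import heapq

def bypass_path(graph, src, dst, middleboxes):
    """Shortest path from src to dst that avoids every node in `middleboxes`."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, {}).items():
            if v in middleboxes:
                continue                     # grant the science flow a bypass
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if dst not in dist:
        return None
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

if __name__ == "__main__":
    # Illustrative campus topology: 'fw' is a deep-packet-inspection middlebox.
    topo = {
        "lab":  {"fw": 1, "agg": 1},
        "fw":   {"lab": 1, "core": 1},
        "agg":  {"lab": 1, "core": 3},
        "core": {"fw": 1, "agg": 3, "dtn": 1},
        "dtn":  {"core": 1},
    }
    print(bypass_path(topo, "lab", "dtn", middleboxes={"fw"}))
    # -> ['lab', 'agg', 'core', 'dtn'], avoiding the middlebox bottleneck
```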