skip to main content


Title: Expanding across time to deliver bandwidth efficiency and low latency
Datacenters need networks that support both low-latency and high-bandwidth packet delivery to meet the stringent requirements of modern applications. We present Opera, a dynamic network that delivers latency-sensitive traffic quickly by relying on multi-hop forwarding in the same way as expander-graph-based approaches, but provides near-optimal bandwidth for bulk flows through direct forwarding over time-varying source-to-destination circuits. Unlike prior approaches, Opera requires no separate electrical network and no active circuit scheduling. The key to Opera's design is the rapid and deterministic reconfiguration of the network, piece-by-piece, such that at any moment in time the network implements an expander graph, yet, integrated across time, the network provides bandwidth-efficient single-hop paths between all racks. We show that Opera supports low-latency traffic with flow completion times comparable to cost-equivalent static topologies, while delivering up to 4x the bandwidth for all-to-all traffic and supporting up to 60% higher load for published datacenter workloads.  more » « less
Award ID(s):
1911104
NSF-PAR ID:
10181645
Author(s) / Creator(s):
Date Published:
Journal Name:
USENIX Symposium on Networked Systems Design and Implementation (NSDI)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Existing campus network infrastructure is not designed to effectively handle the transmission of big data sets. Performance degradation in these networks is often caused by middleboxes -- appliances that enforce campus-wide policies by deeply inspecting all traffic going through the network (including big data transmissions). We are developing a Software-Defined Networking (SDN) solution for our campus network that grants privilege to science flows by dynamically calculating routes that bypass certain middleboxes to avoid the bottlenecks they create. Using the global network information provided by an SDN controller, we are developing graph databases approaches to compute custom paths that not only bypass middleboxes to achieve certain requirements (e.g., latency, bandwidth, hop-count) but also insert rules that modify packets hop-by-hop to create the illusion of standard routing/forward despite the fact that packets are being rerouted. In some cases, additional functionality needs to be added to the path using network function virtualization (NFV) techniques (e.g., NAT). To ensure that path computations are run on an up-to-date snapshot of the topology, we introduce a versioning mechanism that allows for lazy topology updates that occur only when "important" network changes take place and are requested by big data flows. 
    more » « less
  2. Scale-out datacenter network fabrics enable network operators to translate improved link and switch speeds directly into end-host throughput. Unfortunately, limits in the underlying CMOS packet switch chip manufacturing roadmap mean that NICs, links, and switches are not getting faster fast enough to meet demand. As a result, operators have introduced alternative, parallel fabric designs in the core of the network that deliver N-times the bandwidth by simply forwarding traffic over any of N parallel network fabrics. In this work, we consider extending this parallel network idea all the way to the end host. Our initial impressions found that direct application of existing path selection and forwarding techniques resulted in poor performance. Instead, we show that appropriate path selection and forwarding protocols can not only improve the performance of existing, homogeneous parallel fabrics, but enable the development of heterogeneous parallel network fabrics that can deliver even higher bandwidth, lower latency, and improved resiliency than traditional designs constructed from the same constituent components. 
    more » « less
  3. Interactive mobile applications like web browsing and gaming are known to benefit significantly from low latency networking, as applications communicate with cloud servers and other users’ devices. Emerging mobile channel standards have not met these needs: general-purpose channels are greatly improving bandwidth but empirically offer little improvement for common latency-sensitive applications, and ultra-low-latency channels are targeted at only specific applications with very low bandwidth requirements. We explore a different direction for wireless channel design: utilizing two channels – one high bandwidth, one low latency – simultaneously for general-purpose applications. With a focus on web browsing, we design fine-grained traffic steering heuristics that can be implemented in a shim layer of the host network stack, effectively exploiting the high bandwidth and low latency properties of both channels. In the special case of 5G’s channels, our experiments show that even though URLLC offers just 0.2% of the bandwidth of eMBB, the use of both channels in parallel can reduce page load time by 26% to 59% compared to delivering traffic exclusively on eMBB. We believe this approach may benefit applications in addition to web browsing, may offer service providers incentives to deploy low latency channels, and suggests a direction for the design of future wireless channels. 
    more » « less
  4. The 5G user plane function (UPF) is a critical inter-connection point between the data network and cellular network infrastructure. It governs the packet processing performance of the 5G core network. UPFs also need to be flexible to support several key control plane operations. Existing UPFs typically run on general-purpose CPUs, but have limited performance because of the overheads of host-based forwarding. We design Synergy, a novel 5G UPF running on SmartNICs that provides high throughput and low latency. It also supports monitoring functionality to gather critical data on user sessions for the prediction and optimization of handovers during user mobility. The SmartNIC UPF efficiently buffers data packets during handover and paging events by using a two-level flow-state access mechanism. This enables maintaining flow-state for a very large number of flows, thus providing very low latency for control and data planes and high throughput packet forwarding. Mobility prediction can reduce the handover delay by pre-populating state in the UPF and other core NFs. Synergy performs handover predictions based on an existing recurrent neural network model. Synergy's mobility predictor helps us achieve 2.32× lower average handover latency. Buffering in the SmartNIC, rather than the host, during paging and handover events reduces packet loss rate by at least 2.04×. Compared to previous approaches to building programmable switch-based UPFs, Synergy speeds up control plane operations such as handovers because of the low P4-programming latency leveraging tight coupling between SmartNIC and host. 
    more » « less
  5. Network architects are frequently presented with a tradeoff: either (a) introduce a new or improved control-/management plane application that boosts overall performance, or (b) use the bandwidth it would have occupied to deliver user traffic. In this paper, we present OrbWeaver, a framework that can exploit unused network bandwidth for in-network coordination. Using real hardware, we demonstrate that OrbWeaver can harvest this bandwidth (1) with little-to-no impact on the bandwidth/latency of user packets and (2) while providing guarantees on the interarrival time of the injected traffic. Through an exploration of three example use cases, we show that this opportunistic coordination abstraction is sufficient to approximate recently proposed systems without any of their associated bandwidth overheads. 
    more » « less