

Title: CodedBulk: Inter-Datacenter Bulk Transfers using Network Coding
This paper presents CodedBulk, a system for high-throughput inter-datacenter bulk transfers. At its core, CodedBulk uses network coding, a technique from the coding theory community that guarantees optimal throughput for individual bulk transfers. Prior attempts to use network coding in wired networks have faced several pragmatic and fundamental barriers. CodedBulk resolves these barriers by exploiting the unique properties of inter-datacenter networks, and by using a custom-designed hop-by-hop flow control mechanism that enables efficient realization of network coding atop existing transport protocols. An end-to-end CodedBulk implementation running on a geo-distributed inter-datacenter network improves bulk transfer throughput by 1.2−2.5× compared to state-of-the-art mechanisms that do not use network coding.
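To make the coding idea concrete, the classic butterfly example below sketches linear network coding over GF(2): an intermediate node forwards the XOR of two flows across a shared bottleneck, and each receiver decodes using the copy it already holds. This is a minimal illustration of the general technique in Python, not CodedBulk's implementation; all names and packet contents are illustrative.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # XOR two equal-length byte strings (coding over GF(2)).
    return bytes(x ^ y for x, y in zip(a, b))

# Packets from two sources, each also delivered directly to one receiver.
p1 = b"AAAAAAAA"   # receiver 1 gets this copy directly
p2 = b"BBBBBBBB"   # receiver 2 gets this copy directly

# The shared bottleneck link carries one coded packet instead of two plain ones.
coded = xor_bytes(p1, p2)

# Each receiver recovers the packet it is missing from the coded packet.
recovered_p2 = xor_bytes(coded, p1)   # at receiver 1
recovered_p1 = xor_bytes(coded, p2)   # at receiver 2
assert recovered_p1 == p1 and recovered_p2 == p2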
Award ID(s):
1704742
NSF-PAR ID:
10285439
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
USENIX Symposium on Networked Systems Design and Implementation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Datacenters need networks that support both low-latency and high-bandwidth packet delivery to meet the stringent requirements of modern applications. We present Opera, a dynamic network that delivers latency-sensitive traffic quickly by relying on multi-hop forwarding in the same way as expander-graph-based approaches, but provides near-optimal bandwidth for bulk flows through direct forwarding over time-varying source-to-destination circuits. Unlike prior approaches, Opera requires no separate electrical network and no active circuit scheduling. The key to Opera's design is the rapid and deterministic reconfiguration of the network, piece-by-piece, such that at any moment in time the network implements an expander graph, yet, integrated across time, the network provides bandwidth-efficient single-hop paths between all racks. We show that Opera supports low-latency traffic with flow completion times comparable to cost-equivalent static topologies, while delivering up to 4x the bandwidth for all-to-all traffic and supporting up to 60% higher load for published datacenter workloads. 
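A simplified sketch of the time-varying-circuit idea described above (not Opera's actual schedule, and ignoring its expander-graph constraint on each instantaneous topology): a round-robin rotation of direct circuits in which, over N−1 time slots, every pair of racks is directly connected. The function name and rack count are illustrative.

def circuit_schedule(num_racks: int):
    # Yield, for each time slot, the list of (src, dst) direct circuits.
    for offset in range(1, num_racks):
        yield [(i, (i + offset) % num_racks) for i in range(num_racks)]

# With 8 racks, 7 slots provide a direct circuit for every ordered rack pair.
pairs = {c for slot in circuit_schedule(8) for c in slot}
assert len(pairs) == 8 * 7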
  2. This paper proposes a machine-learning (ML)-aided cognitive approach for effective bandwidth reconfiguration in optically interconnected datacenter/high-performance computing (HPC) systems. The proposed approach relies on a Hyper-X-like architecture augmented with flexible-bandwidth photonic interconnections at large scales using a hierarchical intra/inter-POD photonic switching layout. We first formulate the problem of the connectivity graph and routing scheme optimization as a mixed-integer linear programming model. A two-phase heuristic algorithm and a joint optimization approach are devised to solve the problem with low time complexity. Then, we propose an ML-based end-to-end performance estimator design to assist the network control plane with intelligent decision making for bandwidth reconfiguration. Numerical simulations using traffic distribution profiles extracted from HPC application traces as well as random traffic matrices verify the accuracy of the ML-based estimator design (<9% error) and demonstrate up to 5× throughput gain from the proposed approach compared with the baseline Hyper-X network using fixed all-to-all intra/inter-POD interconnects.
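As a hedged sketch of what a bandwidth-reconfiguration heuristic can look like (an illustrative stand-in, not the paper's two-phase algorithm or MILP model; all names and numbers are assumptions), the snippet below greedily assigns a budget of flexible photonic links to the pod pairs with the highest residual demand.

def reconfigure(traffic, extra_links):
    # traffic: {(pod_i, pod_j): offered load}; returns {(pod_i, pod_j): links added}.
    added = {pair: 0 for pair in traffic}
    for _ in range(extra_links):
        # Pick the pair with the largest demand not yet covered by added links,
        # assuming each added link absorbs roughly one unit of load.
        pair = max(traffic, key=lambda p: traffic[p] - added[p])
        added[pair] += 1
    return added

demand = {(0, 1): 5.0, (0, 2): 1.0, (1, 2): 3.0}
print(reconfigure(demand, extra_links=4))  # most extra links go to the (0, 1) hot spot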
  3. Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as application demands continue to increase, scaling up is not enough, and serving larger demands requires these systems to scale out to multiple servers in a rack. We present RackSched, the first rack-level microsecond-scale scheduler that provides the abstraction of a rack-scale computer (i.e., a huge server with hundreds to thousands of cores) to an external service with network-system co-design. The core of RackSched is a two-layer scheduling framework that integrates inter-server scheduling in the top-of-rack (ToR) switch with intra-server scheduling in each server. We use a combination of analytical results and simulations to show that it provides near-optimal performance comparable to centralized scheduling policies, and is robust for both low-dispersion and high-dispersion workloads. We design a custom switch data plane for the inter-server scheduler, which realizes power-of-k-choices, ensures request affinity, and tracks server loads accurately and efficiently. We implement a RackSched prototype on a cluster of commodity servers connected by a Barefoot Tofino switch. End-to-end experiments on a twelve-server testbed show that RackSched improves the throughput by up to 1.44x, and scales out the throughput near-linearly, while maintaining the same tail latency as one server until the system is saturated.
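A minimal sketch of the power-of-k-choices dispatch policy mentioned above, written in Python for illustration only (RackSched realizes it in the ToR switch data plane, not in host software); the server loads and k value are made-up examples.

import random

def dispatch(server_loads, k=2):
    # Sample k distinct servers uniformly at random; send the request to the least loaded.
    candidates = random.sample(range(len(server_loads)), k)
    return min(candidates, key=lambda s: server_loads[s])

loads = [10, 3, 7, 1, 12, 5]   # tracked outstanding requests per server
chosen = dispatch(loads, k=2)
loads[chosen] += 1             # update the load estimate for the chosen server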
  4. Science and engineering applications are now generating data at an unprecedented rate. From large facilities such as the Large Hadron Collider to portable DNA sequencing devices, these instruments can produce hundreds of terabytes in short periods of time. Researchers and other professionals rely on networks to transfer data between sensing locations, instruments, data storage devices, and computing systems. While general-purpose networks, also referred to as enterprise networks, are capable of transporting basic data, such as e-mails and Web content, they face numerous challenges when transferring terabyte- and petabyte-scale data. At best, transfers of science data on these networks may last days or even weeks. In response to this challenge, the Science Demilitarized Zone (Science DMZ) has been proposed. The Science DMZ is a network or a portion of a network designed to facilitate the transfer of big science data. The main elements of the Science DMZ include: 1) specialized end devices, referred to as data transfer nodes (DTNs), built for sending/receiving data at a high speed over wide area networks; 2) high-throughput, friction-free paths connecting DTNs, instruments, storage devices, and computing systems; 3) performance measurement devices to monitor end-to-end paths over multiple domains; and 4) security policies and enforcement mechanisms tailored for high-performance environments. Despite the increasingly important role of Science DMZs, the literature is still missing a guideline to provide researchers and other professionals with the knowledge to broaden the understanding and development of Science DMZs. This paper addresses this gap by presenting a comprehensive tutorial on Science DMZs. The tutorial reviews fundamental network concepts that have a large impact on Science DMZs, such as router architecture, TCP attributes, and operational security. Then, the tutorial delves into protocols and devices at different layers, from the physical cyberinfrastructure to application-layer tools and security appliances, that must be carefully considered for the optimal operation of Science DMZs. This paper also contrasts Science DMZs with general-purpose networks, and presents empirical results and use cases applicable to current and future Science DMZs. 
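One concrete instance of the TCP attributes alluded to above is the bandwidth-delay product that a DTN's TCP buffers must cover to keep a long, high-capacity path full; the figures below (100 Gb/s, 50 ms RTT) are illustrative assumptions, not values from the paper.

# Bandwidth-delay product: bytes in flight needed to fill the path.
link_gbps = 100       # assumed path capacity
rtt_seconds = 0.050   # assumed round-trip time
bdp_bytes = link_gbps * 1e9 / 8 * rtt_seconds
print(f"required TCP window ≈ {bdp_bytes / 1e6:.0f} MB")   # ≈ 625 MB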
  5. IEEE 802.15.4-based wireless sensor-actuator networks have been widely adopted by process industries in recent years because of their significant role in improving industrial efficiency and reducing operating costs. Today, industrial wireless sensor-actuator networks are becoming tremendously larger and more complex than before. However, a large, complex mesh network is hard to manage and inelastic to change once the network is deployed. In addition, flooding-based time synchronization and information dissemination introduce significant communication overhead to the network. More importantly, the delivery of urgent and critical information such as emergency alarms suffers long delays, because those messages must go through hop-by-hop transport. A promising solution to overcome those limitations is to enable direct messaging from a long-range radio to an IEEE 802.15.4 radio, so that messages can be delivered to all field devices in a single-hop fashion. This article presents our study on enabling cross-technology communication from LoRa to ZigBee, using the energy emission of the LoRa radio as the carrier to deliver information. Experimental results show that our cross-technology communication approach provides reliable communication from LoRa to ZigBee with a throughput of up to 576.80 bps and a bit error rate of up to 5.23% in the 2.4 GHz band.
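A simplified sketch of the energy-sampling side of such cross-technology communication (an illustrative assumption, not the paper's exact encoding or decoding scheme; the threshold and sample counts are made up): the receiving side averages channel energy (RSSI) over each bit period and treats LoRa energy bursts as on-off keying.

def decode_ook(rssi_samples, samples_per_bit, threshold_dbm=-75.0):
    # Average channel energy over each bit period; above threshold means a LoRa burst (bit 1).
    bits = []
    for i in range(0, len(rssi_samples) - samples_per_bit + 1, samples_per_bit):
        window = rssi_samples[i:i + samples_per_bit]
        bits.append(1 if sum(window) / len(window) > threshold_dbm else 0)
    return bits

# Two bit periods of 4 RSSI samples each: an energy burst (1) followed by silence (0).
samples = [-60, -58, -61, -59, -95, -96, -94, -97]
assert decode_ook(samples, samples_per_bit=4) == [1, 0]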