In a landmark 1981 paper, Valiant and Brebner gave birth to the study of oblivious routing and, simultaneously, introduced its most powerful and ubiquitous method: Valiant load balancing (VLB). By routing messages through a randomly sampled intermediate node, VLB lengthens routing paths by a factor of two but gains the crucial property of obliviousness: it balances load in a completely decentralized manner, with no global knowledge of the communication pattern. Forty years later, with datacenters handling workloads whose communication pattern varies too rapidly to allow centralized coordination, oblivious routing is as relevant as ever, and VLB continues to take center stage as a widely used — and in some settings, provably optimal — way to balance load in the network obliviously to the traffic demands. However, the ability of the network to rapidly reconfigure its interconnection topology gives rise to new possibilities.
In this work we revisit the question of whether VLB remains optimal in the novel setting of reconfigurable networks. Prior work showed that VLB achieves the optimal tradeoff between latency and guaranteed throughput. In this work we show that a strictly superior latency-throughput tradeoff is achievable when the throughput bound is relaxed to hold with high probability. The same improved tradeoff is also achievable with guaranteed throughput under time-stationary demands, provided the latency bound is relaxed to hold with high probability and that the network is allowed to be semi-oblivious, using an oblivious (randomized) connection schedule but demand-aware routing. We prove that the latter result is not achievable by any fully-oblivious reconfigurable network design, marking a rare case in which semi-oblivious routing has a provable asymptotic advantage over oblivious routing. Our results are enabled by a novel oblivious routing scheme that improves VLB by stretching routing paths the minimum possible amount — an additive stretch of 1 rather than a multiplicative stretch of 2 — yet still manages to balance load with high probability when either the traffic demand matrix or the network’s interconnection schedule are shuffled by a uniformly random permutation. To analyze our routing scheme we prove an exponential tail bound which may be of independent interest, concerning the distribution of values of a bilinear form on an orbit of a permutation group action.
more »
« less
Optimal oblivious reconfigurable networks
Oblivious routing has a long history in both the theory and practice of networking. In this work we initiate the formal study of oblivious routing in the context of reconfigurable networks, a new architecture that has recently come to the fore in datacenter networking. These networks allow a rapidly changing bounded-degree pattern of interconnections between nodes, but the network topology and the selection of routing paths must both be oblivious to the traffic demand matrix. Our focus is on the trade-off between maximizing throughput and minimizing latency in these networks. For every constant throughput rate, we characterize (up to a constant factor) the minimum latency achievable by an oblivious reconfigurable network design that satisfies the given throughput guarantee. The trade-off between these two objectives turns out to be surprisingly subtle: the curve depicting it has an unexpected scalloped shape reflecting the fact that load-balancing becomes more difficult when the average length of routing paths is not an integer because equalizing all the path lengths is not possible. The proof of our lower bound uses LP duality to verify that Valiant load balancing is the most efficient oblivious routing scheme when used in combination with an optimally-designed reconfigurable network topology. The proof of our upper bound uses an algebraic construction in which the network nodes are identified with vectors over a finite field, the network topology is described by either the elementary basis or a sequence of Vandermonde matrices, and routing paths are constructed by selecting columns of these matrices to yield the appropriate mixture of path lengths within the shortest possible time interval.
more »
« less
- Award ID(s):
- 1704742
- PAR ID:
- 10419465
- Date Published:
- Journal Name:
- ACM SIGACT Symposium on Theory of Computing
- Page Range / eLocation ID:
- 1339 to 1352
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The Dragonfly network has been deployed in the current generation supercomputers and will be used in the next generation supercomputers. The Universal Globally Adaptive Load-balance routing (UGAL) is the state-of-the-art routing scheme for Dragonfly. In this work, we show that the performance of the conventional UGAL can be further improved on many practical Dragonfly networks, especially the ones with a small number of groups, by customizing the paths used in UGAL for each topology. We develop a scheme to compute the custom sets of paths for each topology and compare the performance of our topology-custom UGAL routing (T-UGAL) with conventional UGAL. Our evaluation with different UGAL variations and different topologies demonstrates that by customizing the routes, T-UGAL offers significant improvements over UGAL on many practical Dragonfly networks in terms of both latency when the network is under low load and throughput when the network is under high load.more » « less
-
Oblivious routing of network traffic uses predetermined paths that do not change with changing traffic patterns. It has the benefit of using a fixed network configuration while robustly handling a range of varying and unpredictable traffic. Theoretical advances have shown that the benefits of oblivious routing are achievable without compromising much capacity efficiency. For oblivious routing, we only assume knowledge of the ingress/egress capacities of the edge nodes through which traffic enters or leaves the network. All traffic patterns possible subject to the ingress/egress capacity constraints (also known as the hose constraints) are permissible and are to be handled using oblivious routing. We use the widely deployed segment routing method for route control. Furthermore, for ease of deployment and to not deviate too much from conventional shortest path routing, we restrict paths to be 2-segment paths (the composition of two shortest path routed segments). We solve the 2-segment oblivious routing problem for all permissible traffic matrices (which can be infinitely-many).We develop a new adversarial and machine-learning driven approach that uses an iterative gradient descent method to solve the routing problem with worst-case performance guarantees. Additionally, the parallelism involved in descent methods allows this method to scale well with the network size making it amenable for use in practice.more » « less
-
Battery-powered computing solutions have grown in importance and utility across a wide range of applications in the technology industry, including both consumer and industrial uses. Devices that are not attached to a stable and constant power source must ensure that all power consumption is minimized while necessary computation and communications are performed. WiFi networking is ubiquitous in modern devices, and thus the power consumption necessary to transmit data is of utmost concern for these battery powered devices. The Ad hoc OnDemand Distance Vector (AODV) routing algorithm is a widely adopted and adapted routing system for path finding in wireless networks. AODV’s original implementation did not include power consumption as a consideration for route determinations. The Energy Aware AODV (EA-AODV) algorithm was an attempt to account for energy conservation by varying broadcast power and choosing paths with distance between nodes as a consideration in routing. Lightning Strike AODV (LS-AODV) described in this paper is a proposed routing algorithm that further accounts for energy consumption in wireless networking by balancing energy in a network. Quality of service is maintained while energy levels are increased through networks using the LS-AODV algorithm.more » « less
-
null (Ed.)We prove the existence of an oblivious routing scheme that is poly(logn)-competitive in terms of (congestion + dilation), thus resolving a well-known question in oblivious routing. Concretely, consider an undirected network and a set of packets each with its own source and destination. The objective is to choose a path for each packet, from its source to its destination, so as to minimize (congestion + dilation), defined as follows: The dilation is the maximum path hop-length, and the congestion is the maximum number of paths that include any single edge. The routing scheme obliviously and randomly selects a path for each packet independent of (the existence of) the other packets. Despite this obliviousness, the selected paths have (congestion + dilation) within a poly(logn) factor of the best possible value. More precisely, for any integer hop-bound h, this oblivious routing scheme selects paths of length at most h · poly(logn) and is poly(logn)-competitive in terms of congestion in comparison to the best possible congestion achievable via paths of length at most h hops. These paths can be sampled in polynomial time. This result can be viewed as an analogue of the celebrated oblivious routing results of R'acke [FOCS 2002, STOC 2008], which are O(logn)-competitive in terms of congestion, but are not competitive in terms of dilation.more » « less