We study smoothed analysis of distributed graph algorithms, focusing on the fundamental minimum spanning tree (MST) problem. With the goal of studying the time complexity of distributed MST as a function of the "perturbation" of the input graph, we posit a smoothing model that is parameterized by a smoothing parameter 0 ≤ ϵ(n) ≤ 1 which controls the amount of random edges that can be added to an input graph G per round. Informally, ϵ(n) is the probability (typically a small function of n, e.g., n¼) that a random edge can be added to a node per round. The added random edges, once they are added, can be used (only) for communication.
We show upper and lower bounds on the time complexity of distributed MST in the above smoothing model. We present a distributed algorithm that, with high probability, 1 computes an MST and runs in Õ(min{1/√ϵ(n)2O(√log n), D+ √n}) rounds2 where ϵ is the smoothing parameter, D is the network diameter and n is the network size.
To complement our upper bound, we also show a lower bound of Ω(min{1/√ϵ(n), D + √n}). We note that the upper and lower bounds essentially match except for a multiplicative 2O(√log n) polylog(n) factor.
Our work can be considered as a first step in understanding the smoothed complexity of distributed graph algorithms.
more »
« less
Universallyoptimal distributed algorithms for known topologies
Many distributed optimization algorithms achieve existentiallyoptimal running times, meaning that there exists some pathological worstcase topology on which no algorithm can do better. Still, most networks of interest allow for exponentially faster algorithms. This motivates two questions:
(i) What network topology parameters determine the complexity of distributed optimization?
(ii) Are there universallyoptimal algorithms that are as fast as possible on every topology?
We resolve these 25yearold open problems in the knowntopology setting (i.e., supported CONGEST) for a wide class of global network optimization problems including MST, (1+є)min cut, various approximate shortest paths problems, subgraph connectivity, etc.
In particular, we provide several (equivalent) graph parameters and show they are tight universal lower bounds for the above problems, fully characterizing their inherent complexity. Our results also imply that algorithms based on the lowcongestion shortcut framework match the above lower bound, making them universally optimal if shortcuts are efficiently approximable.
more »
« less
 NSFPAR ID:
 10271616
 Date Published:
 Journal Name:
 Symposium on Theory of Computing (STOC)
 Page Range / eLocation ID:
 1166 to 1179
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


null (Ed.)Motivated by the increasing need to understand the distributed algorithmic foundations of largescale graph computations, we study some fundamental graph problems in a messagepassing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n >> k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many realworld systems. Communication is pointtopoint, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem , a theorem that can be used to show nontrivial lower bounds on the round complexity of distributed largescale data computations. This result is established via an informationtheoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a “cookbook” fashion to show distributed lower bounds for several problems, including nongraph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration . These applications show that our approach can yield lower bounds for problems where the application of communication complexity techniques seems not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k , improving significantly over previous results [Klauck et al., SODA 2015]. Specifically, we show the following results: PageRank: We show a lower bound of Ὼ(n/k 2 ) rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in Õ(n/k 2 ) rounds. Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires Ὼ(m/k 5/3 ) rounds. This result also implies the first nontrivial lower bound of Ὼ(n 1/3 ) rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in Õ(m/k 5/3 + n/k 4/3 ) rounds.more » « less

Network connectivity optimization, which aims to manipulate network connectivity by changing its underlying topology, is a fundamental task behind a wealth of highimpact data mining applications, ranging from immunization, critical infrastructure construction, social collaboration mining, bioinformatics analysis, to intelligent transportation system design. To tackle its exponential computation complexity, greedy algorithms have been extensively used for network connectivity optimization by exploiting its diminishing returns property. Despite the empirical success, two key challenges largely remain open. First, on the theoretic side, the hardness, as well as the approximability of the general network connectivity optimization problem are still nascent except for a few special instances. Second, on the algorithmic side, current algorithms are often hard to balance between the optimization quality and the computational efficiency. In this paper, we systematically address these two challenges for the network connectivity optimization problem. First, we reveal some fundamental limits by proving that, for a wide range of network connectivity optimization problems, (1) they are NPhard and (2) (11/e) is the optimal approximation ratio for any polynomial algorithms. Second, we propose an effective, scalable and general algorithm (CONTAIN) to carefully balance the optimization quality and the computational efficiency.more » « less

We suggest a general oraclebased framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds for several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight gaps between lower and upper bounds on the oracle complexity, and cases where the “natural” algorithms are not known to be optimal.more » « less

The growing success of graph signal processing (GSP) approaches relies heavily on prior identification of a graph over which network data admit certain regularity. However, adaptation to increasingly dynamic environments as well as demands for realtime processing of streaming data pose major challenges to this end. In this context, we develop novel algorithms for online network topology inference given streaming observations assumed to be smooth on the sought graph. Unlike existing batch algorithms, our goal is to track the (possibly) timevarying network topology while maintaining the memory and computational costs in check by processing graph signals sequentiallyintime. To recover the graph in an online fashion, we leverage proximal gradient (PG) methods to solve a judicious smoothnessregularized, timevarying optimization problem. Under mild technical conditions, we establish that the online graph learning algorithm converges to within a neighborhood of (i.e., it tracks) the optimal timevarying batch solution. Computer simulations using both synthetic and real financial market data illustrate the effectiveness of the proposed algorithm in adapting to streaming signals to track slowlyvarying network connectivity.more » « less