skip to main content

Search for: All records

Award ID contains: 1633720

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Ragusa, Maria Alessandra (Ed.)
    We study scheduling mechanisms that explore the trade-off between containing the spread of COVID-19 and performing in-person activity in organizations. Our mechanisms, referred to as group scheduling , are based on partitioning the population randomly into groups and scheduling each group on appropriate days with possible gaps (when no one is working and all are quarantined). Each group interacts with no other group and, importantly, any person who is symptomatic in a group is quarantined. We show that our mechanisms effectively trade-off in-person activity for more effective control of the COVID-19 virus spread. In particular, we show that a mechanism which partitions the population into two groups that alternatively work in-person for five days each, flatlines the number of COVID-19 cases quite effectively, while still maintaining in-person activity at 70% of pre-COVID-19 level. Other mechanisms that partitions into two groups with less continuous work days or more spacing or three groups achieve even more aggressive control of the virus at the cost of a somewhat lower in-person activity (about 50%). We demonstrate the efficacy of our mechanisms by theoretical analysis and extensive experimental simulations on various epidemiological models based on real-world data.
    Free, publicly-accessible full text available September 15, 2023
  2. Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n >> k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem , a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. This result is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a “cookbook” fashion to show distributed lower bounds for several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration . These applications show that our approach can yield lower boundsmore »for problems where the application of communication complexity techniques seems not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k , improving significantly over previous results [Klauck et al., SODA 2015]. Specifically, we show the following results: PageRank: We show a lower bound of Ὼ(n/k 2 ) rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in Õ(n/k 2 ) rounds. Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires Ὼ(m/k 5/3 ) rounds. This result also implies the first non-trivial lower bound of Ὼ(n 1/3 ) rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in Õ(m/k 5/3 + n/k 4/3 ) rounds.« less
  3. We study several fundamental problems in the k-machine model, a message-passing model for large-scale distributed computations where k ≥ 2 machines jointly perform computations on a large input of size N, (typically, N ≫ k). The input is initially partitioned (randomly or in a balanced fashion) among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main result is a general technique for designing efficient deterministic distributed algorithms in the k-machine model using PRAM algorithms. Our technique works by efficiently simulating PRAM algorithms in the k-machine model in a deterministic way. This simulation allows us to arrive at new algorithms in the k-machine model for some problems for which no efficient k-machine algorithms are known before and also improve on existing results in the k-machine model for some problems. While our simulation allows us to obtain k-machine algorithms for any problem with a known PRAM algorithm, we mainly focus on graph problems. For an input graph on n vertices and m edges, we obtain Õ(m/k 2 ) round 4 algorithms for various graph problems such as r-connectivity for r = 1,more »2, 3, 4, minimum spanning tree (MST), maximal independent set (MIS), (Δ + 1)-coloring, maximal matching, ear decomposition, and spanners under the assumption that the edges of the input graph are partitioned (randomly, or in an arbitrary, but balanced, fashion) among the k machines. For problems such as connectivity and MST, the above bound is (essentially) the best possible (up to logarithmic factors). Our simulation technique allows us to obtain the first known efficient deterministic algorithms in the k-machine model for other problems with known deterministic PRAM algorithms.« less
  4. Gilbert, Seth (Ed.)
    This paper concerns designing distributed algorithms that are singularly optimal, i.e., algorithms that are simultaneously time and message optimal, for the fundamental leader election problem in asynchronous networks. Kutten et al. (JACM 2015) presented a singularly near optimal randomized leader election algorithm for general synchronous networks that ran in O(D) time and used O(m log n) messages (where D, m, and n are the network’s diameter, number of edges and number of nodes, respectively) with high probability. Both bounds are near optimal (up to a logarithmic factor), since Ω(D) and Ω(m) are the respective lower bounds for time and messages for leader election even for synchronous networks and even for (Monte-Carlo) randomized algorithms. On the other hand, for general asynchronous networks, leader election algorithms are only known that are either time or message optimal, but not both. Kutten et al. (DISC 2020) presented a randomized asynchronous leader election algorithm that is singularly near optimal for complete networks, but left open the problem for general networks. This paper shows that singularly near optimal (up to polylogarithmic factors) bounds can be achieved for general asynchronous networks. We present a randomized singularly near optimal leader election algorithm that runs in O(D + log²more »n) time and O(m log² n) messages with high probability. Our result is the first known distributed leader election algorithm for asynchronous networks that is near optimal with respect to both time and message complexity and improves over a long line of results including the classical results of Gallager et al. (ACM TOPLAS, 1983), Peleg (JPDC, 1989), and Awerbuch (STOC, 89).« less
  5. The K-nearest neighbors is a basic problem in machine learning with numerous applications. In this problem, given a (training) set of n data points with labels and a query point q, we want to assign a label to q based on the labels of the K-nearest points to the query. We study this problem in the k-machine model, a model for distributed large-scale data. In this model, we assume that the n points are distributed (in a balanced fashion) among the k machines and the goal is to compute an answer given a query point to a machine using a small number of communication rounds. Our main result is a randomized algorithm in the k-machine model that runs in O(log K) communication rounds with high success probability (regardless of the number of machines k and the number of points n). The message complexity of the algorithm is small taking only O(k log K) messages. Our bounds are essentially the best possible for comparison-based algorithms. We also implemented our algorithm and show that it performs well in practice.
  6. We study smoothed analysis of distributed graph algorithms, focusing on the fundamental minimum spanning tree (MST) problem. With the goal of studying the time complexity of distributed MST as a function of the "perturbation" of the input graph, we posit a smoothing model that is parameterized by a smoothing parameter 0 ≤ ϵ(n) ≤ 1 which controls the amount of random edges that can be added to an input graph G per round. Informally, ϵ(n) is the probability (typically a small function of n, e.g., n--¼) that a random edge can be added to a node per round. The added random edges, once they are added, can be used (only) for communication. We show upper and lower bounds on the time complexity of distributed MST in the above smoothing model. We present a distributed algorithm that, with high probability, 1 computes an MST and runs in Õ(min{1/√ϵ(n)2O(√log n), D+ √n}) rounds2 where ϵ is the smoothing parameter, D is the network diameter and n is the network size. To complement our upper bound, we also show a lower bound of Ω(min{1/√ϵ(n), D + √n}). We note that the upper and lower bounds essentially match except for a multiplicative 2O(√log n)more »polylog(n) factor. Our work can be considered as a first step in understanding the smoothed complexity of distributed graph algorithms.« less
  7. Triangle enumeration is a fundamental problem in large-scale graph analysis. For instance, triangles are used to solve practical problems like community detection and spam filtering. On the other hand, there is a large amount of data stored on database management systems (DBMSs), which can be modeled and analyzed as graphs. Alternatively, graph data can be quickly loaded into a DBMS. Our paper shows how to adapt and optimize a randomized distributed triangle enumeration algorithm with SQL queries, which is a significantly different approach from programming graph algorithms in traditional languages such as Python or C++. We choose a parallel columnar DBMS given its fast query processing, but our solution should work for a row DBMS as well. Our randomized solution provides a balanced workload for parallel query processing, being robust to the existence of skewed degree vertices. We experimentally prove our solution ensures a balanced data distribution, and hence workload, among machines. The key idea behind the algorithm is to evenly partition all possible triplets of vertices among machines, sending edges that may form a triangle to a proxy machine; this edge redistribution eliminates shuffling edges during join computation and therefore triangle enumeration becomes local and fully parallel. In summary,more »our algorithm exhibits linear speedup with large graphs, including graphs that have high skewness in vertex degree distributions.« less