skip to main content


Title: Improved Local Computation Algorithm for Set Cover via Sparsification
We design a Local Computation Algorithm (LCA) for the set cover problem. Given a set system where each set has size at most s and each element is contained in at most t sets, the algorithm reports whether a given set is in some fixed set cover whose expected size is O(log s) times the minimum fractional set cover value. Our algorithm requires sO(log s) tO(log s+log log t)) queries. This result improves upon the application of the reduction of [Parnas and Ron, TCS’07] on the result of [Kuhn et al., SODA’06], which leads to a query complexity of (st) O(log s · log t). To obtain this result, we design a parallel set cover algorithm that admits an efficient simulation in the LCA model by using a sparsification technique introduced in [Ghaffari and Uitto, SODA’19] for the maximal independent set problem. The parallel algorithm adds a random subset of the sets to the solution in a style similar to the PRAM algorithm of [Berger et al., FOCS’89]. However, our algorithm differs in the way that it never revokes its decisions, which results in a fewer number of adaptive rounds. This requires a novel approximation analysis which might be of independent interest.  more » « less
Award ID(s):
1741137
NSF-PAR ID:
10220505
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM-SIAM Symposium on Discrete Algorithms (SODA 2020)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study local symmetry breaking problems in the Congest model, focusing on ruling set problems, which generalize the fundamental Maximal Independent Set (MIS) problem. The time (round) complexity of MIS (and ruling sets) have attracted much attention in the Local model. Indeed, recent results (Barenboim et al., FOCS 2012, Ghaffari SODA 2016) for the MIS problem have tried to break the long-standing O(log n)-round "barrier" achieved by Luby's algorithm, but these yield o(log n)-round complexity only when the maximum degree Delta is somewhat small relative to n. More importantly, these results apply only in the Local model. In fact, the best known time bound in the Congest model is still O(log n) (via Luby's algorithm) even for moderately small Delta (i.e., for Delta = Omega(log n) and Delta = o(n)). Furthermore, message complexity has been largely ignored in the context of local symmetry breaking. Luby's algorithm takes O(m) messages on m-edge graphs and this is the best known bound with respect to messages. Our work is motivated by the following central question: can we break the Theta(log n) time complexity barrier and the Theta(m) message complexity barrier in the Congest model for MIS or closely-related symmetry breaking problems? This paper presents progress towards this question for the distributed ruling set problem in the Congest model. A beta-ruling set is an independent set such that every node in the graph is at most beta hops from a node in the independent set. We present the following results: - Time Complexity: We show that we can break the O(log n) "barrier" for 2- and 3-ruling sets. We compute 3-ruling sets in O(log n/log log n) rounds with high probability (whp). More generally we show that 2-ruling sets can be computed in O(log Delta (log n)^(1/2 + epsilon) + log n/log log n) rounds for any epsilon > 0, which is o(log n) for a wide range of Delta values (e.g., Delta = 2^(log n)^(1/2-epsilon)). These are the first 2- and 3-ruling set algorithms to improve over the O(log n)-round complexity of Luby's algorithm in the Congest model. - Message Complexity: We show an Omega(n^2) lower bound on the message complexity of computing an MIS (i.e., 1-ruling set) which holds also for randomized algorithms and present a contrast to this by showing a randomized algorithm for 2-ruling sets that, whp, uses only O(n log^2 n) messages and runs in O(Delta log n) rounds. This is the first message-efficient algorithm known for ruling sets, which has message complexity nearly linear in n (which is optimal up to a polylogarithmic factor). 
    more » « less
  2. We study local symmetry breaking problems in the Congest model, focusing on ruling set problems, which generalize the fundamental Maximal Independent Set (MIS) problem. The time (round) complexity of MIS (and ruling sets) have attracted much attention in the Local model. Indeed, recent results (Barenboim et al., FOCS 2012, Ghaffari SODA 2016) for the MIS problem have tried to break the long-standing O(log n)-round “barrier” achieved by Luby’s algorithm, but these yield o(log n)-round complexity only when the maximum degree  is somewhat small relative to n. More importantly, these results apply only in the Local model. In fact, the best known time bound in the Congest model is still O(log n) (via Luby’s algorithm) even for moderately small  (i.e., for  = (log n) and  = o(n)). Furthermore, message complexity has been largely ignored in the context of local symmetry breaking. Luby’s algorithm takes O(m) messages on m-edge graphs and this is the best known bound with respect to messages. Our work is motivated by the following central question: can we break the (log n) time complexity barrier and the (m) message complexity barrier in the Congest model for MIS or closelyrelated symmetry breaking problems? This paper presents progress towards this question for the distributed ruling set problem in the Congest model. A -ruling set is an independent set such that every node in the graph is at most hops from a node in the independent set. We present the following results: Time Complexity: We show that we can break the O(log n) “barrier” for 2- and 3-ruling sets. We compute 3-ruling sets in O  log n log log n  rounds with high probability (whp). More generally we show that 2-ruling sets can be computed in O  log · (log n)1/2+" + log n log log n  rounds for any " > 0, which is o(log n) for a wide range of  values (e.g.,  = 2(log n)1/2−" ). These are the first 2- and 3-ruling set algorithms to improve over the O(log n)-round complexity of Luby’s algorithm in the Congest model. Message Complexity: We show an (n2) lower bound on the message complexity of computing an MIS (i.e., 1-ruling set) which holds also for randomized algorithms and present a contrast to this by showing a randomized algorithm for 2-ruling sets that, whp, uses only O(n log2 n) messages and runs in O( log n) rounds. This is the first message-efficient algorithm known for ruling sets, which has message complexity nearly linear in n (which is optimal up to a polylogarithmic factor). 
    more » « less
  3. Mikolaj Bojanczyk ; Emanuela Merelli ; David P. Woodruff (Ed.)
    Two equal length strings are a parameterized match (p-match) iff there exists a one-to-one function that renames the symbols in one string to those in the other. The Parameterized Suffix Tree (PST) [Baker, STOC' 93] is a fundamental data structure that handles various string matching problems under this setting. The PST of a text T[1,n] over an alphabet Σ of size σ takes O(nlog n) bits of space. It can report any entry in (parameterized) (i) suffix array, (ii) inverse suffix array, and (iii) longest common prefix (LCP) array in O(1) time. Given any pattern P as a query, a position i in T is an occurrence iff T[i,i+|P|-1] and P are a p-match. The PST can count the number of occurrences of P in T in time O(|P|log σ) and then report each occurrence in time proportional to that of accessing a suffix array entry. An important question is, can we obtain a compressed version of PST that takes space close to the text’s size of nlogσ bits and still support all three functionalities mentioned earlier? In SODA' 17, Ganguly et al. answered this question partially by presenting an O(nlogσ) bit index that can support (parameterized) suffix array and inverse suffix array operations in O(log n) time. However, the compression of the (parameterized) LCP array and the possibility of faster suffix array and inverse suffix array queries in compact space were left open. In this work, we obtain a compact representation of the (parameterized) LCP array. With this result, in conjunction with three new (parameterized) suffix array representations, we obtain the first set of PST representations in o(nlog n) bits (when logσ = o(log n)) as follows. Here ε > 0 is an arbitrarily small constant. - Space O(n logσ) bits and query time O(log_σ^ε n); - Space O(n logσlog log_σ n) bits and query time O(log log_σ n); and - Space O(n logσ log^ε_σ n) bits and query time O(1). The first trade-off is an improvement over Ganguly et al.’s result, whereas our third trade-off matches the optimal time performance of Baker’s PST while squeezing the space by a factor roughly log_σ n. We highlight that our trade-offs match the space-and-time bounds of the best-known compressed text indexes for exact pattern matching and further improvement is highly unlikely. 
    more » « less
  4. Abstract

    We consider the problem of covering multiple submodular constraints. Given a finite ground setN, a weight function$$w: N \rightarrow \mathbb {R}_+$$w:NR+,rmonotone submodular functions$$f_1,f_2,\ldots ,f_r$$f1,f2,,froverNand requirements$$k_1,k_2,\ldots ,k_r$$k1,k2,,krthe goal is to find a minimum weight subset$$S \subseteq N$$SNsuch that$$f_i(S) \ge k_i$$fi(S)kifor$$1 \le i \le r$$1ir. We refer to this problem asMulti-Submod-Coverand it was recently considered by Har-Peled and Jones (Few cuts meet many point sets. CoRR.arxiv:abs1808.03260Har-Peled and Jones 2018) who were motivated by an application in geometry. Even with$$r=1$$r=1Multi-Submod-Covergeneralizes the well-known Submodular Set Cover problem (Submod-SC), and it can also be easily reduced toSubmod-SC. A simple greedy algorithm gives an$$O(\log (kr))$$O(log(kr))approximation where$$k = \sum _i k_i$$k=ikiand this ratio cannot be improved in the general case. In this paper, motivated by several concrete applications, we consider two ways to improve upon the approximation given by the greedy algorithm. First, we give a bicriteria approximation algorithm forMulti-Submod-Coverthat covers each constraint to within a factor of$$(1-1/e-\varepsilon )$$(1-1/e-ε)while incurring an approximation of$$O(\frac{1}{\epsilon }\log r)$$O(1ϵlogr)in the cost. Second, we consider the special case when each$$f_i$$fiis a obtained from a truncated coverage function and obtain an algorithm that generalizes previous work on partial set cover (Partial-SC), covering integer programs (CIPs) and multiple vertex cover constraints Bera et al. (Theoret Comput Sci 555:2–8 Bera et al. 2014). Both these algorithms are based on mathematical programming relaxations that avoid the limitations of the greedy algorithm. We demonstrate the implications of our algorithms and related ideas to several applications ranging from geometric covering problems to clustering with outliers. Our work highlights the utility of the high-level model and the lens of submodularity in addressing this class of covering problems.

     
    more » « less
  5. Abstract

    Sequence mappability is an important task in genome resequencing. In the (km)-mappability problem, for a given sequenceTof lengthn, the goal is to compute a table whoseith entry is the number of indices$$j \ne i$$jisuch that the length-msubstrings ofTstarting at positionsiandjhave at mostkmismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of$$k=1$$k=1. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for$$k=O(1)$$k=O(1), works in$$O(n)$$O(n)space and, with high probability, in$$O(n \cdot \min \{m^k,\log ^k n\})$$O(n·min{mk,logkn})time. Our algorithm requires a careful adaptation of thek-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop$$O(n^2)$$O(n2)-time algorithms to computeall(km)-mappability tables for a fixedmand all$$k\in \{0,\ldots ,m\}$$k{0,,m}or a fixedkand all$$m\in \{k,\ldots ,n\}$$m{k,,n}. Finally, we show that, for$$k,m = \Theta (\log n)$$k,m=Θ(logn), the (km)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018.

     
    more » « less