skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 8:00 PM ET on Friday, March 21 until 8:00 AM ET on Saturday, March 22 due to maintenance. We apologize for the inconvenience.


Title: Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams
The maximum coverage problem is to select k sets, from a collection of m sets, such that the cardinality of their union, in a universe of size n, is maximized. We consider (1-1/e-ε)-approximation algorithms for this NP-hard problem in three standard data stream models. 1) Dynamic Model. The stream consists of a sequence of sets being inserted and deleted. Our multi-pass algorithm uses ε^{-2} k ⋅ polylog(n,m) space. The best previous result (Assadi and Khanna, SODA 2018) used (n +ε^{-4} k) polylog(n,m) space. While both algorithms use O(ε^{-1} log m) passes, our analysis shows that, when ε ≤ 1/log log m, it is possible to reduce the number of passes by a 1/log log m factor without incurring additional space. 2) Random Order Model. In this model, there are no deletions, and the sets forming the instance are uniformly randomly permuted to form the input stream. We show that a single pass and k polylog(n,m) space suffices for arbitrary small constant ε. The best previous result, by Warneke et al. (ESA 2023), used k² polylog(n,m) space. 3) Insert-Only Model. Lastly, our results, along with numerous previous results, use a sub-sampling technique introduced by McGregor and Vu (ICDT 2017) to sparsify the input instance. We explain how this technique and others used in the paper can be implemented such that the amortized update time of our algorithm is polylogarithmic. This also implies an improvement of the state-of-the-art insert only algorithms in terms of the update time: polylog(m,n) update time suffices, whereas the best previous result by Jaud et al. (SEA 2023) required update time that was linear in k.  more » « less
Award ID(s):
2006589 1934846
PAR ID:
10557344
Author(s) / Creator(s):
; ;
Editor(s):
Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Volume:
308
ISSN:
1868-8969
ISBN:
978-3-95977-338-6
Page Range / eLocation ID:
308-308
Subject(s) / Keyword(s):
Data Stream Computation Maximum Coverage Submodular Maximization Theory of computation → Sketching and sampling
Format(s):
Medium: X Size: 15 pages; 858288 bytes Other: application/pdf
Size(s):
15 pages 858288 bytes
Right(s):
Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of space-efficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highly-studied problem of estimating the number of triangles in a graph stream. Our input is a k-uniform hypergraph H with n vertices and m hyperedges, each hyperedge being a k-sized subset of vertices. A k-simplex in H is a subhypergraph on k+1 vertices X such that all k+1 possible hyperedges among X exist in H. The goal is to process the hyperedges of H, which arrive in an arbitrary order as a data stream, and compute a good estimate of T_k(H), the number of k-simplices in H. We design a suite of algorithms for this problem. As with triangle-counting in graphs (which is the special case k = 2), sublinear space is achievable but only under a promise of the form T_k(H) ≥ T. Under such a promise, our algorithms use at most four passes and together imply a space bound of O(ε^{-2} log δ^{-1} polylog n ⋅ min{(m^{1+1/k})/T, m/(T^{2/(k+1)})}) for each fixed k ≥ 3, in order to guarantee an estimate within (1±ε)T_k(H) with probability ≥ 1-δ. We also give a simpler 1-pass algorithm that achieves O(ε^{-2} log δ^{-1} log n⋅ (m/T) (Δ_E + Δ_V^{1-1/k})) space, where Δ_E (respectively, Δ_V) denotes the maximum number of k-simplices that share a hyperedge (respectively, a vertex), which generalizes a previous result for the k = 2 case. We complement these algorithmic results with space lower bounds of the form Ω(ε^{-2}), Ω(m^{1+1/k}/T), Ω(m/T^{1-1/k}) and Ω(mΔ_V^{1/k}/T) for multi-pass algorithms and Ω(mΔ_E/T) for 1-pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs. 
    more » « less
  2. Barman, Siddharth ; Lasota, Sławomir (Ed.)
    We consider the problem of estimating the size of a maximum matching in low-arboricity graphs in the dynamic graph stream model. In this setting, an algorithm with limited memory makes multiple passes over a stream of edge insertions and deletions, resulting in a low-arboricity graph. Let n be the number of vertices of the input graph, and α be its arboricity. We give the following results. 1) As our main result, we give a three-pass streaming algorithm that produces an (α + 2)(1 + ε)-approximation and uses space O(ε^{-2}⋅α²⋅n^{1/2}⋅log n). This result should be contrasted with the Ω(α^{-5/2}⋅n^{1/2}) space lower bound established by [Assadi et al., SODA'17] for one-pass algorithms, showing that, for graphs of constant arboricity, the one-pass space lower bound can be achieved in three passes (up to poly-logarithmic factors). Furthermore, we obtain a two-pass algorithm that uses space O(ε^{-2}⋅α²⋅n^{3/5}⋅log n). 2) We also give a (1+ε)-approximation multi-pass algorithm, where the space used is parameterized by an upper bound on the size of a largest matching. For example, using O(log log n) passes, the space required is O(ε^{-1}⋅α²⋅k⋅log n), where k denotes an upper bound on the size of a largest matching. Finally, we define a notion of arboricity in the context of matrices. This is a natural measure of the sparsity of a matrix that is more nuanced than simply bounding the total number of nonzero entries, but less restrictive than bounding the number of nonzero entries in each row and column. For such matrices, we exploit our results on estimating matching size to present upper bounds for the problem of rank estimation in the dynamic data stream model. 
    more » « less
  3. We consider the maximum matching problem in the semi-streaming model formalized by Feigenbaum, Kannan, McGregor, Suri, and Zhang that is inspired by giant graphs of today. As our main result, we give a two-pass (1/2 + 1/16)-approximation algorithm for triangle-free graphs and a two-pass (1/2 + 1/32)-approximation algorithm for general graphs; these improve the approximation ratios of 1/2 + 1/52 for bipartite graphs and 1/2 + 1/140 for general graphs by Konrad, Magniez, and Mathieu. In three passes, we are able to achieve approximation ratios of 1/2 + 1/10 for triangle-free graphs and 1/2 + 1/19.753 for general graphs. We also give a multi-pass algorithm where we bound the number of passes precisely—we give a (2/3 − ε)- approximation algorithm that uses 2/(3ε) passes for triangle-free graphs and 4/(3ε) passes for general graphs. Our algorithms are simple and combinatorial, use O(n log n) space, and (can be implemented to) have O(1) update time per edge. For general graphs, our multi-pass algorithm improves the best known deterministic algorithms in terms of the number of passes: * Ahn and Guha give a (2/3−ε)-approximation algorithm that uses O(log(1/ε)/ε2) passes, whereas our (2/3 − ε)-approximation algorithm uses 4/(3ε) passes; * they also give a (1 − ε)-approximation algorithm that uses O(log n · poly(1/ε)) passes, where n is the number of vertices of the input graph; although our algorithm is (2/3−ε)-approximation, our number of passes do not depend on n. Earlier multi-pass algorithms either have a large constant inside big-O notation for the number of passes or the constant cannot be determined due to the involved analysis, so our multi-pass algorithm should use much fewer passes for approximation ratios bounded slightly below 2/3. 
    more » « less
  4. Given a family of sets (S1, S2,... SM) over a universe Ω, estimating the size of their union in the data streaming model is a fundamental computational problem with a wide variety of applications. The holy grail in the field of streaming is to seek design of algorithms that achieve (ε, δ)-approximation with poly(log |Ω|, ε-1, log δ-1) space and update time complexity. Earlier investigations achieve algorithms with desired space and update time complexity for restricted cases such as singletons (Distinct Elements problem), one-dimensional ranges, arithmetic progressions, and sub-cubes. However, techniques used in these works fail for many other simple structured sets. A prominent example is that of Klee's Measure Problem (KMP), wherein every set Si is represented by an axis-parallel rectangle in d-dimensional spaces. Despite extensive prior work, the best-known streaming algorithms for many of these cases depend on the size of the stream, and therefore the problem of whether there exists a streaming algorithm for estimations of size of the union of sets with poly(log |Ω|, ε-1, log δ-1) space and update time complexity has remained open. In this work, we focus on certain general families of sets called Delphic families (which allows efficient membership, sampling, and cardinality queries). Such families of sets capture several well-known problems, including KMP, test coverage, and hypervolume estimation. The primary contribution of our work is to resolve the above-mentioned open problem for streams over Delphic families. In particular, we design the first streaming algorithm for estimating |⋃i=1M Si| with poly(log |Ω|, ε-1, log δ-1) space and update time complexity (independent of M, the length of the stream) when each Si is a member from a Delphic family of sets. We further generalize our results to larger families of sets, called approximate-Delphic families, for which the size of a set can be known approximately but not exactly. Our results resolve two of the open problems listed in Meel, Vinodchandran, Chakraborty (PODS-21). 
    more » « less
  5. We present a set of parallel algorithms for computing exact k-nearest neighbors in low dimensions. Many k-nearest neighbor algorithms use either a kd-tree or the Morton ordering of the point set; our algorithms combine these approaches using a data structure we call the zd-tree. We show that this combination is both theoretically efficient under common assumptions, and fast in practice. For point sets of size n with bounded expansion constant and bounded ratio, the zd-tree can be built in O(n) work with O(n^ε) span for constant ε < 1, and searching for the k-nearest neighbors of a point takes expected O(k log k) time. We benchmark our k-nearest neighbor algorithms against existing parallel k-nearest neighbor algorithms, showing that our implementations are generally faster than the state of the art as well as achieving 75x speedup on 144 hyperthreads. Furthermore, the zd-tree supports parallel batch-dynamic insertions and deletions; to our knowledge, it is the first k-nearest neighbor data structure to support such updates. On point sets with bounded expansion constant and bounded ratio, a batch-dynamic update of size k requires O(k log n/k) work with O(k^ε + polylog(n)) span. 
    more » « less