skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Improved Streaming Algorithm for the Klee’s Measure Problem and Generalizations
Estimating the size of the union of a stream of sets S₁, S₂, …, S_M where each set is a subset of a known universe Ω is a fundamental problem in data streaming. This problem naturally generalizes the well-studied 𝖥₀ estimation problem in the streaming literature, where each set contains a single element from the universe. We consider the general case when the sets S_i can be succinctly represented and allow efficient membership, cardinality, and sampling queries (called a Delphic family of sets). A notable example in this framework is the Klee’s Measure Problem (KMP), where every set S_i is an axis-parallel rectangle in d-dimensional spaces (Ω = [Δ]^d where [Δ] := {1, … ,Δ} and Δ ∈ ℕ). Recently, Meel, Chakraborty, and Vinodchandran (PODS-21, PODS-22) designed a streaming algorithm for (ε,δ)-estimation of the size of the union of set streams over Delphic family with space and update time complexity O((log³|Ω|)/ε² ⋅ log 1/δ) and Õ((log⁴|Ω|)/ε² ⋅ log 1/(δ)), respectively. This work presents a new, sampling-based algorithm for estimating the size of the union of Delphic sets that has space and update time complexity Õ((log²|Ω|)/ε² ⋅ log 1/(δ)). This improves the space complexity bound by a log|Ω| factor and update time complexity bound by a log² |Ω| factor. A critical question is whether quadratic dependence of log|Ω| on space and update time complexities is necessary. Specifically, can we design a streaming algorithm for estimating the size of the union of sets over Delphic family with space and complexity linear in log|Ω| and update time poly(log|Ω|)? While this appears technically challenging, we show that establishing a lower bound of ω(log|Ω|) with poly(log|Ω|) update time is beyond the reach of current techniques. Specifically, we show that under certain hard-to-prove computational complexity hypothesis, there is a streaming algorithm for the problem with optimal space complexity O(log|Ω|) and update time poly(log(|Ω|)). Thus, establishing a space lower bound of ω(log|Ω|) will lead to break-through complexity class separation results.  more » « less
Award ID(s):
2130608
PAR ID:
10553075
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Kumar, Amit; Ron-Zewi, Noga
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Volume:
317
ISSN:
1868-8969
ISBN:
978-3-95977-348-5
Page Range / eLocation ID:
317-317
Subject(s) / Keyword(s):
Sampling Streaming Klee’s Measure Problem Theory of computation → Sketching and sampling
Format(s):
Medium: X Size: 21 pages; 930445 bytes Other: application/pdf
Size(s):
21 pages 930445 bytes
Right(s):
Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
Sponsoring Org:
National Science Foundation
More Like this
  1. Given a family of sets (S1, S2,... SM) over a universe Ω, estimating the size of their union in the data streaming model is a fundamental computational problem with a wide variety of applications. The holy grail in the field of streaming is to seek design of algorithms that achieve (ε, δ)-approximation with poly(log |Ω|, ε-1, log δ-1) space and update time complexity. Earlier investigations achieve algorithms with desired space and update time complexity for restricted cases such as singletons (Distinct Elements problem), one-dimensional ranges, arithmetic progressions, and sub-cubes. However, techniques used in these works fail for many other simple structured sets. A prominent example is that of Klee's Measure Problem (KMP), wherein every set Si is represented by an axis-parallel rectangle in d-dimensional spaces. Despite extensive prior work, the best-known streaming algorithms for many of these cases depend on the size of the stream, and therefore the problem of whether there exists a streaming algorithm for estimations of size of the union of sets with poly(log |Ω|, ε-1, log δ-1) space and update time complexity has remained open. In this work, we focus on certain general families of sets called Delphic families (which allows efficient membership, sampling, and cardinality queries). Such families of sets capture several well-known problems, including KMP, test coverage, and hypervolume estimation. The primary contribution of our work is to resolve the above-mentioned open problem for streams over Delphic families. In particular, we design the first streaming algorithm for estimating |⋃i=1M Si| with poly(log |Ω|, ε-1, log δ-1) space and update time complexity (independent of M, the length of the stream) when each Si is a member from a Delphic family of sets. We further generalize our results to larger families of sets, called approximate-Delphic families, for which the size of a set can be known approximately but not exactly. Our results resolve two of the open problems listed in Meel, Vinodchandran, Chakraborty (PODS-21). 
    more » « less
  2. Meka, Raghu (Ed.)
    We consider the problem of finding a minimum cut of a weighted graph presented as a single-pass stream. While graph sparsification in streams has been intensively studied, the specific application of finding minimum cuts in streams is less well-studied. To this end, we show upper and lower bounds on minimum cut problems in insertion-only streams for a variety of settings, including for both randomized and deterministic algorithms, for both arbitrary and random order streams, and for both approximate and exact algorithms. One of our main results is an Õ(n/ε) space algorithm with fast update time for approximating a spectral cut query with high probability on a stream given in an arbitrary order. Our result breaks the Ω(n/ε²) space lower bound required of a sparsifier that approximates all cuts simultaneously. Using this result, we provide streaming algorithms with near optimal space of Õ(n/ε) for minimum cut and approximate all-pairs effective resistances, with matching space lower-bounds. The amortized update time of our algorithms is Õ(1), provided that the number of edges in the input graph is at least (n/ε²)^{1+o(1)}. We also give a generic way of incorporating sketching into a recursive contraction algorithm to improve the post-processing time of our algorithms. In addition to these results, we give a random-order streaming algorithm that computes the exact minimum cut on a simple, unweighted graph using Õ(n) space. Finally, we give an Ω(n/ε²) space lower bound for deterministic minimum cut algorithms which matches the best-known upper bound up to polylogarithmic factors. 
    more » « less
  3. Guruswami, Venkatesan (Ed.)
    Inspired by the classic problem of Boolean function monotonicity testing, we investigate the testability of other well-studied properties of combinatorial finite set systems, specifically intersecting families and union-closed families. A function f: {0,1}ⁿ → {0,1} is intersecting (respectively, union-closed) if its set of satisfying assignments corresponds to an intersecting family (respectively, a union-closed family) of subsets of [n]. Our main results are that - in sharp contrast with the property of being a monotone set system - the property of being an intersecting set system, and the property of being a union-closed set system, both turn out to be information-theoretically difficult to test. We show that: - For ε ≥ Ω(1/√n), any non-adaptive two-sided ε-tester for intersectingness must make 2^{Ω(n^{1/4}/√{ε})} queries. We also give a 2^{Ω(√{n log(1/ε)})}-query lower bound for non-adaptive one-sided ε-testers for intersectingness. - For ε ≥ 1/2^{Ω(n^{0.49})}, any non-adaptive two-sided ε-tester for union-closedness must make n^{Ω(log(1/ε))} queries. Thus, neither intersectingness nor union-closedness shares the poly(n,1/ε)-query non-adaptive testability that is enjoyed by monotonicity. To complement our lower bounds, we also give a simple poly(n^{√{nlog(1/ε)}},1/ε)-query, one-sided, non-adaptive algorithm for ε-testing each of these properties (intersectingness and union-closedness). We thus achieve nearly tight upper and lower bounds for two-sided testing of intersectingness when ε = Θ(1/√n), and for one-sided testing of intersectingness when ε = Θ(1). 
    more » « less
  4. Censor-Hillel, Keren; Grandoni, Fabrizio; Ouaknine, Joel; Puppis, Gabriele (Ed.)
    We study the problem of constructing hypergraph cut sparsifiers in the streaming model where a hypergraph on n vertices is revealed either via an arbitrary sequence of hyperedge insertions alone (insertion-only streaming model) or via an arbitrary sequence of hyperedge insertions and deletions (dynamic streaming model). For any ε ∈ (0,1), a (1 ± ε) hypergraph cut-sparsifier of a hypergraph H is a reweighted subgraph H' whose cut values approximate those of H to within a (1 ± ε) factor. Prior work shows that in the static setting, one can construct a (1 ± ε) hypergraph cut-sparsifier using Õ(nr/ε²) bits of space [Chen-Khanna-Nagda FOCS 2020], and in the setting of dynamic streams using Õ(nrlog m/ε²) bits of space [Khanna-Putterman-Sudan FOCS 2024]; here the Õ notation hides terms that are polylogarithmic in n, and we use m to denote the total number of hyperedges in the hypergraph. Up until now, the best known space complexity for insertion-only streams has been the same as that for the dynamic streams. This naturally poses the question of understanding the complexity of hypergraph sparsification in insertion-only streams. Perhaps surprisingly, in this work we show that in insertion-only streams, a (1 ± ε) cut-sparsifier can be computed in Õ(nr/ε²) bits of space, matching the complexity of the static setting. As a consequence, this also establishes an Ω(log m) factor separation between the space complexity of hypergraph cut sparsification in insertion-only streams and dynamic streams, as the latter is provably known to require Ω(nr log m) bits of space. To better explain this gap, we then show a more general result: namely, if the stream has at most k hyperedge deletions then Õ(n r log k/ε²) bits of space suffice for hypergraph cut sparsification. Thus the space complexity smoothly interpolates between the insertion-only regime (k = 0) and the fully dynamic regime (k = m). Our algorithmic results are driven by a key technical insight: once sufficiently many hyperedges have been inserted into the stream (relative to the number of allowed deletions), we can significantly reduce the underlying hypergraph by size by irrevocably contracting large subsets of vertices. Finally, we complement this result with an essentially matching lower bound of Ω(n r log(k/n)) bits, thus providing essentially a tight characterization of the space complexity for hypergraph cut-sparsification across a spectrum of streaming models. 
    more » « less
  5. In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these metrics fail to capture the asymmetric costs, inherent in modern hardware and database systems, of reading versus writing to memory. In fact, most streaming algorithms write to their memory on every update, which is undesirable when writing is significantly more expensive than reading. This raises the question of whether streaming algorithms with small space and number of memory writes are possible. We first demonstrate that, for the fundamental Fpmoment estimation problem with p ≥ 1, any streaming algorithm that achieves a constant factor approximation must make Ω(n1-1/p) internal state changes, regardless of how much space it uses. Perhaps surprisingly, we show that this lower bound can be matched by an algorithm which also has near-optimal space complexity. Specifically, we give a (1+ε)-approximation algorithm for Fpmoment estimation that use a near-optimal ~Oε(n1-1/p) number of state changes, while simultaneously achieving near-optimal space, i.e., for p∈[1,2), our algorithm uses poly(log n,1/ε) bits of space for, while for p>2, the algorithm uses ~Oε(n1-1/p) space. We similarly design streaming algorithms that are simultaneously near-optimal in both space complexity and the number of state changes for the heavy-hitters problem, sparse support recovery, and entropy estimation. Our results demonstrate that an optimal number of state changes can be achieved without sacrificing space complexity. 
    more » « less