
Title: Streaming Algorithms with Few State Changes
In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these metrics fail to capture the asymmetric costs, inherent in modern hardware and database systems, of reading versus writing to memory. In fact, most streaming algorithms write to their memory on every update, which is undesirable when writing is significantly more expensive than reading. This raises the question of whether streaming algorithms with both small space and a small number of memory writes are possible. We first demonstrate that, for the fundamental F_p moment estimation problem with p ≥ 1, any streaming algorithm that achieves a constant-factor approximation must make Ω(n^{1-1/p}) internal state changes, regardless of how much space it uses. Perhaps surprisingly, we show that this lower bound can be matched by an algorithm that also has near-optimal space complexity. Specifically, we give a (1+ε)-approximation algorithm for F_p moment estimation that uses a near-optimal Õ_ε(n^{1-1/p}) number of state changes while simultaneously achieving near-optimal space: for p ∈ [1,2), our algorithm uses poly(log n, 1/ε) bits of space, while for p > 2, the algorithm uses Õ_ε(n^{1-1/p}) space. We similarly design streaming algorithms that are simultaneously near-optimal in both space complexity and the number of state changes for the heavy-hitters problem, sparse support recovery, and entropy estimation. Our results demonstrate that an optimal number of state changes can be achieved without sacrificing space complexity.
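To make the notion of a "state change" concrete, here is a minimal Python sketch of a Morris-style approximate counter. It is a textbook technique for plain counting, not the algorithm from the paper. The class name `MorrisCounter`, the helper `run_demo`, and the `state_changes` field are hypothetical names introduced only for this illustration, and the sketch assumes that drawing random bits does not count as a write to the algorithm's state.

```python
import random


class MorrisCounter:
    """Approximate counter: the stored exponent is the only persistent state."""

    def __init__(self, seed=None):
        self.exponent = 0        # the sketch itself
        self.state_changes = 0   # demo bookkeeping only, not part of the sketch
        self._rng = random.Random(seed)  # assumed to be "free" external randomness

    def update(self):
        # Most updates only read the exponent; a write happens
        # with probability 2^(-exponent).
        if self._rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1
            self.state_changes += 1

    def estimate(self):
        # Standard unbiased estimator of the number of updates seen so far.
        return 2 ** self.exponent - 1


def run_demo(n=100_000, seed=0):
    """Feed n updates and report (estimate, number of writes to the state)."""
    counter = MorrisCounter(seed)
    for _ in range(n):
        counter.update()
    return counter.estimate(), counter.state_changes
```

Calling `run_demo()` processes 100,000 updates, yet the exponent is written only about log2(n) times in expectation, whereas an exact counter would write on every update. A single counter of this kind gives only a coarse estimate; it is meant to illustrate the read/write asymmetry, not the accuracy guarantees discussed in the abstract.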
Award ID(s):
2335411
PAR ID:
10560522
Author(s) / Creator(s):
; ;
Publisher / Repository:
Proc. ACM Manag. Data
Date Published:
Journal Name:
Proceedings of the ACM on Management of Data
Volume:
2
Issue:
2
ISSN:
2836-6573
Page Range / eLocation ID:
1 to 28
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Meka, Raghu (Ed.)
    We consider the problem of finding a minimum cut of a weighted graph presented as a single-pass stream. While graph sparsification in streams has been intensively studied, the specific application of finding minimum cuts in streams is less well-studied. To this end, we show upper and lower bounds on minimum cut problems in insertion-only streams for a variety of settings, including for both randomized and deterministic algorithms, for both arbitrary and random order streams, and for both approximate and exact algorithms. One of our main results is an Õ(n/ε) space algorithm with fast update time for approximating a spectral cut query with high probability on a stream given in an arbitrary order. Our result breaks the Ω(n/ε²) space lower bound required of a sparsifier that approximates all cuts simultaneously. Using this result, we provide streaming algorithms with near-optimal space of Õ(n/ε) for minimum cut and approximate all-pairs effective resistances, with matching space lower bounds. The amortized update time of our algorithms is Õ(1), provided that the number of edges in the input graph is at least (n/ε²)^{1+o(1)}. We also give a generic way of incorporating sketching into a recursive contraction algorithm to improve the post-processing time of our algorithms. In addition to these results, we give a random-order streaming algorithm that computes the exact minimum cut on a simple, unweighted graph using Õ(n) space. Finally, we give an Ω(n/ε²) space lower bound for deterministic minimum cut algorithms which matches the best-known upper bound up to polylogarithmic factors.
  2. Given a family of sets (S_1, S_2, ..., S_M) over a universe Ω, estimating the size of their union in the data streaming model is a fundamental computational problem with a wide variety of applications. The holy grail in the field of streaming is to design algorithms that achieve (ε, δ)-approximation with poly(log |Ω|, ε^{-1}, log δ^{-1}) space and update time complexity. Earlier investigations achieve algorithms with the desired space and update time complexity for restricted cases such as singletons (the Distinct Elements problem), one-dimensional ranges, arithmetic progressions, and sub-cubes. However, the techniques used in these works fail for many other simple structured sets. A prominent example is that of Klee's Measure Problem (KMP), wherein every set S_i is represented by an axis-parallel rectangle in d-dimensional space. Despite extensive prior work, the best-known streaming algorithms for many of these cases depend on the size of the stream, and therefore the problem of whether there exists a streaming algorithm for estimating the size of the union of sets with poly(log |Ω|, ε^{-1}, log δ^{-1}) space and update time complexity has remained open. In this work, we focus on certain general families of sets called Delphic families (which allow efficient membership, sampling, and cardinality queries). Such families of sets capture several well-known problems, including KMP, test coverage, and hypervolume estimation. The primary contribution of our work is to resolve the above-mentioned open problem for streams over Delphic families. In particular, we design the first streaming algorithm for estimating |⋃_{i=1}^{M} S_i| with poly(log |Ω|, ε^{-1}, log δ^{-1}) space and update time complexity (independent of M, the length of the stream) when each S_i is a member of a Delphic family of sets. We further generalize our results to larger families of sets, called approximate-Delphic families, for which the size of a set can be known approximately but not exactly. Our results resolve two of the open problems listed in Meel, Vinodchandran, Chakraborty (PODS-21).
  3. Kumar, Amit; Ron-Zewi, Noga (Ed.)
    Estimating the size of the union of a stream of sets S₁, S₂, …, S_M where each set is a subset of a known universe Ω is a fundamental problem in data streaming. This problem naturally generalizes the well-studied 𝖥₀ estimation problem in the streaming literature, where each set contains a single element from the universe. We consider the general case when the sets S_i can be succinctly represented and allow efficient membership, cardinality, and sampling queries (called a Delphic family of sets). A notable example in this framework is Klee’s Measure Problem (KMP), where every set S_i is an axis-parallel rectangle in d-dimensional space (Ω = [Δ]^d where [Δ] := {1, …, Δ} and Δ ∈ ℕ). Recently, Meel, Chakraborty, and Vinodchandran (PODS-21, PODS-22) designed a streaming algorithm for (ε,δ)-estimation of the size of the union of set streams over a Delphic family with space and update time complexity O((log³|Ω|)/ε² ⋅ log 1/δ) and Õ((log⁴|Ω|)/ε² ⋅ log 1/δ), respectively. This work presents a new, sampling-based algorithm for estimating the size of the union of Delphic sets that has space and update time complexity Õ((log²|Ω|)/ε² ⋅ log 1/δ). This improves the space complexity bound by a log|Ω| factor and the update time complexity bound by a log²|Ω| factor. A critical question is whether the quadratic dependence of the space and update time complexities on log|Ω| is necessary. Specifically, can we design a streaming algorithm for estimating the size of the union of sets over a Delphic family with space complexity linear in log|Ω| and update time poly(log|Ω|)? While this appears technically challenging, we show that establishing a space lower bound of ω(log|Ω|) for algorithms with poly(log|Ω|) update time is beyond the reach of current techniques. Specifically, we show that under a certain hard-to-prove computational complexity hypothesis, there is a streaming algorithm for the problem with optimal space complexity O(log|Ω|) and update time poly(log|Ω|). Thus, establishing a space lower bound of ω(log|Ω|) would lead to breakthrough complexity class separation results.
  4. We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinear-space deterministic algorithms; on the other hand, classical space-efficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. In this work, we show that the answer is positive for various important streaming problems in the insertion-only model, including distinct elements and more generally F_p-estimation, F_p-heavy hitters, entropy estimation, and others. For all of these problems, we develop adversarially robust (1+ε)-approximation algorithms whose required space matches that of the best known non-robust algorithms up to a poly(log n, 1/ε) multiplicative factor (and in some cases even up to a constant factor). Towards this end, we develop several generic tools allowing one to efficiently transform a non-robust streaming algorithm into a robust one in various scenarios.
  5. Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)
    The maximum coverage problem is to select k sets, from a collection of m sets, such that the cardinality of their union, in a universe of size n, is maximized. We consider (1-1/e-ε)-approximation algorithms for this NP-hard problem in three standard data stream models. 1) Dynamic Model. The stream consists of a sequence of sets being inserted and deleted. Our multi-pass algorithm uses ε^{-2} k ⋅ polylog(n,m) space. The best previous result (Assadi and Khanna, SODA 2018) used (n + ε^{-4} k) polylog(n,m) space. While both algorithms use O(ε^{-1} log m) passes, our analysis shows that, when ε ≤ 1/log log m, it is possible to reduce the number of passes by a 1/log log m factor without incurring additional space. 2) Random Order Model. In this model, there are no deletions, and the sets forming the instance are uniformly randomly permuted to form the input stream. We show that a single pass and k polylog(n,m) space suffice for an arbitrarily small constant ε. The best previous result, by Warneke et al. (ESA 2023), used k² polylog(n,m) space. 3) Insert-Only Model. Lastly, our results, along with numerous previous results, use a sub-sampling technique introduced by McGregor and Vu (ICDT 2017) to sparsify the input instance. We explain how this technique and others used in the paper can be implemented such that the amortized update time of our algorithm is polylogarithmic. This also implies an improvement of the state-of-the-art insert-only algorithms in terms of the update time: polylog(m,n) update time suffices, whereas the best previous result by Jaud et al. (SEA 2023) required update time that was linear in k.
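For reference, the maximum coverage objective described in item 5 can be sketched with the classical offline greedy algorithm, which is known to achieve a (1-1/e)-approximation. This is not one of the streaming algorithms from that abstract: it assumes all sets fit in memory, and the function name `greedy_max_coverage` is a hypothetical name chosen for this illustration.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets greedily, each time taking the largest marginal gain.

    Returns (indices of the chosen sets, set of covered elements).
    Offline baseline only: it stores every input set in memory.
    """
    remaining = {i: set(s) for i, s in enumerate(sets)}
    covered = set()
    chosen = []
    for _ in range(min(k, len(sets))):
        # The set that adds the most not-yet-covered elements.
        best = max(remaining, key=lambda i: len(remaining[i] - covered))
        if not remaining[best] - covered:
            break  # nothing left to gain
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered


# Example: with k = 2, the greedy picks the 4-element set, then the 3-element set.
chosen, covered = greedy_max_coverage([{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}], k=2)
```

In this example the two chosen sets cover all seven elements. The streaming results summarized above aim for nearly the same approximation ratio while using space far smaller than the input.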