skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Leibniz International Proceedings in Informatics (LIPIcs):50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)
Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary - for many streaming problems, both relaxations must be employed simultaneously, to avoid an exponentially larger (and often trivial) space complexity. A common drawback of these randomized approximate algorithms is that independent executions on the same input have different outputs, that depend on their random coins. Pseudo-deterministic algorithms combat this issue, and for every input, they output with high probability the same "canonical" solution. We consider perhaps the most basic problem in data streams, of counting the number of items in a stream of length at most n. Morris’s counter [CACM, 1978] is a randomized approximation algorithm for this problem that uses O(log log n) bits of space, for every fixed approximation factor (greater than 1). Goldwasser, Grossman, Mohanty and Woodruff [ITCS 2020] asked whether pseudo-deterministic approximation algorithms can match this space complexity. Our main result answers their question negatively, and shows that such algorithms must use Ω(√{log n / log log n}) bits of space. Our approach is based on a problem that we call Shift Finding, and may be of independent interest. In this problem, one has query access to a shifted version of a known string F ∈ {0,1}^{3n}, which is guaranteed to start with n zeros and end with n ones, and the goal is to find the unknown shift using a small number of queries. We provide for this problem an algorithm that uses O(√n) queries. It remains open whether poly(log n) queries suffice; if true, then our techniques immediately imply a nearly-tight Ω(log n/log log n) space bound for pseudo-deterministic approximate counting.  more » « less
Award ID(s):
2244899
PAR ID:
10509610
Author(s) / Creator(s):
; ; ;
Editor(s):
Etessami, Kousha; Feige, Uriel; Puppis, Gabriele
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Subject(s) / Keyword(s):
streaming algorithms pseudo-deterministic approximate counting Theory of computation → Streaming, sublinear and near linear time algorithms Theory of computation → Lower bounds and information complexity Theory of computation → Pseudorandomness and derandomization
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Santhanam, Rahul (Ed.)
    Adversarially robust streaming algorithms are required to process a stream of elements and produce correct outputs, even when each stream element can be chosen as a function of earlier algorithm outputs. As with classic streaming algorithms, which must only be correct for the worst-case fixed stream, adversarially robust algorithms with access to randomness can use significantly less space than deterministic algorithms. We prove that for the Missing Item Finding problem in streaming, the space complexity also significantly depends on how adversarially robust algorithms are permitted to use randomness. (In contrast, the space complexity of classic streaming algorithms does not depend as strongly on the way randomness is used.) For Missing Item Finding on streams of length 𝓁 with elements in {1,…,n}, and ≤ 1/poly(𝓁) error, we show that when 𝓁 = O(2^√{log n}), "random seed" adversarially robust algorithms, which only use randomness at initialization, require 𝓁^Ω(1) bits of space, while "random tape" adversarially robust algorithms, which may make random decisions at any time, may use O(polylog(𝓁)) random bits. When 𝓁 is between n^Ω(1) and O(√n), "random tape" adversarially robust algorithms need 𝓁^Ω(1) space, while "random oracle" adversarially robust algorithms, which can read from a long random string for free, may use O(polylog(𝓁)) space. The space lower bound for the "random seed" case follows, by a reduction given in prior work, from a lower bound for pseudo-deterministic streaming algorithms given in this paper. 
    more » « less
  2. Many problems on data streams have been studied at two extremes of difficulty: either allowing randomized algorithms, in the static setting (where they should err with bounded probability on the worst case stream); or when only deterministic and infallible algorithms are required. Some recent works have considered the adversarial setting, in which a randomized streaming algorithm must succeed even on data streams provided by an adaptive adversary that can see the intermediate outputs of the algorithm. In order to better understand the differences between these models, we study a streaming task called “Missing Item Finding”. In this problem, for r < n, one is given a data stream a1 , . . . , ar of elements in [n], (possibly with repetitions), and must output some x ∈ [n] which does not equal any of the ai. We prove that, for r = nΘ(1) and δ = 1/poly(n), the space required for randomized algorithms that solve this problem in the static setting with error δ is Θ(polylog(n)); for algorithms in the adversarial setting with error δ, Θ((1 + r2/n)polylog(n)); and for deterministic algorithms, Θ(r/polylog(n)). Because our adversarially robust algorithm relies on free access to a string of O(r log n) random bits, we investigate a “random start” model of streaming algorithms where all random bits used are included in the space cost. Here we find a conditional lower bound on the space usage, which depends on the space that would be needed for a pseudo-deterministic algorithm to solve the problem. We also prove an Ω(r/polylog(n)) lower bound for the space needed by a streaming algorithm with < 1/2polylog(n) error against “white-box” adversaries that can see the internal state of the algorithm, but not predict its future random decisions. 
    more » « less
  3. There has been a flurry of recent literature studying streaming algorithms for which the input stream is chosen adaptively by a black-box adversary who observes the output of the streaming algorithm at each time step. However, these algorithms fail when the adversary has access to the internal state of the algorithm, rather than just the output of the algorithm. We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the L1-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long streams. If the white-box adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our L1-heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with insertions and deletions achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any two-player deterministic communication lower bound to a lower bound for randomized algorithms robust to a white-box adversary. In particular, our results show that for all p ≥ 0, there exists a constant Cp > 1 such that any Cp-approximation algorithm for Fp moment estimation in insertion-only streams with a white-box adversary requires Ω(n) space for a universe of size n. Similarly, there is a constant C > 1 such that any C-approximation algorithm in an insertion-only stream for matrix rank requires Ω(n) space with a white-box adversary. These results do not contradict our upper bounds since they assume the adversary has unbounded computational power. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. Finally, we prove a lower bound of Ω(log n) bits for the fundamental problem of deterministic approximate counting in a stream of 0’s and 1’s, which holds even if we know how many total stream updates we have seen so far at each point in the stream. Such a lower bound for approximate counting with additional information was previously unknown, and in our context, it shows a separation between multiplayer deterministic maximum communication and the white-box space complexity of a streaming algorithm 
    more » « less
  4. Bringmann, Karl; Grohe, Martin; Puppis, Gabriele; Svensson, Ola (Ed.)
    We consider the problem of query-efficient global max-cut on a weighted undirected graph in the value oracle model examined by [Rubinstein et al., 2018]. Graph algorithms in this cut query model and other query models have recently been studied for various other problems such as min-cut, connectivity, bipartiteness, and triangle detection. Max-cut in the cut query model can also be viewed as a natural special case of submodular function maximization: on query S ⊆ V, the oracle returns the total weight of the cut between S and V\S. Our first main technical result is a lower bound stating that a deterministic algorithm achieving a c-approximation for any c > 1/2 requires Ω(n) queries. This uses an extension of the cut dimension to rule out approximation (prior work of [Graur et al., 2020] introducing the cut dimension only rules out exact solutions). Secondly, we provide a randomized algorithm with Õ(n) queries that finds a c-approximation for any c < 1. We achieve this using a query-efficient sparsifier for undirected weighted graphs (prior work of [Rubinstein et al., 2018] holds only for unweighted graphs). To complement these results, for most constants c ∈ (0,1], we nail down the query complexity of achieving a c-approximation, for both deterministic and randomized algorithms (up to logarithmic factors). Analogously to general submodular function maximization in the same model, we observe a phase transition at c = 1/2: we design a deterministic algorithm for global c-approximate max-cut in O(log n) queries for any c < 1/2, and show that any randomized algorithm requires Ω(n/log n) queries to find a c-approximate max-cut for any c > 1/2. Additionally, we show that any deterministic algorithm requires Ω(n²) queries to find an exact max-cut (enough to learn the entire graph). 
    more » « less
  5. Alistarh, Dan (Ed.)
    This paper studies the power of the "abstract MAC layer" model in a single-hop asynchronous network. The model captures primitive properties of modern wireless MAC protocols. In this model, Newport [PODC '14] proves that it is impossible to achieve deterministic consensus when nodes may crash. Subsequently, Newport and Robinson [DISC '18] present randomized consensus algorithms that terminate with O(n³ log n) expected broadcasts in a system of n nodes. We are not aware of any results on other fault-tolerant distributed tasks in this model. We first study the computability aspect of the abstract MAC layer. We present a wait-free algorithm that implements an atomic register. Furthermore, we show that in general, k-set consensus is impossible. Second, we aim to minimize storage complexity. Existing algorithms require Ω(n log n) bits. We propose two wait-free approximate consensus and two wait-free randomized binary consensus algorithms that only need constant storage complexity (except for the phase index). One randomized algorithm terminates with O(n log n) expected broadcasts. All our algorithms are anonymous, meaning that at the algorithm level, nodes do not need to have a unique identifier. 
    more » « less