We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinearspace deterministic algorithms; on the other hand, classical spaceefficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. In this work, we show that the answer is positive for various important streaming problems in the insertiononly model, including distinct elements and more generally F p estimation, F p heavy hitters, entropy estimation, and others. For all of these problems, we develop adversarially robust (1+ε)approximation algorithms whose required space matches that of the best known nonrobust algorithms up to a poly(log n , 1/ε) multiplicative factor (and in some cases even up to a constant factor). Towards this end, we develop several generic tools allowing one to efficiently transform a nonrobust streaming algorithm into a robust one in various scenarios.
more »
« less
A Framework for Adversarially Robust Streaming Algorithms
We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinearspace deterministic algorithms; on the other hand, classical spaceefficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems.
more »
« less
 Award ID(s):
 1815840
 NSFPAR ID:
 10330049
 Date Published:
 Journal Name:
 ACM SIGMOD Record
 Volume:
 50
 Issue:
 1
 ISSN:
 01635808
 Page Range / eLocation ID:
 6 to 13
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


A streaming algorithm is considered to be adversarially robust if it provides correct outputs with high probability even when the stream updates are chosen by an adversary who may observe and react to the past outputs of the algorithm. We grow the burgeoning body of work on such algorithms in a new direction by studying robust algorithms for the problem of maintaining a valid vertex coloring of an nvertex graph given as a stream of edges. Following standard practice, we focus on graphs with maximum degree at most Δ and aim for colorings using a small number f(Δ) of colors. A recent breakthrough (Assadi, Chen, and Khanna; SODA 2019) shows that in the standard, nonrobust, streaming setting, (Δ+1)colorings can be obtained while using only Õ(n) space. Here, we prove that an adversarially robust algorithm running under a similar space bound must spend almost Ω(Δ²) colors and that robust O(Δ)coloring requires a linear amount of space, namely Ω(nΔ). We in fact obtain a more general lower bound, trading off the space usage against the number of colors used. From a complexitytheoretic standpoint, these lower bounds provide (i) the first significant separation between adversarially robust algorithms and ordinary randomized algorithms for a natural problem on insertiononly streams and (ii) the first significant separation between randomized and deterministic coloring algorithms for graph streams, since deterministic streaming algorithms are automatically robust. We complement our lower bounds with a suite of positive results, giving adversarially robust coloring algorithms using sublinear space. In particular, we can maintain an O(Δ²)coloring using Õ(n √Δ) space and an O(Δ³)coloring using Õ(n) space.more » « less

Many problems on data streams have been studied at two extremes of difficulty: either allowing randomized algorithms, in the static setting (where they should err with bounded probability on the worst case stream); or when only deterministic and infallible algorithms are required. Some recent works have considered the adversarial setting, in which a randomized streaming algorithm must succeed even on data streams provided by an adaptive adversary that can see the intermediate outputs of the algorithm. In order to better understand the differences between these models, we study a streaming task called “Missing Item Finding”. In this problem, for r < n, one is given a data stream a1 , . . . , ar of elements in [n], (possibly with repetitions), and must output some x ∈ [n] which does not equal any of the ai. We prove that, for r = nΘ(1) and δ = 1/poly(n), the space required for randomized algorithms that solve this problem in the static setting with error δ is Θ(polylog(n)); for algorithms in the adversarial setting with error δ, Θ((1 + r2/n)polylog(n)); and for deterministic algorithms, Θ(r/polylog(n)). Because our adversarially robust algorithm relies on free access to a string of O(r log n) random bits, we investigate a “random start” model of streaming algorithms where all random bits used are included in the space cost. Here we find a conditional lower bound on the space usage, which depends on the space that would be needed for a pseudodeterministic algorithm to solve the problem. We also prove an Ω(r/polylog(n)) lower bound for the space needed by a streaming algorithm with < 1/2polylog(n) error against “whitebox” adversaries that can see the internal state of the algorithm, but not predict its future random decisions.more » « less

Graph coloring is a fundamental problem with wide reaching applications in various areas including ata mining and databases, e.g., in parallel query optimization. In recent years, there has been a growing interest in solving various graph coloring problems in the streaming model. The initial algorithms in this line of work are all crucially randomized, raising natural questions about how important a role randomization plays in streaming graph coloring. A couple of very recent works prove that deterministic or even adversarially robust coloring algorithms (that work on streams whose updates may depend on the algorithm's past outputs) are considerably weaker than standard randomized ones. However, there is still a significant gap between the upper and lower bounds for the number of colors needed (as a function of the maximum degree Δ) for robust coloring and multipass deterministic coloring. We contribute to this line of work by proving the following results. In the deterministic semistreaming (i.e., O(n · polylog n) space) regime, we present an algorithm that achieves a combinatorially optimal (Δ+1)coloring using O(logΔ log logΔ) passes. This improves upon the prior O(Δ)coloring algorithm of Assadi, Chen, and Sun (STOC 2022) at the cost of only an O(log logΔ) factor in the number of passes. In the adversarially robust semistreaming regime, we design an O(Δ5/2)coloring algorithm that improves upon the previously best O(Δ3)coloring algorithm of Chakrabarti, Ghosh, and Stoeckl (ITCS 2022). Further, we obtain a smooth colors/space tradeoff that improves upon another algorithm of the said work: whereas their algorithm uses O(Δ2) colors and O(nΔ1/2) space, ours, in particular, achieves (i)~O(Δ2) colors in O(nΔ1/3) space, and (ii)~O(Δ7/4) colors in O(nΔ1/2) space.more » « less

There has been a flurry of recent literature studying streaming algorithms for which the input stream is chosen adaptively by a blackbox adversary who observes the output of the streaming algorithm at each time step. However, these algorithms fail when the adversary has access to the internal state of the algorithm, rather than just the output of the algorithm. We study streaming algorithms in the whitebox adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the L1heavy hitters problem that outperforms the optimal deterministic MisraGries algorithm on long streams. If the whitebox adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our L1heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with insertions and deletions achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any twoplayer deterministic communication lower bound to a lower bound for randomized algorithms robust to a whitebox adversary. In particular, our results show that for all p ≥ 0, there exists a constant Cp > 1 such that any Cpapproximation algorithm for Fp moment estimation in insertiononly streams with a whitebox adversary requires Ω(n) space for a universe of size n. Similarly, there is a constant C > 1 such that any Capproximation algorithm in an insertiononly stream for matrix rank requires Ω(n) space with a whitebox adversary. These results do not contradict our upper bounds since they assume the adversary has unbounded computational power. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. Finally, we prove a lower bound of Ω(log n) bits for the fundamental problem of deterministic approximate counting in a stream of 0’s and 1’s, which holds even if we know how many total stream updates we have seen so far at each point in the stream. Such a lower bound for approximate counting with additional information was previously unknown, and in our context, it shows a separation between multiplayer deterministic maximum communication and the whitebox space complexity of a streaming algorithmmore » « less