We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinearspace deterministic algorithms; on the other hand, classical spaceefficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. In this work, we show that the answer is positive for various important streaming problems in the insertiononly model, including distinct elements and more generally F p estimation, F p heavy hitters, entropy estimation, and others. For all of these problems, we develop adversarially robust (1+ε)approximation algorithms whose required space matches that of the best known nonrobust algorithms up to a poly(log n , 1/ε) multiplicative factor (and in some cases even up to a constant factor). Towards this end, we develop several generic tools allowing one to efficiently transform a nonrobust streaming algorithm into a robust onemore »
This content will become publicly available on January 1, 2023
The WhiteBox Adversarial Data Stream Model
There has been a flurry of recent literature studying streaming algorithms for which the input stream is chosen adaptively by a blackbox adversary who observes the output of the streaming algorithm at each time step. However, these algorithms fail when the adversary has access to the internal state of the algorithm, rather than just the output of the algorithm. We study streaming algorithms in the whitebox adversarial model, where the stream is
chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the L1heavy hitters problem that outperforms the optimal deterministic MisraGries algorithm on long streams. If the whitebox adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our L1heavy hitters algorithm even further
and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with more »
 Award ID(s):
 2022448
 Publication Date:
 NSFPAR ID:
 10341769
 Journal Name:
 Proceedings of the 41st ACM SIGMODSIGACTSIGAI Symposium on Principles of Database Systems (PODS 2022)
 Page Range or eLocationID:
 1527
 Sponsoring Org:
 National Science Foundation
More Like this


A streaming algorithm is considered to be adversarially robust if it provides correct outputs with high probability even when the stream updates are chosen by an adversary who may observe and react to the past outputs of the algorithm. We grow the burgeoning body of work on such algorithms in a new direction by studying robust algorithms for the problem of maintaining a valid vertex coloring of an nvertex graph given as a stream of edges. Following standard practice, we focus on graphs with maximum degree at most Δ and aim for colorings using a small number f(Δ) of colors. A recent breakthrough (Assadi, Chen, and Khanna; SODA 2019) shows that in the standard, nonrobust, streaming setting, (Δ+1)colorings can be obtained while using only Õ(n) space. Here, we prove that an adversarially robust algorithm running under a similar space bound must spend almost Ω(Δ²) colors and that robust O(Δ)coloring requires a linear amount of space, namely Ω(nΔ). We in fact obtain a more general lower bound, trading off the space usage against the number of colors used. From a complexitytheoretic standpoint, these lower bounds provide (i) the first significant separation between adversarially robust algorithms and ordinary randomized algorithms for amore »

We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinearspace deterministic algorithms; on the other hand, classical spaceefficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems.

Given an input stream S of size N , a ɸheavy hitter is an item that occurs at least ɸN times in S . The problem of finding heavyhitters is extensively studied in the database literature. We study a realtime heavyhitters variant in which an element must be reported shortly after we see its T = ɸ Nth occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection ( TED ) Problem. The TED problem models the needs of many realworld monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, highspeed streams with a low reporting threshold (high sensitivity). Like the classic heavyhitters problem, solving the TED problem without falsepositives requires large space (Ω (N) words). Thus inRAM heavyhitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavyhitters algorithms to external memory to solve the TED problem on large highspeed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/Obandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, ourmore »

We revisit the muchstudied problem of spaceefficiently estimating the number of triangles in a graph stream, and extensions of this problem to counting fixedsized cliques and cycles, obtaining a number of new upper and lower bounds. For the important special case of counting triangles, we give a $4$pass, $(1\pm\varepsilon)$approximate, randomized algorithm that needs at most $\widetilde{O}(\varepsilon^{2}\cdot m^{3/2}/T)$ space, where $m$ is the number of edges and $T$ is a promised lower bound on the number of triangles. This matches the space bound of a very recent algorithm (McGregor et al., PODS 2016), with an arguably simpler and more general technique. We give an improved multipass lower bound of $\Omega(\min\{m^{3/2}/T, m/\sqrt{T}\})$, applicable at essentially all densities $\Omega(n) \le m \le O(n^2)$. We also prove other multipass lower bounds in terms of various structural parameters of the input graph. Together, our results resolve a couple of open questions raised in recent work (Braverman et al., ICALP 2013). Our presentation emphasizes more general frameworks, for both upper and lower bounds. We give a sampling algorithm for counting arbitrary subgraphs and then improve it via combinatorial means in the special cases of counting odd cliques and odd cycles. Our results show that these problemsmore »