skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Separations and equivalences between turnstile streaming and linear sketching
A longstanding observation, which was partially proven in \cite{LNW14,AHLW16}, is that any turnstile streaming algorithm can be implemented as a linear sketch (the reverse is trivially true). We study the relationship between turnstile streaming and linear sketching algorithms in more detail, giving both new separations and new equivalences between the two models. It was shown in \cite{LNW14} that, if a turnstile algorithm works for arbitrarily long streams with arbitrarily large coordinates at intermediate stages of the stream, then the turnstile algorithm is equivalent to a linear sketch. We show separations of the opposite form: if either the stream length or the maximum value of the stream are substantially restricted, there exist problems where linear sketching is exponentially harder than turnstile streaming. A further limitation of the \cite{LNW14} equivalence is that the turnstile sketching algorithm is neither explicit nor uniform, but requires an exponentially long advice string. We show how to remove this limitation for deterministic streaming algorithms: we give an explicit small-space algorithm that takes the streaming algorithm and computes an equivalent module.  more » « less
Award ID(s):
1751040
PAR ID:
10149455
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Conference proceedings of the annual ACM Symposium on Theory of Computing
ISSN:
0734-9025
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We initiate a systematic study of linear sketching over F_2. For a given Boolean function treated as f : F_2^n -> F_2 a randomized F_2-sketch is a distribution M over d x n matrices with elements over F_2 such that Mx suffices for computing f(x) with high probability. Such sketches for d << n can be used to design small-space distributed and streaming algorithms. Motivated by these applications we study a connection between F_2-sketching and a two-player one-way communication game for the corresponding XOR-function. We conjecture that F_2-sketching is optimal for this communication game. Our results confirm this conjecture for multiple important classes of functions: 1) low-degree F_2-polynomials, 2) functions with sparse Fourier spectrum, 3) most symmetric functions, 4) recursive majority function. These results rely on a new structural theorem that shows that F_2-sketching is optimal (up to constant factors) for uniformly distributed inputs. Furthermore, we show that (non-uniform) streaming algorithms that have to process random updates over F_2 can be constructed as F_2-sketches for the uniform distribution. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14) who show an analogous result for linear sketches over integers in the adversarial setting our result does not require the stream length to be triply exponential in n and holds for streams of length O(n) constructed through uniformly random updates. 
    more » « less
  2. We initiate a systematic study of linear sketching over $$\ftwo$$. For a given Boolean function treated as $$f \colon \ftwo^n \to \ftwo$$ a randomized $$\ftwo$$-sketch is a distribution $$\mathcal M$$ over $$d \times n$$ matrices with elements over $$\ftwo$$ such that $$\mathcal Mx$$ suffices for computing $f(x)$ with high probability. Such sketches for $$d \ll n$$ can be used to design small-space distributed and streaming algorithms. Motivated by these applications we study a connection between $$\ftwo$$-sketching and a two-player one-way communication game for the corresponding XOR-function. We conjecture that $$\ftwo$$-sketching is optimal for this communication game. Our results confirm this conjecture for multiple important classes of functions: 1) low-degree $$\ftwo$$-polynomials, 2) functions with sparse Fourier spectrum, 3) most symmetric functions, 4) recursive majority function. These results rely on a new structural theorem that shows that $$\ftwo$$-sketching is optimal (up to constant factors) for uniformly distributed inputs. Furthermore, we show that (non-uniform) streaming algorithms that have to process random updates over $$\ftwo$$ can be constructed as $$\ftwo$$-sketches for the uniform distribution. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14) who show an analogous result for linear sketches over integers in the adversarial setting our result does not require the stream length to be triply exponential in $$n$$ and holds for streams of length $$\tilde O(n)$$ constructed through uniformly random updates. 
    more » « less
  3. We introduce a new sub-linear space sketch the "Weight-Median Sketch" for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. Unlike related sketches that capture the most frequently-occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis that establishes recovery guarantees for batch and online learning, and demonstrate empirical improvements in memory-accuracy trade-offs over alternative memory-budgeted methods, including count-based sketches and feature hashing. 
    more » « less
  4. Guruswami, Venkatesan (Ed.)
    We present novel lower bounds in the Merlin-Arthur (MA) communication model and the related annotated streaming or stream verification model. The MA communication model extends the classical communication model by introducing an all-powerful but untrusted player, Merlin, who knows the inputs of the usual players, Alice and Bob, and attempts to convince them about the output. We focus on the online MA (OMA) model where Alice and Merlin each send a single message to Bob, who needs to catch Merlin if he is dishonest and announce the correct output otherwise. Most known functions have OMA protocols with total communication significantly smaller than what would be needed without Merlin. In this work, we introduce the notion of non-trivial-OMA complexity of a function. This is the minimum total communication required when we restrict ourselves to only non-trivial protocols where Alice sends Bob fewer bits than what she would have sent without Merlin. We exhibit the first explicit functions that have this complexity superlinear - even exponential - in their classical one-way complexity: this means the trivial protocol, where Merlin communicates nothing and Alice and Bob compute the function on their own, is exponentially better than any non-trivial protocol in terms of total communication. These OMA lower bounds also translate to the annotated streaming model, the MA analogue of single-pass data streaming. We show large separations between the classical streaming complexity and the non-trivial annotated streaming complexity (for the analogous notion in this setting) of fundamental problems such as counting distinct items, as well as of graph problems such as connectivity and k-connectivity in a certain edge update model called the support graph turnstile model that we introduce here. 
    more » « less
  5. There has been a flurry of recent literature studying streaming algorithms for which the input stream is chosen adaptively by a black-box adversary who observes the output of the streaming algorithm at each time step. However, these algorithms fail when the adversary has access to the internal state of the algorithm, rather than just the output of the algorithm. We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the L1-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long streams. If the white-box adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our L1-heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with insertions and deletions achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any two-player deterministic communication lower bound to a lower bound for randomized algorithms robust to a white-box adversary. In particular, our results show that for all p ≥ 0, there exists a constant Cp > 1 such that any Cp-approximation algorithm for Fp moment estimation in insertion-only streams with a white-box adversary requires Ω(n) space for a universe of size n. Similarly, there is a constant C > 1 such that any C-approximation algorithm in an insertion-only stream for matrix rank requires Ω(n) space with a white-box adversary. These results do not contradict our upper bounds since they assume the adversary has unbounded computational power. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. Finally, we prove a lower bound of Ω(log n) bits for the fundamental problem of deterministic approximate counting in a stream of 0’s and 1’s, which holds even if we know how many total stream updates we have seen so far at each point in the stream. Such a lower bound for approximate counting with additional information was previously unknown, and in our context, it shows a separation between multiplayer deterministic maximum communication and the white-box space complexity of a streaming algorithm 
    more » « less