skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Friday, May 2 until 12:00 AM ET on Saturday, May 3 due to maintenance. We apologize for the inconvenience.


Title: Linear Sketching over F2
We initiate a systematic study of linear sketching over $$\ftwo$$. For a given Boolean function treated as $$f \colon \ftwo^n \to \ftwo$$ a randomized $$\ftwo$$-sketch is a distribution $$\mathcal M$$ over $$d \times n$$ matrices with elements over $$\ftwo$$ such that $$\mathcal Mx$$ suffices for computing $f(x)$ with high probability. Such sketches for $$d \ll n$$ can be used to design small-space distributed and streaming algorithms. Motivated by these applications we study a connection between $$\ftwo$$-sketching and a two-player one-way communication game for the corresponding XOR-function. We conjecture that $$\ftwo$$-sketching is optimal for this communication game. Our results confirm this conjecture for multiple important classes of functions: 1) low-degree $$\ftwo$$-polynomials, 2) functions with sparse Fourier spectrum, 3) most symmetric functions, 4) recursive majority function. These results rely on a new structural theorem that shows that $$\ftwo$$-sketching is optimal (up to constant factors) for uniformly distributed inputs. Furthermore, we show that (non-uniform) streaming algorithms that have to process random updates over $$\ftwo$$ can be constructed as $$\ftwo$$-sketches for the uniform distribution. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14) who show an analogous result for linear sketches over integers in the adversarial setting our result does not require the stream length to be triply exponential in $$n$$ and holds for streams of length $$\tilde O(n)$$ constructed through uniformly random updates.  more » « less
Award ID(s):
1657477
PAR ID:
10088160
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
33rd Computational Complexity Conference, (CCC 2018)
Volume:
102
Page Range / eLocation ID:
8:1--8:37
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We initiate a systematic study of linear sketching over F_2. For a given Boolean function treated as f : F_2^n -> F_2 a randomized F_2-sketch is a distribution M over d x n matrices with elements over F_2 such that Mx suffices for computing f(x) with high probability. Such sketches for d << n can be used to design small-space distributed and streaming algorithms. Motivated by these applications we study a connection between F_2-sketching and a two-player one-way communication game for the corresponding XOR-function. We conjecture that F_2-sketching is optimal for this communication game. Our results confirm this conjecture for multiple important classes of functions: 1) low-degree F_2-polynomials, 2) functions with sparse Fourier spectrum, 3) most symmetric functions, 4) recursive majority function. These results rely on a new structural theorem that shows that F_2-sketching is optimal (up to constant factors) for uniformly distributed inputs. Furthermore, we show that (non-uniform) streaming algorithms that have to process random updates over F_2 can be constructed as F_2-sketches for the uniform distribution. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14) who show an analogous result for linear sketches over integers in the adversarial setting our result does not require the stream length to be triply exponential in n and holds for streams of length O(n) constructed through uniformly random updates. 
    more » « less
  2. Chakrabarti, Amit; Swamy, Chaitanya (Ed.)
    A Boolean maximum constraint satisfaction problem, Max-CSP(f), is specified by a predicate f:{-1,1}^k → {0,1}. An n-variable instance of Max-CSP(f) consists of a list of constraints, each of which applies f to k distinct literals drawn from the n variables. For k = 2, Chou, Golovnev, and Velusamy [Chou et al., 2020] obtained explicit ratios characterizing the √ n-space streaming approximability of every predicate. For k ≥ 3, Chou, Golovnev, Sudan, and Velusamy [Chou et al., 2022] proved a general dichotomy theorem for √ n-space sketching algorithms: For every f, there exists α(f) ∈ (0,1] such that for every ε > 0, Max-CSP(f) is (α(f)-ε)-approximable by an O(log n)-space linear sketching algorithm, but (α(f)+ε)-approximation sketching algorithms require Ω(√n) space. In this work, we give closed-form expressions for the sketching approximation ratios of multiple families of symmetric Boolean functions. Letting α'_k = 2^{-(k-1)} (1-k^{-2})^{(k-1)/2}, we show that for odd k ≥ 3, α(kAND) = α'_k, and for even k ≥ 2, α(kAND) = 2α'_{k+1}. Thus, for every k, kAND can be (2-o(1))2^{-k}-approximated by O(log n)-space sketching algorithms; we contrast this with a lower bound of Chou, Golovnev, Sudan, Velingker, and Velusamy [Chou et al., 2022] implying that streaming (2+ε)2^{-k}-approximations require Ω(n) space! We also resolve the ratio for the "at-least-(k-1)-1’s" function for all even k; the "exactly-(k+1)/2-1’s" function for odd k ∈ {3,…,51}; and fifteen other functions. We stress here that for general f, the dichotomy theorem in [Chou et al., 2022] only implies that α(f) can be computed to arbitrary precision in PSPACE, and thus closed-form expressions need not have existed a priori. Our analyses involve identifying and exploiting structural "saddle-point" properties of this dichotomy. Separately, for all threshold functions, we give optimal "bias-based" approximation algorithms generalizing [Chou et al., 2020] while simplifying [Chou et al., 2022]. Finally, we investigate the √ n-space streaming lower bounds in [Chou et al., 2022], and show that they are incomplete for 3AND, i.e., they fail to rule out (α(3AND})-ε)-approximations in o(√ n) space. 
    more » « less
  3. A constraint satisfaction problem (CSP),\(\textsf {Max-CSP}(\mathcal {F})\), is specified by a finite set of constraints\(\mathcal {F}\subseteq \lbrace [q]^k \rightarrow \lbrace 0,1\rbrace \rbrace\)for positive integersqandk. An instance of the problem onnvariables is given bymapplications of constraints from\(\mathcal {F}\)to subsequences of thenvariables, and the goal is to find an assignment to the variables that satisfies the maximum number of constraints. In the (γ ,β)-approximation version of the problem for parameters 0 ≤ β ≤ γ ≤ 1, the goal is to distinguish instances where at least γ fraction of the constraints can be satisfied from instances where at most β fraction of the constraints can be satisfied. In this work, we consider the approximability of this problem in the context of sketching algorithms and give a dichotomy result. Specifically, for every family\(\mathcal {F}\)and every β < γ, we show that either a linear sketching algorithm solves the problem in polylogarithmic space or the problem is not solvable by any sketching algorithm in\(o(\sqrt {n})\)space. In particular, we give non-trivial approximation algorithms using polylogarithmic space for infinitely many constraint satisfaction problems. We also extend previously known lower bounds for general streaming algorithms to a wide variety of problems, and in particular the case ofq=k=2, where we get a dichotomy, and the case when the satisfying assignments of the constraints of\(\mathcal {F}\)support a distribution on\([q]^k\)with uniform marginals. Prior to this work, other than sporadic examples, the only systematic classes of CSPs that were analyzed considered the setting of Boolean variablesq= 2, binary constraintsk=2, and singleton families\(|\mathcal {F}|=1\)and only considered the setting where constraints are placed on literals rather than variables. Our positive results show wide applicability of bias-based algorithms used previously by [47] and [41], which we extend to include richer norm estimation algorithms, by giving a systematic way to discover biases. Our negative results combine the Fourier analytic methods of [56], which we extend to a wider class of CSPs, with a rich collection of reductions among communication complexity problems that lie at the heart of the negative results. In particular, previous works used Fourier analysis over the Boolean cube to initiate their results and the results seemed particularly tailored to functions on Boolean literals (i.e., with negations). Our techniques surprisingly allow us to get to generalq-ary CSPs without negations by appealing to the same Fourier analytic starting point over Boolean hypercubes. 
    more » « less
  4. We introduce a new sub-linear space sketch the "Weight-Median Sketch" for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. Unlike related sketches that capture the most frequently-occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis that establishes recovery guarantees for batch and online learning, and demonstrate empirical improvements in memory-accuracy trade-offs over alternative memory-budgeted methods, including count-based sketches and feature hashing. 
    more » « less
  5. A longstanding observation, which was partially proven in \cite{LNW14,AHLW16}, is that any turnstile streaming algorithm can be implemented as a linear sketch (the reverse is trivially true). We study the relationship between turnstile streaming and linear sketching algorithms in more detail, giving both new separations and new equivalences between the two models. It was shown in \cite{LNW14} that, if a turnstile algorithm works for arbitrarily long streams with arbitrarily large coordinates at intermediate stages of the stream, then the turnstile algorithm is equivalent to a linear sketch. We show separations of the opposite form: if either the stream length or the maximum value of the stream are substantially restricted, there exist problems where linear sketching is exponentially harder than turnstile streaming. A further limitation of the \cite{LNW14} equivalence is that the turnstile sketching algorithm is neither explicit nor uniform, but requires an exponentially long advice string. We show how to remove this limitation for deterministic streaming algorithms: we give an explicit small-space algorithm that takes the streaming algorithm and computes an equivalent module. 
    more » « less