skip to main content


Title: Mitigating False Positives in Filters: to Adapt or to Cache?
A filter is adaptive if it achieves a false positive rate of " on each query independently of the answers to previous queries. Many popular filters such as Bloom filters are not adaptive—an adversary could repeat a false-positive query many times to drive the false-positive rate to 1. Bender et al. [4] formalized the definition of adaptivity and gave a provably adaptive filter, the broom filter. Mitzenmacher et al. [20] gave a filter that achieves a lower empirical false- positive rate by exploiting repetitions. We prove that an adaptive filter has a lower false- positive rate when the adversary is stochastic. Specifically, we analyze the broom filter against queries drawn from a Zipfian distribution. We validate our analysis empirically by showing that the broom filter achieves a low false-positive rate on both network traces and synthetic datasets, even when compared to a regular filter augmented with a cache for storing frequently queried items.  more » « less
Award ID(s):
1938180 2106999 2118620
NSF-PAR ID:
10298510
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
SIAM Symposium on Algorithmic Principles of Computer System
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries: open-range queries, closed-range queries, and range counts. SuRF is based on a new data structure called the Fast Succinct Trie (FST) that matches the point and range query performance of state-of-the-art order-preserving indexes, while consuming only 10 bits per trie node. The false positive rates in SuRF for both point and range queries are tunable to satisfy different application needs. We evaluate SuRF in RocksDB as a replacement for its Bloom filters to reduce I/O by filtering requests before they access on-disk data structures. Our experiments on a 100 GB dataset show that replacing RocksDB's Bloom filters with SuRFs speeds up open-seek (without upper-bound) and closed-seek (with upper-bound) queries by up to 1.5× and 5× with a modest cost on the worst-case (all-missing) point query throughput due to slightly higher false positive rate. 
    more » « less
  2. The problem of testing monotonicity for Boolean functions on the hypergrid, $f:[n]^d \to \{0,1\}$ is a classic topic in property testing. When $n=2$, the domain is the hypercube. For the hypercube case, a breakthrough result of Khot-Minzer-Safra (FOCS 2015) gave a non-adaptive, one-sided tester making $\otilde(\eps^{-2}\sqrt{d})$ queries. Up to polylog $d$ and $\eps$ factors, this bound matches the $\widetilde{\Omega}(\sqrt{d})$-query non-adaptive lower bound (Chen-De-Servedio-Tan (STOC 2015), Chen-Waingarten-Xie (STOC 2017)). For any $n > 2$, the optimal non-adaptive complexity was unknown. A previous result of the authors achieves a $\otilde(d^{5/6})$-query upper bound (SODA 2020), quite far from the $\sqrt{d}$ bound for the hypercube. In this paper, we resolve the non-adaptive complexity of monotonicity testing for all constant $n$, up to $\poly(\eps^{-1}\log d)$ factors. Specifically, we give a non-adaptive, one-sided monotonicity tester making $\otilde(\eps^{-2}n\sqrt{d})$ queries. From a technical standpoint, we prove new directed isoperimetric theorems over the hypergrid $[n]^d$. These results generalize the celebrated directed Talagrand inequalities that were only known for the hypercube. 
    more » « less
  3. We study the problem of estimating the value of sums of the form Sp≜∑(xip) when one has the ability to sample xi≥0 with probability proportional to its magnitude. When p=2, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when {xi} is the degree sequence of a graph, which corresponds to counting the number of p-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a (1±ε)-multiplicative approximation of Sp has query and time complexities O(mloglognϵ2S1/pp). Here, m=∑xi/2 is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, n is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when {xi} is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al. in SIAM J Comput 25:1365–1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where Sp≤n, and p=2, our upper bound is O~(n/S1/2p), in contrast to their Ω(n/S1/3p) lower bound when no random edge queries are available. In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is bounded—or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is bounded—we give a sublinear time algorithm via a reduction to the undirected case. 
    more » « less
  4. We present TVA, a multi-party computation (MPC) system for secure analytics on secret-shared time series data. TVA achieves strong security guarantees in the semi-honest and malicious settings, and high expressivity by enabling complex analytics on inputs with unordered and irregular timestamps. TVA is the first system to support arbitrary composition of oblivious window operators, keyed aggregations, and multiple filter predicates, while keeping all data attributes private, including record timestamps and user-defined values in query predicates. At the core of the TVA system lie novel protocols for secure window assignment: (i) a tumbling window protocol that groups records into fixed-length time buckets and (ii) two session window protocols that identify periods of activity followed by periods of inactivity. We also contribute a new protocol for secure division with a public divisor, which may be of independent interest. We evaluate TVA on real LAN and WAN environments and show that it can efficiently compute complex window-based analytics on inputs of 2^22 records with modest use of resources. When compared to the state-of-the-art, TVA achieves up to 5.8× lower latency in queries with multiple filters and two orders of magnitude better performance in window aggregation. 
    more » « less
  5. Pass, Rafael ; Pietrzak, Krzysztof (Ed.)
    In the backdoored random-oracle (BRO) model, besides access to a random function H , adversaries are provided with a backdoor oracle that can compute arbitrary leakage functions f of the function table of H . Thus, an adversary would be able to invert points, find collisions, test for membership in certain sets, and more. This model was introduced in the work of Bauer, Farshim, and Mazaheri (Crypto 2018) and extends the auxiliary-input idealized models of Unruh (Crypto 2007), Dodis, Guo, and Katz (Eurocrypt 2017), Coretti et al. (Eurocrypt 2018), and Coretti, Dodis, and Guo (Crypto 2018). It was shown that certain security properties, such as one-wayness, pseudorandomness, and collision resistance can be re-established by combining two independent BROs, even if the adversary has access to both backdoor oracles. In this work we further develop the technique of combining two or more independent BROs to render their backdoors useless in a more general sense. More precisely, we study the question of building an indifferentiable and backdoor-free random function by combining multiple BROs. Achieving full indifferentiability in this model seems very challenging at the moment. We however make progress by showing that the xor combiner goes well beyond security against preprocessing attacks and offers indifferentiability as long as the adaptivity of queries to different backdoor oracles remains logarithmic in the input size of the BROs. We even show that an extractor-based combiner of three BROs can achieve indifferentiability with respect to a linear adaptivity of backdoor queries. Furthermore, a natural restriction of our definition gives rise to a notion of indifferentiability with auxiliary input, for which we give two positive feasibility results. To prove these results we build on and refine techniques by Göös et al. (STOC 2015) and Kothari et al. (STOC 2017) for decomposing distributions with high entropy into distributions with more structure and show how they can be applied in the more involved adaptive settings. 
    more » « less