Title: Vacuum filters: more space-efficient and faster replacement for bloom and cuckoo filters
Award ID(s):
1750704
PAR ID:
10159886
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
ISSN:
2150-8097
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We address the question of consistency strength of certain filters and ultrafilters which fail to satisfy the Galvin property. We answer questions [Benhamou and Gitik, Ann. Pure Appl. Logic 173 (2022) 103107; Questions 7.8, 7.9], [Benhamou et al., J. Lond. Math. Soc. 108(1) (2023) 190–237; Question 5] and improve theorem [Benhamou et al., J. Lond. Math. Soc. 108(1) (2023) 190–237; Theorem 2.3]. 
  2. Filters trade off accuracy for space and occasionally return false positive matches with a bounded error. Numerous systems use filters in fast memory to avoid performing expensive I/Os to slow storage. A fundamental limitation in traditional filters is that they do not change their representation upon seeing a false positive match. Therefore, the maximum false positive rate is only guaranteed for a single query, not for an arbitrary set of queries. We can improve the filter's performance on a stream of queries, especially on a skewed distribution, if we can adapt after encountering false positives. Adaptive filters, such as telescoping quotient filters and adaptive cuckoo filters, update their representation upon detecting a false positive to avoid repeating the same error in the future. Adaptive filters require an auxiliary structure, typically much larger than the main filter and often residing on slow storage, to facilitate adaptation. However, existing adaptive filters are not practical and have not been adopted in real-world systems for two main reasons. First, they offer weak adaptivity guarantees, meaning that fixing a new false positive can cause a previously fixed false positive to come back. Second, the sub-optimal design of the auxiliary structure results in adaptivity overheads so substantial that they can actually diminish overall system performance compared to a traditional filter. In this paper, we design and implement the \sysname, the first practical adaptive filter with minimal adaptivity overhead and strong adaptivity guarantees, which means that the performance and false-positive guarantees continue to hold even for adversarial workloads. The \sysname is based on the state-of-the-art quotient filter design and preserves all the critical features of the quotient filter such as cache efficiency and mergeability. 
Furthermore, we employ a new auxiliary structure design that results in low adaptivity overhead and makes the \sysname practical in real systems. We evaluate the \sysname by using it to filter queries to an on-disk B-tree database and find no negative impact on insert or query performance compared to traditional filters. Against adversarial workloads, the \sysname preserves system performance, whereas traditional filters incur a 2× slowdown from adversaries representing as little as 1% of the workload. Finally, we show that on skewed query workloads, the \sysname can reduce the false-positive rate by 100× using negligible (1/1000th of a bit per item) space overhead. 
  3. In this paper we present the derivation of two new forms of the Kalman filter equations: the first is for a purely lognormally distributed random variable, while the second is for a combination of Gaussian and lognormally distributed random variables. We show that the equations are similar in appearance to the Gaussian-based equations, but that the analysis state is a multivariate median and not the mean. We also show results of the mixed-distribution Kalman filter with the Lorenz 1963 model with lognormal errors for the background and observations of the z component, compare them to analysis results from a traditional Gaussian-based extended Kalman filter, and show that under certain circumstances the new approach produces more accurate results. 
  4.