Title: Space- and computationally-efficient set reconciliation via parity bitmap sketch (PBS)
Set reconciliation is a fundamental algorithmic problem that arises in many networking, system, and database applications. In this problem, two large sets A and B of objects (bitcoins, files, records, etc.) are stored at two different network-connected hosts, which we call Alice and Bob respectively. Alice and Bob communicate with each other to learn A Δ B, the symmetric difference between A and B, and thereby the reconciled set A ∪ B. Current set reconciliation schemes are based on either invertible Bloom filters (IBF) or error-correction codes (ECC). The former have a low computational complexity of O(d), where d is the cardinality of A Δ B, but a high communication overhead that is several times larger than the theoretical minimum. The latter have a communication overhead close to the theoretical minimum, but a much higher computational complexity of O(d²). In this work, we propose Parity Bitmap Sketch (PBS), an ECC-based set reconciliation scheme that gets the best of both worlds: PBS has both a low computational complexity of O(d), just like IBF-based solutions, and a low communication overhead of roughly twice the theoretical minimum. A separate contribution of this work is a novel, rigorous analytical framework that can be used for the precise calculation of various performance metrics and for the near-optimal parameter tuning of PBS.
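The parity-bitmap intuition behind PBS can be conveyed with a short sketch. The Python snippet below is a toy illustration only, not the PBS algorithm from the paper: the bucket count, the SHA-256 bucketing hash, the example sets, and the single-round exchange of flagged buckets are all assumptions made for this example, and the scheme's ECC-based encoding is omitted entirely.

```python
# Toy parity-bitmap illustration (NOT the paper's PBS protocol): every element
# is hashed into one of n buckets and each host keeps one parity bit per bucket.
# XOR-ing Alice's and Bob's bitmaps flags the buckets whose element counts differ
# in parity, which is where elements of A Δ B are likely to sit when d is small.
import hashlib

N_BUCKETS = 64  # illustrative choice, not a tuned PBS parameter

def bucket_of(item: str) -> int:
    """Map an item to a bucket with a stable hash."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_BUCKETS

def parity_bitmap(items: set) -> list:
    """One parity bit per bucket: parity of the number of items hashed into it."""
    bits = [0] * N_BUCKETS
    for item in items:
        bits[bucket_of(item)] ^= 1
    return bits

if __name__ == "__main__":
    A = {"tx1", "tx2", "tx3", "tx4"}
    B = {"tx2", "tx3", "tx4", "tx5"}  # A Δ B = {"tx1", "tx5"}

    # Alice sends her bitmap; Bob XORs it with his own to locate suspect buckets.
    a_bits, b_bits = parity_bitmap(A), parity_bitmap(B)
    flagged = {i for i in range(N_BUCKETS) if a_bits[i] != b_bits[i]}

    # Exchanging only the items that land in flagged buckets recovers A Δ B
    # (with high probability when d << N_BUCKETS); common items cancel out.
    from_a = {x for x in A if bucket_of(x) in flagged}
    from_b = {x for x in B if bucket_of(x) in flagged}
    print("estimated A Δ B:", from_a ^ from_b)
```

With d much smaller than the number of buckets, each differing element usually lands in its own flagged bucket; a bucket holding an even number of differences goes undetected, and sending raw bitmaps is wasteful. The actual PBS scheme addresses such issues and achieves the O(d) computation and roughly-twice-minimum communication stated in the abstract, none of which this toy attempts.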
Award ID(s):
2007006, 1909048
NSF-PAR ID:
10296507
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
14
Issue:
4
ISSN:
2150-8097
Page Range / eLocation ID:
458–470
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players, Bob, with his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result each time before the next piece is revealed to Bob. This model has a closer and more natural correspondence to dynamic data structures than classic communication models do, and hence presents a new perspective on data structures. We first present a tight lower bound for the online set intersection problem in the online communication model, demonstrating a general approach for proving online communication lower bounds. The online communication model prevents a batching trick that classic communication complexity allows, and yields a stronger lower bound. We then apply the online communication model to prove data structure lower bounds for two dynamic data structure problems: the Group Range problem and the Dynamic Connectivity problem for forests. Both problems admit a worst-case O(log n)-time data structure. Using online communication complexity, we prove a tight cell-probe lower bound for each: spending o(log n) (even amortized) time per operation results in at best an exp(−δ²n) probability of correctly answering a (1/2+δ)-fraction of the n queries.
  2. We study the role of interaction in the Common Randomness Generation (CRG) and Secret Key Generation (SKG) problems. In the CRG problem, two players, Alice and Bob, respectively get samples X1, X2, ... and Y1, Y2, ..., with the pairs (X1, Y1), (X2, Y2), ... being drawn independently from some known probability distribution μ. They wish to communicate so as to agree on L bits of randomness. The SKG problem is the restriction of the CRG problem to the case where the key is required to be close to random even to an eavesdropper who can listen to their communication (but does not have access to the inputs of Alice and Bob). In this work, we study the relationship between the amount of communication and the number of rounds of interaction in both the CRG and the SKG problems. Specifically, we construct a family of distributions μ = μ_{r,n,L}, parametrized by integers r, n, and L, such that for every r there exists a constant b = b(r) for which CRG (respectively SKG) is feasible when (Xi, Yi) ~ μ_{r,n,L} with r + 1 rounds of communication, each consisting of O(log n) bits, but when restricted to r/2 − 2 rounds of interaction, the total communication must exceed Ω(n/log^b n) bits. Prior to our work, no separations were known for r ≥ 2.
  3. Modulation Classification (MC) is the problem of classifying the modulation format of a wireless signal. In the wireless communications pipeline, MC is the first operation performed on the received signal and is critical for reliable decoding. This paper considers the problem of secure MC, where a transmitter (Alice) wants to maximize MC accuracy at a legitimate receiver (Bob) while minimizing MC accuracy at an eavesdropper (Eve). This work introduces novel adversarial learning techniques for secure MC. We present adversarial filters, in which Alice uses a carefully designed adversarial filter to mask the transmitted signal so as to maximize MC accuracy at Bob while minimizing MC accuracy at Eve. We present two filtering-based algorithms, namely the gradient ascent filter (GAF) and the fast gradient filter method (FGFM), with varying levels of complexity. Our proposed adversarial filtering-based approaches significantly outperform additive adversarial perturbations (used in the traditional machine learning (ML) community and in other prior works on secure MC) and have several other desirable properties. In particular, the GAF and FGFM algorithms are a) computationally efficient (they allow fast decoding at Bob), b) power-efficient (they do not require excessive transmit power at Alice), and c) SNR-efficient (i.e., they perform well even at low SNR values at Bob).
  4. We consider a scenario wherein two parties, Alice and Bob, are provided X1 and X2 – samples that are IID from a PMF P_X1X2. Alice and Bob can communicate to Charles over (noiseless) communication links of rates R1 and R2, respectively. Their goal is to enable Charles to generate samples Y such that the triple (X1, X2, Y) has a PMF that is close, in total variation, to P_X1X2Y (a small numerical illustration of this total-variation criterion is sketched after this list). In addition, the three parties may possess shared common randomness at rate C. We address the problem of characterizing the set of rate triples (R1, R2, C) for which the above goal can be accomplished. We provide a set of sufficient conditions, i.e., an achievable rate region, for this three-party setup. Our work also provides a complete characterization of a point-to-point setup wherein Bob is absent and Charles is provided with side information.
  5. Guruswami, Venkatesan (Ed.)
    We present novel lower bounds in the Merlin-Arthur (MA) communication model and the related annotated streaming or stream verification model. The MA communication model extends the classical communication model by introducing an all-powerful but untrusted player, Merlin, who knows the inputs of the usual players, Alice and Bob, and attempts to convince them about the output. We focus on the online MA (OMA) model, where Alice and Merlin each send a single message to Bob, who needs to catch Merlin if he is dishonest and announce the correct output otherwise. Most known functions have OMA protocols with total communication significantly smaller than what would be needed without Merlin. In this work, we introduce the notion of the non-trivial-OMA complexity of a function. This is the minimum total communication required when we restrict ourselves to only non-trivial protocols, where Alice sends Bob fewer bits than she would have sent without Merlin. We exhibit the first explicit functions that have this complexity superlinear (even exponential) in their classical one-way complexity: this means that the trivial protocol, where Merlin communicates nothing and Alice and Bob compute the function on their own, is exponentially better than any non-trivial protocol in terms of total communication. These OMA lower bounds also translate to the annotated streaming model, the MA analogue of single-pass data streaming. We show large separations between the classical streaming complexity and the non-trivial annotated streaming complexity (the analogous notion in this setting) of fundamental problems such as counting distinct items, as well as of graph problems such as connectivity and k-connectivity, in a certain edge-update model called the support graph turnstile model that we introduce here.
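The closeness criterion in item 4 above is the total variation distance between the PMF induced by the protocol at Charles and the target PMF P_X1X2Y. The Python snippet below is a small, self-contained numerical illustration of that distance only; the binary target PMF and the uniform "simulated" PMF are hypothetical numbers chosen for the example and do not come from the paper.

```python
# Total variation distance between two PMFs p and q on the same finite alphabet:
# TV(p, q) = 0.5 * sum over z of |p(z) - q(z)|.  The coordination goal in item 4
# is that the PMF of (X1, X2, Y) produced by the protocol be close to P_X1X2Y in
# this distance.  All numbers below are hypothetical, for illustration only.
from itertools import product

def total_variation(p: dict, q: dict) -> float:
    """0.5 * sum over the union of supports of |p(z) - q(z)|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in support)

# Hypothetical target joint PMF of (X1, X2, Y) over binary alphabets:
# slightly more mass on the two fully agreeing outcomes (probabilities sum to 1).
target = {
    (x1, x2, y): 0.125 + (0.075 if x1 == x2 == y else -0.025)
    for x1, x2, y in product((0, 1), repeat=3)
}
# Hypothetical PMF induced at Charles by some protocol: uniform over all 8 outcomes.
simulated = {z: 0.125 for z in product((0, 1), repeat=3)}

print(f"TV(target, simulated) = {total_variation(target, simulated):.4f}")  # 0.1500
```

Roughly speaking, in the asymptotic setting of such coordination problems, a successful protocol is one for which this distance, computed over blocks of samples, vanishes as the block length grows.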