skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Space- and computationally-efficient set reconciliation via parity bitmap sketch (PBS)
Set reconciliation is a fundamental algorithmic problem that arises in many networking, system, and database applications. In this problem, two large sets A and B of objects (bitcoins, files, records, etc.) are stored respectively at two different network-connected hosts, which we name Alice and Bob respectively. Alice and Bob communicate with each other to learn A Δ B , the difference between A and B , and as a result the reconciled set A ∪ B. Current set reconciliation schemes are based on either invertible Bloom filters (IBF) or error-correction codes (ECC). The former has a low computational complexity of O(d) , where d is the cardinality of A Δ B , but has a high communication overhead that is several times larger than the theoretical minimum. The latter has a low communication overhead close to the theoretical minimum, but has a much higher computational complexity of O(d 2 ). In this work, we propose Parity Bitmap Sketch (PBS), an ECC-based set reconciliation scheme that gets the better of both worlds: PBS has both a low computational complexity of O(d) just like IBF-based solutions and a low communication overhead of roughly twice the theoretical minimum. A separate contribution of this work is a novel rigorous analytical framework that can be used for the precise calculation of various performance metrics and for the near-optimal parameter tuning of PBS.  more » « less
Award ID(s):
2007006 1909048
PAR ID:
10296507
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
14
Issue:
4
ISSN:
2150-8097
Page Range / eLocation ID:
458 to 470
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players, Bob, his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result each time before the next piece is revealed to Bob. This model has a closer and more natural correspondence to dynamic data structures than classic communication models do, and hence presents a new perspective on data structures. We first present a tight lower bound for the online set intersection problem in the online communication model, demonstrating a general approach for proving online communication lower bounds. The online communication model prevents a batching trick that classic communication complexity allows, and yields a stronger lower bound. We then apply the online communication model to prove data structure lower bounds for two dynamic data structure problems: the Group Range problem and the Dynamic Connectivity problem for forests. Both of the problems admit a worst case O(logn)-time data structure. Using online communication complexity, we prove a tight cell-probe lower bound for each: spending o(logn) (even amortized) time per operation results in at best an exp(−δ2 n) probability of correctly answering a (1/2+δ)-fraction of the n queries. 
    more » « less
  2. We study the role of interaction in the Common Randomness Generation (CRG) and Secret Key Generation (SKG) problems. In the CRG problem, two players, Alice and Bob, respectively get samples X1, X2, . . . and Y1, Y2, . . . with the pairs (X1, Y1), (X2, Y2), . . . being drawn independently from some known probability distribution μ. They wish to communicate so as to agree on L bits of randomness. The SKG problem is the restriction of the CRG problem to the case where the key is required to be close to random even to an eavesdropper who can listen to their communication (but does not have access to the inputs of Alice and Bob). In this work, we study the relationship between the amount of communication and the number of rounds of interaction in both the CRG and the SKG problems. Specifically, we construct a family of distributions μ = μr,n,L, parametrized by integers r, n and L, such that for every r there exists a constant b = b(r) for which CRG (respectively SKG) is feasible when (Xi, Yi) ~ μr,n,L with r + 1 rounds of communication, each consisting of O(log n) bits, but when restricted to r/2 − 2 rounds of interaction, the total communication must exceed Ω(n/ logb(n)) bits. Prior to our work no separations were known for r ≥ 2. 
    more » « less
  3. Modulation Classification (MC) is the problem of classifying the modulation format of a wireless signal. In the wireless communications pipeline, MC is the first operation performed on the received signal and is critical for reliable decoding. This paper considers the problem of secure MC, where a transmitter (Alice) wants to maximize MC accuracy at a legitimate receiver (Bob) while minimizing MC accuracy at an eavesdropper (Eve). This work introduces novel adversarial learning techniques for secure MC. We present adversarial filters in which Alice uses a carefully designed adversarial filter to mask the transmitted signal, that can maximize MC accuracy at Bob while minimizing MC accuracy at Eve. We present two filtering-based algorithms, namely gradient ascent filter (GAF), and a fast gradient filter method (FGFM), with varying levels of complexity. Our proposed adversarial filtering-based approaches significantly outperform additive adversarial perturbations (used in the traditional machine learning (ML) community and other prior works on secure MC) and have several other desirable properties. In particular, GAF and FGFM algorithms are a) computational efficient (allow fast decoding at Bob), b) power-efficient (do not require excessive transmit power at Alice); and c) SNR efficient (i.e., perform well even at low SNR values at Bob). 
    more » « less
  4. We consider a scenario wherein two parties Alice and Bob are provided X1 and X2 – samples that are IID from a PMF P_X1X2. Alice and Bob can communicate to Charles over (noiseless) communication links of rate R1 and R2 respectively. Their goal is to enable Charles generate samples Y such that the triple (X1,X2,Y) has a PMF that is close, in total variation, to P_X1X2Y. In addition, the three parties may posses shared common randomness at rate C. We address the problem of characterizing the set of rate triples (R1, R2, C) for which the above goal can be accomplished. We provide a set of sufficient conditions, i.e., an achievable rate region for this three party setup. Our work also provides a complete characterization of a point-to-point setup wherein Bob is absent and Charles is provided with side-information. 
    more » « less
  5. Tauman Kalai, Yael (Ed.)
    We introduce and study the communication complexity of computing the inner product of two vectors, where the input is restricted w.r.t. a norm N on the space ℝⁿ. Here, Alice and Bob hold two vectors v,u such that ‖v‖_N ≤ 1 and ‖u‖_{N^*} ≤ 1, where N^* is the dual norm. The goal is to compute their inner product ⟨v,u⟩ up to an ε additive term. The problem is denoted by IP_N, and generalizes important previously studied problems, such as: (1) Computing the expectation 𝔼_{x∼𝒟}[f(x)] when Alice holds 𝒟 and Bob holds f is equivalent to IP_{𝓁₁}. (2) Computing v^TAv where Alice has a symmetric matrix with bounded operator norm (denoted S_∞) and Bob has a vector v where ‖v‖₂ = 1. This problem is complete for quantum communication complexity and is equivalent to IP_{S_∞}. We systematically study IP_N, showing the following results, near tight in most cases: 1) For any symmetric norm N, given ‖v‖_N ≤ 1 and ‖u‖_{N^*} ≤ 1 there is a randomized protocol using 𝒪̃(ε^{-6} log n) bits of communication that returns a value in ⟨u,v⟩±ε with probability 2/3 - we will denote this by ℛ_{ε,1/3}(IP_N) ≤ 𝒪̃(ε^{-6} log n). In a special case where N = 𝓁_p and N^* = 𝓁_q for p^{-1} + q^{-1} = 1, we obtain an improved bound ℛ_{ε,1/3}(IP_{𝓁_p}) ≤ 𝒪(ε^{-2} log n), nearly matching the lower bound ℛ_{ε, 1/3}(IP_{𝓁_p}) ≥ Ω(min(n, ε^{-2})). 2) One way communication complexity ℛ^{→}_{ε,δ}(IP_{𝓁_p}) ≤ 𝒪(ε^{-max(2,p)}⋅ log n/ε), and a nearly matching lower bound ℛ^{→}_{ε, 1/3}(IP_{𝓁_p}) ≥ Ω(ε^{-max(2,p)}) for ε^{-max(2,p)} ≪ n. 3) One way communication complexity ℛ^{→}_{ε,δ}(N) for a symmetric norm N is governed by the distortion of the embedding 𝓁_∞^k into N. Specifically, while a small distortion embedding easily implies a lower bound Ω(k), we show that, conversely, non-existence of such an embedding implies protocol with communication k^𝒪(log log k) log² n. 4) For arbitrary origin symmetric convex polytope P, we show ℛ_{ε,1/3}(IP_{N}) ≤ 𝒪(ε^{-2} log xc(P)), where N is the unique norm for which P is a unit ball, and xc(P) is the extension complexity of P (i.e. the smallest number of inequalities describing some polytope P' s.t. P is projection of P'). 
    more » « less