Title: Optimally hiding object sizes with constrained padding
Among the most challenging traffic-analysis attacks to confound are those leveraging the sizes of objects downloaded over the network. In this paper we systematically analyze this problem under realistic constraints regarding the padding overhead that the object store is willing to incur. We give algorithms to compute privacy-optimal padding schemes—specifically that minimize the network observer’s information gain from a downloaded object’s padded size—in several scenarios of interest: per-object padding, in which the object store responds to each request for an object with the same padded copy; per-request padding, in which the object store pads an object anew each time it serves that object; and a scenario unlike the previous ones in that the object store is unable to leverage a known distribution over the object queries. We provide constructions for privacy-optimal padding in each case, compare them to recent contenders in the research literature, and evaluate their performance on practical datasets.
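To make the per-object setting concrete, here is a deliberately simple Python sketch, not the paper's optimal construction; the function name and the greedy grouping strategy are illustrative assumptions. Objects of similar size are merged into groups and every object is padded to its group's maximum, so group members become indistinguishable by size while no object exceeds c× its original size.

```python
def greedy_pad_groups(sizes, c=1.5):
    """Greedily partition sorted sizes into runs whose largest member is
    at most c times the smallest, then pad every object to the run max."""
    padded, run = {}, []
    for s in sorted(set(sizes)):
        if run and s > c * run[0]:
            # close the current run: all members pad to the run's maximum
            for r in run:
                padded[r] = run[-1]
            run = []
        run.append(s)
    for r in run:
        padded[r] = run[-1]
    return padded

pads = greedy_pad_groups([100, 110, 120, 400, 420], c=1.5)
# 100, 110, 120 all pad to 120; 400 and 420 pad to 420
```

An observer who sees only padded sizes cannot distinguish objects within a group; the paper's algorithms choose the padding to provably minimize the observer's information gain, which this greedy heuristic does not guarantee.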
Award ID(s): 2207214
PAR ID: 10438806
Journal Name: IEEE Computer Security Foundations Symposium
Sponsoring Org: National Science Foundation
More Like this
  1. The sizes of objects retrieved over the network are powerful indicators of the objects retrieved and are ingredients in numerous types of traffic analysis, such as webpage fingerprinting. We present an algorithm by which a benevolent object store computes a memoryless padding scheme to pad objects before sending them, in a way that bounds the information gain that the padded sizes provide to the network observer about the objects being retrieved. Moreover, our algorithm innovates over previous works in two critical ways. First, the computed padding scheme satisfies constraints on the padding overhead: no object is padded to more than c× its original size, for a tunable factor c > 1. Second, the privacy guarantees of the padding scheme allow for object retrievals that are not independent, as could be caused by hyperlinking. We show in empirical tests that our padding schemes improve dramatically over previous schemes for padding dependent object retrievals, providing better privacy at substantially lower padding overhead, and over known techniques for padding independent object retrievals subject to padding overhead constraints. 
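The bounded quantity, the observer's information gain, is the mutual information between the retrieved object and its padded size. Assuming a known query distribution, a short sketch (function and variable names are illustrative, not from the paper) evaluates it for any memoryless randomized padding scheme:

```python
from collections import defaultdict
from math import log2

def information_gain(prior, scheme):
    """Mutual information I(object; padded size) in bits.
    prior[obj] is the retrieval probability of obj; scheme[obj] maps each
    padded size to P(padded size | obj) for a memoryless padding scheme."""
    marginal = defaultdict(float)   # distribution of observed padded sizes
    for obj, p in prior.items():
        for size, q in scheme[obj].items():
            marginal[size] += p * q
    gain = 0.0
    for obj, p in prior.items():
        for size, q in scheme[obj].items():
            if q > 0:
                gain += p * q * log2(q / marginal[size])
    return gain

prior = {"a": 0.5, "b": 0.5}
same = {"a": {128: 1.0}, "b": {128: 1.0}}   # both objects pad to 128
# information_gain(prior, same) is 0.0: padded sizes reveal nothing
```

A scheme that pads the two objects to distinct sizes instead leaks the full bit, which is what the overhead-constrained optimization trades off against the c× cap.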
  2. Among the most challenging traffic-analysis attacks to confound are those leveraging the sizes of objects downloaded over the network, as the size of an object is often a powerful indicator of its identity. In this dissertation, we consider this challenge in both (i) the simplified setting where successive object retrievals are assumed to be independent and (ii) the setting where sequential object retrievals are dependent on one another. Furthermore, within the dependent retrievals setting, we address the scenario where enumerating all possible sequences is impractical. For each setting, we present algorithms by which a benevolent object store computes a memoryless padding scheme to pad objects before sending them, in a way that bounds the information gain that the padded sizes provide to the network observer about the objects being retrieved. Furthermore, all of our algorithms ensure that no object is padded to more than c× its original size, for a tunable factor c > 1. We compare each algorithm to recent contenders in the research literature and evaluate their performance on practical datasets. 
  3. A $(β,δ,Δ)$-padded decomposition of an edge-weighted graph $G=(V,E,w)$ is a stochastic decomposition into clusters of diameter at most $Δ$ such that, for every vertex $v\in V$ and every $γ\in [0,δ]$, the probability that $\mathrm{ball}_G(v,γΔ)$ is entirely contained in the cluster containing $v$ is at least $e^{-βγ}$. Padded decompositions have been studied for decades and have found numerous applications, including metric embedding, multicommodity flow-cut gap, multicut, and zero-extension problems, to name a few. In these applications, the parameter $β$, called the padding parameter, is the most important, since it determines the distortion or approximation ratio. For general graphs with $n$ vertices, $β = Θ(\log n)$. Klein, Plotkin, and Rao showed that $K_r$-minor-free graphs have padding parameter $β = O(r^3)$, a significant improvement over general graphs when $r$ is a constant. A long-standing conjecture is to construct a padded decomposition for $K_r$-minor-free graphs with padding parameter $β = O(\log r)$. Despite decades of research, the best-known result is $β = O(r)$, even for graphs with treewidth at most $r$. In this work, we make significant progress toward this conjecture by showing that graphs with treewidth $\mathrm{tw}$ admit a padded decomposition with padding parameter $O(\log \mathrm{tw})$, which is tight. As corollaries, we obtain an exponential improvement in the dependency on treewidth in a host of algorithmic applications: an $O(\sqrt{\log n \cdot \log(\mathrm{tw})})$ flow-cut gap, a max-flow/min-multicut ratio of $O(\log(\mathrm{tw}))$, an $O(\log(\mathrm{tw}))$ approximation for the 0-extension problem, an $ℓ_∞^{O(\log n)}$ embedding with distortion $O(\log \mathrm{tw})$, and an $O(\log \mathrm{tw})$ bound on the integrality gap of the uniform sparsest cut. (39 pages; this is the TheoretiCS journal version.)
  4. Chain of thought is a natural inference-time method for increasing the computational power of transformer-based large language models (LLMs), but it comes at the cost of sequential decoding. Are there more efficient alternatives that expand a transformer's expressive power without adding parameters? We consider transformers with padding tokens as a form of parallelizable test-time compute. We show that averaging-hard-attention, masked-pre-norm transformers with polynomial padding recognize precisely the class FO-uniform TC⁰ of extremely parallelizable problems. While the TC⁰ upper bound was known, proving a matching lower bound had been elusive. Further, our novel analysis reveals the precise expanded power of padded transformers when coupled with another form of inference-time compute, namely dynamically increasing depth via looping. Our core technical contribution is to show how padding helps bring the notions of complete problems and reductions, which have been a cornerstone of classical complexity theory, to the formal study of transformers. Armed with this new tool, we prove that padded transformers with looping recognize exactly the class FO-uniform TC of moderately parallelizable problems. Thus, padding and looping together systematically expand transformers' expressive power: with polylogarithmic looping, polynomially padded transformers recognize precisely the class FO-uniform NC, the best that could be expected without losing parallelism (unless NC = P). Our results thus motivate further exploration of padding and looping as parallelizable alternatives to chain of thought for test-time compute.
  5. The notion of Local Differential Privacy (LDP) enables users to respond to sensitive questions while preserving their privacy. The basic LDP frequency oracle (FO) protocol enables an aggregator to estimate the frequency of any value. But when each user has a set of values, an additional padding-and-sampling step is needed to find the frequent values and estimate their frequencies. In this paper, we formally define such padding-and-sampling-based frequency oracles (PSFO). We further identify the privacy amplification property in PSFO. As a result, we propose SVIM, a protocol for finding frequent items in the set-valued LDP setting. Experiments show that under the same privacy guarantee and computational cost, SVIM significantly improves over existing methods. With SVIM to find frequent items, we propose SVSM to effectively find frequent itemsets, which to our knowledge has not been done before in the LDP setting.
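The padding-and-sampling step this abstract refers to can be sketched generically; SVIM's actual mechanisms are not reproduced here, and the names psfo_report and psfo_estimate, the dummy symbol, and the choice of generalized randomized response as the underlying FO are all illustrative assumptions. Each user pads their item set to a fixed length L with dummies, samples one slot, and perturbs it; the aggregator debiases the counts and scales by L.

```python
import random
from math import exp

def psfo_report(items, domain, L, eps):
    """One user's report: pad the item set to exactly L entries with a
    dummy symbol, sample one entry uniformly, then perturb it with
    generalized randomized response at privacy budget eps."""
    padded = (list(items) + ["_dummy"] * L)[:L]
    v = random.choice(padded)
    full = domain + ["_dummy"]
    p = exp(eps) / (exp(eps) + len(full) - 1)  # prob of reporting truthfully
    if random.random() < p:
        return v
    return random.choice([x for x in full if x != v])

def psfo_estimate(reports, value, domain, L, eps):
    """Debiased frequency estimate for `value`; the factor L compensates
    for each user reporting only one of L padded slots."""
    d = len(domain) + 1                        # +1 for the dummy symbol
    p = exp(eps) / (exp(eps) + d - 1)
    q = 1.0 / (exp(eps) + d - 1)
    n = len(reports)
    c = sum(1 for r in reports if r == value)
    return L * (c - n * q) / ((p - q) * n)

# Demo: 20,000 users, each holding the single item "a", domain {"a", "b"}
random.seed(7)
reports = [psfo_report({"a"}, ["a", "b"], L=2, eps=4.0) for _ in range(20000)]
est = psfo_estimate(reports, "a", ["a", "b"], L=2, eps=4.0)
# est should be close to 1.0, since every user holds "a"
```

Sampling one of L slots means each true item surfaces with probability 1/L, hence the factor L in the estimator; this sampling is also where the privacy amplification analyzed in the paper enters.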