This content will become publicly available on June 2, 2024
- NSF-PAR ID: 10444873
- Date Published:
- Journal Name: STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing
- Page Range / eLocation ID: 520 to 527
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- A probabilistic algorithm A is pseudodeterministic if, on every input, there exists a canonical value that is output with high probability. If the algorithm outputs one of k canonical values with high probability, then it is called a k-pseudodeterministic algorithm. In the study of pseudodeterminism, the Acceptance Probability Estimation Problem (APEP), which is to additively approximate the acceptance probability of a Boolean circuit, is emerging as a central computational problem. This problem admits a 2-pseudodeterministic algorithm. Recently, it was shown that a pseudodeterministic algorithm for this problem would imply that any multi-valued function that admits a k-pseudodeterministic algorithm for a constant k (including approximation algorithms) also admits a pseudodeterministic algorithm (Dixon, Pavan, Vinodchandran; ITCS 2021). The contribution of the present work is two-fold. First, as our main conceptual contribution, we establish that the existence of a pseudodeterministic algorithm for APEP is fundamentally related to the gap between probabilistic promise classes and the corresponding standard complexity classes. In particular, we show the following equivalence: APEP has a pseudodeterministic approximation algorithm if and only if every promise problem in PromiseBPP has a solution in BPP. A conceptual interpretation of this equivalence is that the algorithmic gap between 2-pseudodeterminism and pseudodeterminism is equivalent to the gap between PromiseBPP and BPP. Based on this connection, we show that designing pseudodeterministic algorithms for APEP leads to the solution of some open problems in complexity theory, including new Boolean circuit lower bounds. This equivalence also explains how multi-pseudodeterminism is connected to problems in SearchBPP. In particular, we show that if APEP has a pseudodeterministic algorithm, then every problem that admits a k(n)-pseudodeterministic algorithm (for any polynomial k) is in SearchBPP and admits a pseudodeterministic algorithm. Motivated by this connection, we also explore its connection to probabilistic search problems and establish that APEP is complete for certain notions of search problems in the context of pseudodeterminism. Our second contribution is establishing query complexity lower bounds for multi-pseudodeterministic computations. We prove that for every k ≥ 1, there exists a problem whose (k+1)-pseudodeterministic query complexity, in the uniform query model, is O(1) but whose k-pseudodeterministic query complexity is Ω(n), even in the more general nonadaptive query model. A key contribution of this part of the work is the utilization of Sperner’s lemma in establishing query complexity lower bounds.
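For intuition, here is a minimal, illustrative Python sketch (not taken from the paper) of the folklore 2-pseudodeterministic approach to APEP: estimate the circuit's acceptance probability by random sampling and round the estimate to a fixed grid, so that with high probability the output lands on one of at most two adjacent grid points. The function and parameter names are our own.

```python
import random

def apep_2pseudodet(circuit, n_inputs, eps=0.1, samples=20000, rng=None):
    """Additively approximate the acceptance probability of `circuit`
    (a function from n_inputs bits to a bool) and round it to a grid of
    width eps.

    Because the empirical estimate concentrates around the true
    probability, the rounded value lands on one of at most two adjacent
    grid points with high probability, which is the 2-pseudodeterminism
    property described above.
    """
    rng = rng or random.Random()
    hits = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n_inputs)]
        if circuit(x):
            hits += 1
    estimate = hits / samples
    # Round to the nearest multiple of eps (the "canonical" grid).
    return round(estimate / eps) * eps

# Example: a circuit that accepts iff the first two input bits agree (p = 1/2).
print(apep_2pseudodet(lambda x: x[0] == x[1], n_inputs=8))
```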
- In the past decade, differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy. This privacy definition and its divergence-based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analysing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation of differential privacy, which we term ‘f-differential privacy’ (f-DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence-based relaxations. First, f-DP faithfully preserves the hypothesis testing interpretation of differential privacy, thereby making the privacy guarantees easily interpretable. In addition, f-DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for the original differential privacy definition to f-DP and, as an application of this technique, obtain a simple and easy-to-interpret theorem of privacy amplification by subsampling for f-DP. In addition to the above findings, we introduce a canonical single-parameter family of privacy notions within the f-DP class that is referred to as ‘Gaussian differential privacy’ (GDP), defined based on hypothesis testing of two shifted Gaussian distributions. GDP is the focal privacy definition among the family of f-DP guarantees due to a central limit theorem for differential privacy that we prove. More precisely, the privacy guarantees of any hypothesis testing based definition of privacy (including the original differential privacy definition) converge to GDP in the limit under composition. We also prove a Berry–Esseen style version of the central limit theorem, which gives a computationally inexpensive tool for tractably analysing the exact composition of private algorithms. Taken together, this collection of attractive properties renders f-DP a mathematically coherent, analytically tractable and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent.
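As a worked illustration of the GDP definition (a sketch based on the standard trade-off-function formulation, not code from the paper), the curve G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu) describes testing N(0, 1) against N(mu, 1); the snippet below evaluates it, assuming SciPy is available.

```python
import numpy as np
from scipy.stats import norm

def gaussian_tradeoff(alpha, mu):
    """Trade-off curve G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu).

    This is the smallest achievable type II error of any test that
    distinguishes N(0, 1) from N(mu, 1) at type I error alpha; a
    mechanism is mu-GDP when its own trade-off curve lies above this one.
    """
    return norm.cdf(norm.ppf(1.0 - alpha) - mu)

# Evaluate the 1.0-GDP curve at a few type I error levels.
alphas = np.linspace(0.0, 1.0, 6)
print(gaussian_tradeoff(alphas, mu=1.0))
```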
- Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. This privacy definition and its divergence-based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analyzing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation, which we term 'f-differential privacy' (f-DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence-based relaxations. First, f-DP preserves the hypothesis testing interpretation. In addition, f-DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for original DP to f-DP and, as an application, obtain a simple subsampling theorem for f-DP. In addition to the above findings, we introduce a canonical single-parameter family of privacy notions within the f-DP class that is referred to as 'Gaussian differential privacy' (GDP), defined based on testing two shifted Gaussians. GDP is focal among the f-DP class because of a central limit theorem we prove. More precisely, the privacy guarantees of any hypothesis testing based definition of privacy (including original DP) converge to GDP in the limit under composition. The CLT also yields a computationally inexpensive tool for analyzing the exact composition of private algorithms. Taken together, this collection of attractive properties renders f-DP a mathematically coherent, analytically tractable, and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved privacy analysis of noisy stochastic gradient descent.
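To make the composition behaviour concrete, here is a tiny illustrative sketch using the standard GDP composition rule, under which the mu-parameters of sequentially composed mechanisms add in squares; the helper name is ours, not the paper's.

```python
import math

def compose_gdp(mus):
    """Overall GDP parameter of running mu_i-GDP mechanisms in sequence.

    Under GDP, composition is particularly clean: the k-fold composition
    of mu_1-, ..., mu_k-GDP mechanisms is sqrt(mu_1^2 + ... + mu_k^2)-GDP.
    """
    return math.sqrt(sum(mu * mu for mu in mus))

# Example: 100 steps of a 0.1-GDP mechanism compose to 1.0-GDP overall.
print(compose_gdp([0.1] * 100))
```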
- Large corporations, government entities and institutions such as hospitals and census bureaus routinely collect our personal and sensitive information for providing services. A key technological challenge is designing algorithms for these services that provide useful results, while simultaneously maintaining the privacy of the individuals whose data are being shared. Differential privacy (DP) is a cryptographically motivated and mathematically rigorous approach for addressing this challenge. Under DP, a randomized algorithm provides privacy guarantees by approximating the desired functionality, leading to a privacy–utility trade-off. Strong (pure DP) privacy guarantees are often costly in terms of utility. Motivated by the need for a more efficient mechanism with better privacy–utility trade-off, we propose Gaussian FM, an improvement to the functional mechanism (FM) that offers higher utility at the expense of a weakened (approximate) DP guarantee. We analytically show that the proposed Gaussian FM algorithm can offer orders of magnitude smaller noise compared to the existing FM algorithms. We further extend our Gaussian FM algorithm to decentralized-data settings by incorporating the CAPE protocol and propose capeFM. Our method can offer the same level of utility as its centralized counterparts for a range of parameter choices. We empirically show that our proposed algorithms outperform existing state-of-the-art approaches on synthetic and real datasets.
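The abstract does not detail Gaussian FM's internals, so as background only, the sketch below shows the generic Gaussian mechanism that underlies approximate-DP perturbation: add Gaussian noise calibrated to the L2 sensitivity of the released quantity (objective-function coefficients are used purely as an illustrative example). The calibration formula is the classical one, valid for eps < 1; all names are ours, not the paper's.

```python
import math
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, eps, delta, rng=None):
    """Release `value` (a numpy array) under (eps, delta)-DP by adding
    Gaussian noise calibrated to its L2 sensitivity.

    Uses the classical calibration
        sigma = l2_sensitivity * sqrt(2 * ln(1.25 / delta)) / eps,
    which is valid for eps < 1.
    """
    rng = rng or np.random.default_rng()
    sigma = l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    return value + rng.normal(0.0, sigma, size=np.shape(value))

# Example: privatize the coefficients of a quadratic objective with sensitivity 1.
noisy = gaussian_mechanism(np.array([2.0, -1.5, 0.3]), l2_sensitivity=1.0,
                           eps=0.5, delta=1e-5)
print(noisy)
```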
- Several well-studied models of access to data samples, including statistical queries, local differential privacy and low-communication algorithms, rely on queries that provide information about a function of a single sample. (For example, a statistical query (SQ) gives an estimate of E_{x∼D}[q(x)] for any choice of the query function q : X → ℝ, where D is an unknown data distribution.) Yet some data analysis algorithms rely on properties of functions that depend on multiple samples. Such algorithms would be naturally implemented using k-wise queries, each of which is specified by a function q : X^k → ℝ. Hence it is natural to ask whether algorithms using k-wise queries can solve learning problems more efficiently, and by how much. Blum, Kalai and Wasserman (2003) showed that for any weak PAC learning problem over a fixed distribution, the complexity of learning with k-wise SQs is smaller than the (unary) SQ complexity by a factor of at most 2^k. We show that for more general problems over distributions the picture is substantially richer. For every k, the complexity of distribution-independent PAC learning with k-wise queries can be exponentially larger than learning with (k+1)-wise queries. We then give two approaches for simulating a k-wise query using unary queries. The first approach exploits the structure of the problem that needs to be solved. It generalizes and strengthens (exponentially) the results of Blum et al. (2003). It allows us to derive strong lower bounds for learning DNF formulas and stochastic constraint satisfaction problems that hold against algorithms using k-wise queries. The second approach exploits the k-party communication complexity of the k-wise query function.
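To make the two query models concrete, the following simplified Python sketch (not from the paper) contrasts a unary SQ oracle with a k-wise oracle; the sampling-based noise model here is a stand-in for the usual adversarial tolerance tau, and all names are illustrative.

```python
import random

def sq_oracle(samples, q, tau, rng=None):
    """Unary statistical query: estimate E_{x~D}[q(x)] up to roughly tau,
    simulated here by averaging q over i.i.d. samples plus bounded noise."""
    rng = rng or random.Random()
    avg = sum(q(x) for x in samples) / len(samples)
    return avg + rng.uniform(-tau, tau)

def kwise_sq_oracle(samples, q, k, tau, rng=None):
    """k-wise statistical query: q now takes a k-tuple of samples, so it
    can probe properties that no single-sample query can express."""
    rng = rng or random.Random()
    tuples = [tuple(rng.sample(samples, k)) for _ in range(len(samples))]
    avg = sum(q(t) for t in tuples) / len(tuples)
    return avg + rng.uniform(-tau, tau)

# Example over bit-pairs: a unary query on the first coordinate, and a
# pairwise (k = 2) query measuring agreement of the first coordinate
# across two independent samples.
data = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(1000)]
print(sq_oracle(data, lambda x: x[0], tau=0.01))
print(kwise_sq_oracle(data, lambda t: float(t[0][0] == t[1][0]), k=2, tau=0.01))
```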