Abstract
In the past decade, differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy. This privacy definition and its divergence-based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analysing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation of differential privacy, which we term ‘f-differential privacy’ (f-DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence-based relaxations. First, f-DP faithfully preserves the hypothesis testing interpretation of differential privacy, thereby making the privacy guarantees easily interpretable. In addition, f-DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for the original differential privacy definition to f-DP and, as an application of this technique, obtain a simple and easy-to-interpret theorem of privacy amplification by subsampling for f-DP. In addition to the above findings, we introduce a canonical single-parameter family of privacy notions within the f-DP class that is referred to as ‘Gaussian differential privacy’ (GDP), defined based on hypothesis testing of two shifted Gaussian distributions. GDP is the focal privacy definition among the family of f-DP guarantees due to a central limit theorem for differential privacy that we prove. More precisely, the privacy guarantees of any hypothesis testing based definition of privacy (including the original differential privacy definition) converge to GDP in the limit under composition. We also prove a Berry–Esseen style version of the central limit theorem, which gives a computationally inexpensive tool for tractably analysing the exact composition of private algorithms. Taken together, this collection of attractive properties renders f-DP a mathematically coherent, analytically tractable and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent.
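To make the hypothesis testing formulation concrete, the display below sketches the trade-off-function definitions as they are usually stated in the f-DP literature; it is a summary in our notation of the objects named above, not a substitute for the paper's precise statements.

```latex
% Sketch of the hypothesis-testing (trade-off function) formulation.
% For distributions P, Q and rejection rules \phi with type I/II errors \alpha_\phi, \beta_\phi:
T(P,Q)(\alpha) = \inf\{\beta_\phi : \alpha_\phi \le \alpha\},
\qquad
M \text{ is } f\text{-DP} \iff T\bigl(M(S), M(S')\bigr) \ge f \ \text{for all neighbouring } S, S'.

% Gaussian differential privacy and its lossless composition rule:
G_\mu := T\bigl(\mathcal{N}(0,1), \mathcal{N}(\mu,1)\bigr),
\qquad
G_\mu(\alpha) = \Phi\bigl(\Phi^{-1}(1-\alpha) - \mu\bigr),
\qquad
G_{\mu_1} \otimes \cdots \otimes G_{\mu_k} = G_{\sqrt{\mu_1^2 + \cdots + \mu_k^2}}.
```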
Deciding accuracy of differential privacy schemes
Differential privacy is a mathematical framework for developing statistical computations with provable guarantees of privacy and accuracy. In contrast to the privacy component of differential privacy, which has a clear mathematical and intuitive meaning, the accuracy component of differential privacy does not have a generally accepted definition; accuracy claims of differential privacy algorithms vary from algorithm to algorithm and are not instantiations of a general definition. We identify program discontinuity as a common theme in existing ad hoc definitions and introduce an alternative notion of accuracy parametrized by the distance of an input to disagreement: for a deterministic computation f and a distance d, this quantity at an input x is the minimal distance d(x, y) over all y such that f(y) ≠ f(x). We show that our notion of accuracy subsumes the definition used in theoretical computer science, and captures known accuracy claims for differential privacy algorithms. In fact, our general notion of accuracy helps us prove better claims in some cases. Next, we study the decidability of accuracy. We first show that accuracy is in general undecidable. Then, we define a non-trivial class of probabilistic computations for which accuracy is decidable (unconditionally, or assuming Schanuel’s conjecture). We implement our decision procedure and experimentally evaluate the effectiveness of our approach for generating proofs or counterexamples of accuracy for common algorithms from the literature.
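To illustrate the quantity that parametrizes this accuracy notion, here is a minimal brute-force sketch over a small finite input space; the function names and the toy computation are ours, not the paper's, and a real instance would use the paper's distance d, input domain, and decision procedure rather than enumeration.

```python
from itertools import product

def distance_to_disagreement(f, x, domain, d):
    """Minimal d(x, y) over inputs y in `domain` with f(y) != f(x).

    Brute force, so only sensible for small finite domains; shown purely to
    illustrate the definition, not as the paper's decision procedure.
    """
    fx = f(x)
    dists = [d(x, y) for y in domain if f(y) != fx]
    return min(dists) if dists else float("inf")

# Toy example: a threshold function on integer pairs with L1 distance.
f = lambda x: sum(x) >= 5                                # deterministic computation
d = lambda x, y: sum(abs(a - b) for a, b in zip(x, y))   # L1 distance
domain = list(product(range(10), repeat=2))

print(distance_to_disagreement(f, (1, 1), domain, d))    # prints 3: the sum must reach 5
```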
- PAR ID: 10257024
- Date Published:
- Journal Name: Proceedings of the ACM on Programming Languages
- Volume: 5
- Issue: POPL
- ISSN: 2475-1421
- Page Range / eLocation ID: 1 to 30
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Vertical Federated Learning (FL) is a new paradigm that enables users with non-overlapping attributes of the same data samples to jointly train a model without directly sharing the raw data. Nevertheless, recent works show that this is still not sufficient to prevent privacy leakage from the training process or the trained model. This paper focuses on privacy-preserving tree boosting algorithms under vertical FL. Existing solutions based on cryptography involve heavy computation and communication overhead and are vulnerable to inference attacks; the solution based on Local Differential Privacy (LDP) avoids these problems but leads to low accuracy of the trained model. This paper explores how to improve the accuracy of widely deployed tree boosting algorithms satisfying differential privacy under vertical FL. Specifically, we introduce a framework called OpBoost. Three order-preserving desensitization algorithms satisfying a variant of LDP called distance-based LDP (dLDP) are designed to desensitize the training data. In particular, we optimize the dLDP definition and study efficient sampling distributions to further improve the accuracy and efficiency of the proposed algorithms. The proposed algorithms provide a trade-off between the privacy of pairs with large distance and the utility of desensitized values. Comprehensive evaluations show that OpBoost achieves better prediction accuracy than existing LDP approaches under reasonable settings. Our code is open source.
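OpBoost's concrete dLDP mechanisms are not spelled out in this summary, but the flavour of a distance-based, order-aware perturbation can be sketched with a generic exponential-mechanism-style sampler whose utility decays with distance from the true value; the function name, the ε-calibration, and any claim of matching the paper's dLDP definition are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

def distance_based_perturb(x, domain, eps, rng=None):
    """Sample a desensitized value v from `domain` with probability
    proportional to exp(-eps * |v - x| / 2).

    Values close to x stay likely, roughly preserving order in expectation,
    while pairs at large distance receive stronger protection. Illustrative
    only: the eps-calibration is an assumption, not taken from OpBoost.
    """
    rng = rng or np.random.default_rng()
    scores = -np.abs(np.asarray(domain, dtype=float) - x)   # distance-based utility
    probs = np.exp(eps * scores / 2.0)
    probs /= probs.sum()
    return rng.choice(domain, p=probs)

# Toy usage: desensitize an attribute value in {0, ..., 99}.
domain = np.arange(100)
print(distance_based_perturb(42, domain, eps=1.0))
```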
-
Quantum computers can produce a quantum encoding of the solution of a system of differential equations exponentially faster than a classical algorithm can produce an explicit description. However, while high-precision quantum algorithms for linear ordinary differential equations are well established, the best previous quantum algorithms for linear partial differential equations (PDEs) have complexity poly(1/ϵ), where ϵ is the error tolerance. By developing quantum algorithms based on adaptive-order finite difference methods and spectral methods, we improve the complexity of quantum algorithms for linear PDEs to poly(d, log(1/ϵ)), where d is the spatial dimension. Our algorithms apply high-precision quantum linear system algorithms to systems whose condition numbers and approximation errors we bound. We develop a finite difference algorithm for the Poisson equation and a spectral algorithm for more general second-order elliptic equations.
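The classical ingredient of the finite-difference route can be made concrete: discretizing the Poisson equation yields a sparse linear system, and it is systems of this kind that a quantum linear system algorithm would be applied to. The sketch below is a standard textbook construction in 1D with assumed zero Dirichlet boundary conditions, solved classically only to illustrate; it is not the paper's adaptive-order scheme, and the quantum step is not shown.

```python
import numpy as np

def poisson_1d_system(f, n):
    """Second-order finite-difference discretization of -u''(x) = f(x)
    on (0, 1) with u(0) = u(1) = 0, using n interior grid points.

    Returns (A, b) with A u ≈ b; a quantum linear system algorithm would be
    applied to A u = b, whereas here we solve classically for illustration.
    """
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)                       # interior grid points
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2           # tridiagonal Laplacian
    b = f(x)
    return A, b

# Toy check: f(x) = pi^2 sin(pi x) has exact solution u(x) = sin(pi x).
A, b = poisson_1d_system(lambda x: np.pi**2 * np.sin(np.pi * x), n=50)
u = np.linalg.solve(A, b)
x = np.linspace(1/51, 50/51, 50)
print(np.max(np.abs(u - np.sin(np.pi * x))))             # small O(h^2) error
```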
-
We study local filters for the Lipschitz property of real-valued functions f : V → [0, r], where the Lipschitz property is defined with respect to an arbitrary undirected graph G = (V, E). We give nearly optimal local Lipschitz filters both with respect to ℓ1-distance and ℓ0-distance. Previous work only considered unbounded-range functions over [n]^d. Jha and Raskhodnikova (SICOMP ‘13) gave an algorithm for such functions with lookup complexity exponential in d, which Awasthi et al. (ACM Trans. Comput. Theory) showed was necessary in this setting. We demonstrate that important applications of local Lipschitz filters can be accomplished with filters for functions whose range is bounded in [0, r]. For functions f : [n]^d → [0, r], we achieve running time (dr log n)^{O(log r)} for the ℓ1-respecting filter and d^{O(r)} polylog n for the ℓ0-respecting filter, thus circumventing the lower bound. Our local filters provide a novel Lipschitz extension that can be implemented locally. Furthermore, we show that our algorithms are nearly optimal in terms of the dependence on r for the domain {0,1}^d, an important special case of the domain [n]^d. In addition, our lower bound resolves an open question of Awasthi et al., removing one of the conditions necessary for their lower bound for general range. We prove our lower bound via a reduction from distribution-free Lipschitz testing and a new technique for proving hardness for adaptive algorithms. Finally, we provide two applications of our local filters to real-valued functions, with no restrictions on the range. In the first application, we use them in conjunction with the Laplace mechanism for differential privacy and noisy binary search to provide mechanisms for privately releasing outputs of black-box functions, even in the presence of malicious clients. In particular, our differentially private mechanism for arbitrary real-valued functions runs in time 2^{polylog min(r, nd)} and, for honest clients, has accuracy comparable to the Laplace mechanism for Lipschitz functions, up to a factor of O(log min(r, nd)). In the second application, we use our local filters to obtain the first nontrivial tolerant tester for the Lipschitz property. Our tester works for functions of the form f : {0,1}^d → ℝ, makes queries, and has tolerance ratio 2.01. Our applications demonstrate that local filters for bounded-range functions can be applied to construct efficient algorithms for arbitrary real-valued functions.
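The first application combines a Lipschitz filter with the standard Laplace mechanism. The sketch below shows only that standard last step, under the assumption that a filter has already returned a value of a c-Lipschitz function and that neighbouring datasets are at distance 1 in the underlying graph; the filter itself and the noisy binary search are not reproduced here.

```python
import numpy as np

def laplace_release(filtered_value, lipschitz_const, eps, rng=None):
    """Release a filtered function value with Laplace noise.

    If the filtered function is c-Lipschitz and neighbouring datasets are at
    distance 1 in the underlying graph, its sensitivity is at most c, so adding
    Laplace noise of scale c/eps is the usual eps-DP Laplace mechanism.
    Sketch only: the local filter producing `filtered_value` is assumed.
    """
    rng = rng or np.random.default_rng()
    return filtered_value + rng.laplace(loc=0.0, scale=lipschitz_const / eps)

# Toy usage: privately release a (filtered) value with Lipschitz constant 2.
print(laplace_release(filtered_value=7.3, lipschitz_const=2.0, eps=0.5))
```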