Perhaps the single most important use case for differential privacy is to privately answer numerical queries, which is usually achieved by adding noise to the answer vector. The central question is, therefore, to understand which noise distribution optimizes the privacyaccuracy tradeoff, especially when the dimension of the answer vector is high. Accordingly, an extensive literature has been dedicated to the question and the upper and lower bounds have been successfully matched up to constant factors (Bun et al.,2018; Steinke & Ullman, 2017). In this paper, we take a novel approach to address this important optimality question. We first demonstrate an intriguing central limit theorem phenomenon in the highdimensional regime. More precisely, we prove that a mechanism is approximately Gaussian Differentially Private (Dong et al., 2021) if the added noise satisfies certain conditions. In particular, densities proportional to exp(x_p^alpha), where x_p is the standard l_pnorm, satisfies the conditions. Taking this perspective, we make use of the CramerRao inequality and show an "uncertainty principle"style result: the product of privacy parameter and the l_2loss of the mechanism is lower bounded by the dimension. Furthermore, the Gaussian mechanism achieves the constantsharp optimal privacyaccuracy tradeoff among all such noises. Our findings are corroborated by numerical experiments.
more »
« less
Privately Answering Counting Queries with Generalized Gaussian Mechanisms
We give the first closedform privacy guarantees for the Generalized Gaussian mechanism (the mechanism that adds noise x to a vector with probability proportional to exp((x_p/σ)^p) for some σ, p), in the setting of answering k counting (i.e. sensitivity1) queries about a database with (ε, δ)differential privacy (in particular, with low 𝓁_∞error). Just using Generalized Gaussian noise, we obtain a mechanism such that if the true answers to the queries are the vector d, the mechanism outputs answers d̃ with the 𝓁_∞error guarantee: 𝔼[d̃  d_∞] = O(√{k log log k log(1/δ)}/ε). This matches the error bound of [Steinke and Ullman, 2017], but using a much simpler mechanism. By composing this mechanism with the sparse vector mechanism (generalizing a technique of [Steinke and Ullman, 2017]), we obtain a mechanism improving the √{k log log k} dependence on k to √{k log log log k}, Our main technical contribution is showing that certain powers of Generalized Gaussians, which follow a Generalized Gamma distribution, are subgamma. In subsequent work, the optimal 𝓁_∞error bound of O(√{k log (1/δ)}/ε) has been achieved by [Yuval Dagan and Gil Kur, 2020] and [Badih Ghazi et al., 2020] independently. However, the Generalized Gaussian mechanism has some qualitative advantages over the mechanisms used in these papers which may make it of interest to both practitioners and theoreticians, both in the setting of answering counting queries and more generally.
more »
« less
 Award ID(s):
 1816861
 NSFPAR ID:
 10253485
 Editor(s):
 Ligett, Katrina; Gupta, Swati
 Date Published:
 Journal Name:
 Leibniz international proceedings in informatics
 Volume:
 192
 ISSN:
 18688969
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


null (Ed.)In practice, differentially private data releases are designed to support a variety of applications. A data release is fit for use if it meets target accuracy requirements for each application. In this paper, we consider the problem of answering linear queries under differential privacy subject to perquery accuracy constraints. Existing practical frameworks like the matrix mechanism do not provide such finegrained control (they optimize total error, which allows some query answers to be more accurate than necessary, at the expense of other queries that become no longer useful). Thus, we design a fitnessforuse strategy that adds privacypreserving Gaussian noise to query answers. The covariance structure of the noise is optimized to meet the finegrained accuracy requirements while minimizing the cost to privacy.more » « less

We present a fast, differentially private algorithm for highdimensional covarianceaware mean estimation with nearly optimal sample complexity. Only exponentialtime estimators were previously known to achieve this guarantee. Given n samples from a (sub)Gaussian distribution with unknown mean μ and covariance Σ, our (ε,δ)differentially private estimator produces μ~ such that ∥μ−μ~∥Σ≤α as long as n≳dα2+dlog1/δ√αε+dlog1/δε. The Mahalanobis error metric ∥μ−μ^∥Σ measures the distance between μ^ and μ relative to Σ; it characterizes the error of the sample mean. Our algorithm runs in time O~(ndω−1+nd/ε), where ω<2.38 is the matrix multiplication exponent. We adapt an exponentialtime approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above. Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With n≳d3/2 samples, our estimate is accurate in spectral norm. This is the first such algorithm using n=o(d2) samples, answering an open question posed by Alabi et al. (2022). With n≳d2 samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance. Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.more » « less

Bringmann, Karl ; Grohe, Martin ; Puppis, Gabriele ; Svensson, Ola (Ed.)We revisit the noisy binary search model of [Karp and Kleinberg, 2007], in which we have n coins with unknown probabilities p_i that we can flip. The coins are sorted by increasing p_i, and we would like to find where the probability crosses (to within ε) of a target value τ. This generalized the fixednoise model of [Burnashev and Zigangirov, 1974], in which p_i = 1/2 ± ε, to a setting where coins near the target may be indistinguishable from it. It was shown in [Karp and Kleinberg, 2007] that Θ(1/ε² log n) samples are necessary and sufficient for this task. We produce a practical algorithm by solving two theoretical challenges: highprobability behavior and sharp constants. We give an algorithm that succeeds with probability 1δ from 1/C_{τ, ε} ⋅ (log₂ n + O(log^{2/3} n log^{1/3} 1/(δ) + log 1/(δ))) samples, where C_{τ, ε} is the optimal such constant achievable. For δ > n^{o(1)} this is within 1 + o(1) of optimal, and for δ ≪ 1 it is the first bound within constant factors of optimal.more » « less

We present a fast, differentially private algorithm for highdimensional covarianceaware mean estimation with nearly optimal sample complexity. Only exponentialtime estimators were previously known to achieve this guarantee. Given n samples from a (sub)Gaussian distribution with unknown mean μ and covariance Σ, our (ϵ,δ)differentially private estimator produces μ~ such that ∥μ−μ~∥Σ≤α as long as n≳dα2+dlog1/δ√αϵ+dlog1/δϵ. The Mahalanobis error metric ∥μ−μ^∥Σ measures the distance between μ^ and μ relative to Σ; it characterizes the error of the sample mean. Our algorithm runs in time O~(ndω−1+nd/\eps), where ω<2.38 is the matrix multiplication exponent.We adapt an exponentialtime approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above.Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With n≳d3/2 samples, our estimate is accurate in spectral norm. This is the first such algorithm using n=o(d2) samples, answering an open question posed by Alabi et al. (2022). With n≳d2 samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance.Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.more » « less