skip to main content


Title: Privately Answering Counting Queries with Generalized Gaussian Mechanisms
We give the first closed-form privacy guarantees for the Generalized Gaussian mechanism (the mechanism that adds noise x to a vector with probability proportional to exp(-(||x||_p/σ)^p) for some σ, p), in the setting of answering k counting (i.e. sensitivity-1) queries about a database with (ε, δ)-differential privacy (in particular, with low 𝓁_∞-error). Just using Generalized Gaussian noise, we obtain a mechanism such that if the true answers to the queries are the vector d, the mechanism outputs answers d̃ with the 𝓁_∞-error guarantee: 𝔼[||d̃ - d||_∞] = O(√{k log log k log(1/δ)}/ε). This matches the error bound of [Steinke and Ullman, 2017], but using a much simpler mechanism. By composing this mechanism with the sparse vector mechanism (generalizing a technique of [Steinke and Ullman, 2017]), we obtain a mechanism improving the √{k log log k} dependence on k to √{k log log log k}, Our main technical contribution is showing that certain powers of Generalized Gaussians, which follow a Generalized Gamma distribution, are sub-gamma. In subsequent work, the optimal 𝓁_∞-error bound of O(√{k log (1/δ)}/ε) has been achieved by [Yuval Dagan and Gil Kur, 2020] and [Badih Ghazi et al., 2020] independently. However, the Generalized Gaussian mechanism has some qualitative advantages over the mechanisms used in these papers which may make it of interest to both practitioners and theoreticians, both in the setting of answering counting queries and more generally.  more » « less
Award ID(s):
1816861
NSF-PAR ID:
10253485
Author(s) / Creator(s):
;
Editor(s):
Ligett, Katrina; Gupta, Swati
Date Published:
Journal Name:
Leibniz international proceedings in informatics
Volume:
192
ISSN:
1868-8969
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given n samples from a (sub-)Gaussian distribution with unknown mean μ and covariance Σ, our (ε,δ)-differentially private estimator produces μ~ such that ∥μ−μ~∥Σ≤α as long as n≳dα2+dlog1/δ√αε+dlog1/δε. The Mahalanobis error metric ∥μ−μ^∥Σ measures the distance between μ^ and μ relative to Σ; it characterizes the error of the sample mean. Our algorithm runs in time O~(ndω−1+nd/ε), where ω<2.38 is the matrix multiplication exponent. We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above. Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With n≳d3/2 samples, our estimate is accurate in spectral norm. This is the first such algorithm using n=o(d2) samples, answering an open question posed by Alabi et al. (2022). With n≳d2 samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance. Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently. 
    more » « less
  2. Perhaps the single most important use case for differential privacy is to privately answer numerical queries, which is usually achieved by adding noise to the answer vector. The central question is, therefore, to understand which noise distribution optimizes the privacy-accuracy trade-off, especially when the dimension of the answer vector is high. Accordingly, an extensive literature has been dedicated to the question and the upper and lower bounds have been successfully matched up to constant factors (Bun et al.,2018; Steinke & Ullman, 2017). In this paper, we take a novel approach to address this important optimality question. We first demonstrate an intriguing central limit theorem phenomenon in the high-dimensional regime. More precisely, we prove that a mechanism is approximately Gaussian Differentially Private (Dong et al., 2021) if the added noise satisfies certain conditions. In particular, densities proportional to exp(-||x||_p^alpha), where ||x||_p is the standard l_p-norm, satisfies the conditions. Taking this perspective, we make use of the Cramer--Rao inequality and show an "uncertainty principle"-style result: the product of privacy parameter and the l_2-loss of the mechanism is lower bounded by the dimension. Furthermore, the Gaussian mechanism achieves the constant-sharp optimal privacy-accuracy trade-off among all such noises. Our findings are corroborated by numerical experiments. 
    more » « less
  3. Given a metric space ℳ = (X,δ), a weighted graph G over X is a metric t-spanner of ℳ if for every u,v ∈ X, δ(u,v) ≤ δ_G(u,v) ≤ t⋅ δ(u,v), where δ_G is the shortest path metric in G. In this paper, we construct spanners for finite sets in metric spaces in the online setting. Here, we are given a sequence of points (s₁, …, s_n), where the points are presented one at a time (i.e., after i steps, we have seen S_i = {s₁, … , s_i}). The algorithm is allowed to add edges to the spanner when a new point arrives, however, it is not allowed to remove any edge from the spanner. The goal is to maintain a t-spanner G_i for S_i for all i, while minimizing the number of edges, and their total weight. Under the L₂-norm in ℝ^d for arbitrary constant d ∈ ℕ, we present an online (1+ε)-spanner algorithm with competitive ratio O_d(ε^{-d} log n), improving the previous bound of O_d(ε^{-(d+1)}log n). Moreover, the spanner maintained by the algorithm has O_d(ε^{1-d}log ε^{-1})⋅ n edges, almost matching the (offline) optimal bound of O_d(ε^{1-d})⋅ n. In the plane, a tighter analysis of the same algorithm provides an almost quadratic improvement of the competitive ratio to O(ε^{-3/2}logε^{-1}log n), by comparing the online spanner with an instance-optimal spanner directly, bypassing the comparison to an MST (i.e., lightness). As a counterpart, we design a sequence of points that yields a Ω_d(ε^{-d}) lower bound for the competitive ratio for online (1+ε)-spanner algorithms in ℝ^d under the L₁-norm. Then we turn our attention to online spanners in general metrics. Note that, it is not possible to obtain a spanner with stretch less than 3 with a subquadratic number of edges, even in the offline setting, for general metrics. We analyze an online version of the celebrated greedy spanner algorithm, dubbed ordered greedy. With stretch factor t = (2k-1)(1+ε) for k ≥ 2 and ε ∈ (0,1), we show that it maintains a spanner with O(ε^{-1}logε^{-1})⋅ n^{1+1/k} edges and O(ε^{-1}n^{1/k}log² n) lightness for a sequence of n points in a metric space. We show that these bounds cannot be significantly improved, by introducing an instance that achieves an Ω(1/k⋅ n^{1/k}) competitive ratio on both sparsity and lightness. Furthermore, we establish the trade-off among stretch, number of edges and lightness for points in ultrametrics, showing that one can maintain a (2+ε)-spanner for ultrametrics with O(ε^{-1}logε^{-1})⋅ n edges and O(ε^{-2}) lightness. 
    more » « less
  4. Given a metric space ℳ = (X,δ), a weighted graph G over X is a metric t-spanner of ℳ if for every u,v ∈ X, δ(u,v) ≤ δ_G(u,v) ≤ t⋅ δ(u,v), where δ_G is the shortest path metric in G. In this paper, we construct spanners for finite sets in metric spaces in the online setting. Here, we are given a sequence of points (s₁, …, s_n), where the points are presented one at a time (i.e., after i steps, we have seen S_i = {s₁, … , s_i}). The algorithm is allowed to add edges to the spanner when a new point arrives, however, it is not allowed to remove any edge from the spanner. The goal is to maintain a t-spanner G_i for S_i for all i, while minimizing the number of edges, and their total weight. Under the L₂-norm in ℝ^d for arbitrary constant d ∈ ℕ, we present an online (1+ε)-spanner algorithm with competitive ratio O_d(ε^{-d} log n), improving the previous bound of O_d(ε^{-(d+1)}log n). Moreover, the spanner maintained by the algorithm has O_d(ε^{1-d}log ε^{-1})⋅ n edges, almost matching the (offline) optimal bound of O_d(ε^{1-d})⋅ n. In the plane, a tighter analysis of the same algorithm provides an almost quadratic improvement of the competitive ratio to O(ε^{-3/2}logε^{-1}log n), by comparing the online spanner with an instance-optimal spanner directly, bypassing the comparison to an MST (i.e., lightness). As a counterpart, we design a sequence of points that yields a Ω_d(ε^{-d}) lower bound for the competitive ratio for online (1+ε)-spanner algorithms in ℝ^d under the L₁-norm. Then we turn our attention to online spanners in general metrics. Note that, it is not possible to obtain a spanner with stretch less than 3 with a subquadratic number of edges, even in the offline setting, for general metrics. We analyze an online version of the celebrated greedy spanner algorithm, dubbed ordered greedy. With stretch factor t = (2k-1)(1+ε) for k ≥ 2 and ε ∈ (0,1), we show that it maintains a spanner with O(ε^{-1}logε^{-1})⋅ n^{1+1/k} edges and O(ε^{-1}n^{1/k}log² n) lightness for a sequence of n points in a metric space. We show that these bounds cannot be significantly improved, by introducing an instance that achieves an Ω(1/k⋅ n^{1/k}) competitive ratio on both sparsity and lightness. Furthermore, we establish the trade-off among stretch, number of edges and lightness for points in ultrametrics, showing that one can maintain a (2+ε)-spanner for ultrametrics with O(ε^{-1}logε^{-1})⋅ n edges and O(ε^{-2}) lightness. 
    more » « less
  5. We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given n samples from a (sub-)Gaussian distribution with unknown mean μ and covariance Σ, our (ϵ,δ)-differentially private estimator produces μ~ such that ∥μ−μ~∥Σ≤α as long as n≳dα2+dlog1/δ√αϵ+dlog1/δϵ. The Mahalanobis error metric ∥μ−μ^∥Σ measures the distance between μ^ and μ relative to Σ; it characterizes the error of the sample mean. Our algorithm runs in time O~(ndω−1+nd/\eps), where ω<2.38 is the matrix multiplication exponent.We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above.Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With n≳d3/2 samples, our estimate is accurate in spectral norm. This is the first such algorithm using n=o(d2) samples, answering an open question posed by Alabi et al. (2022). With n≳d2 samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance.Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently. 
    more » « less