Title: Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares
We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of X⊤X, where X is the design matrix. All prior private algorithms for this task require either d^{3/2} examples, error growing polynomially with the condition number, or exponential time. Our near-optimal accuracy guarantee holds for any dataset with bounded statistical leverage and bounded residuals. Technically, we build on the approach of Brown et al. (2023) for private mean estimation, adding scaled noise to a carefully designed stable nonprivate estimator of the empirical regression vector.
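The recipe the abstract describes (compute a stable nonprivate estimate of the empirical regression vector, then add scaled noise) can be illustrated with a minimal sketch. This is not the paper's Insufficient Statistics Perturbation estimator: the clipping-based stabilization, the sensitivity bound, and the Gaussian calibration below are simplifying assumptions for illustration only.

```python
import numpy as np

def private_ols_sketch(X, y, eps, delta, clip=1.0):
    """Hedged sketch of noise-added least squares (not the ISP estimator).

    Stabilizes the empirical regression vector by clipping each
    example's feature norm and response, then perturbs the result
    with Gaussian noise in the spirit of the Gaussian mechanism.
    """
    n, d = X.shape
    # Bound each example's influence: clip feature rows to norm <= clip
    # and responses to [-clip, clip].
    row_norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_c = X * np.minimum(1.0, clip / np.maximum(row_norms, 1e-12))
    y_c = np.clip(y, -clip, clip)
    # Stable nonprivate estimate from the clipped data.
    beta_hat, *_ = np.linalg.lstsq(X_c, y_c, rcond=None)
    # Assumed per-example sensitivity bound; the paper instead calibrates
    # the noise using leverage and residual bounds on the dataset.
    sensitivity = 2.0 * clip / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return beta_hat + np.random.normal(0.0, sigma, size=d)
```

On data where the clipping is inactive, the output is just the OLS solution plus isotropic noise, which matches the qualitative behavior the abstract describes for datasets with bounded leverage and residuals.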
Award ID(s):
2238080
PAR ID:
10568954
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
The 37th Annual Conference on Learning Theory (COLT 2024)
Date Published:
Format(s):
Medium: X
Location:
Edmonton, Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. Krause, Andreas (Ed.)
The Private Aggregation of Teacher Ensembles (PATE) framework is one of the most promising recent approaches in differentially private learning. Existing theoretical analysis shows that PATE consistently learns any VC class in the realizable setting, but falls short of explaining its success in more general cases where the error rate of the optimal classifier is bounded away from zero. We fill this gap by introducing the Tsybakov Noise Condition (TNC) and establishing stronger and more interpretable learning bounds. These bounds provide new insights into when PATE works and improve over existing results even in the narrower realizable setting. We also investigate the compelling idea of using active learning to save privacy budget, and empirical studies show the effectiveness of this new idea. The novel components in the proofs include a more refined analysis of the majority voting classifier, which could be of independent interest, and an observation that the synthetic "student" learning problem is nearly realizable by construction under the Tsybakov noise condition. (A minimal sketch of PATE's noisy-vote aggregation appears after this list.)
2. Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. Given a set of points $[x_1, x_2, \ldots, x_n] = X \in \mathbb{R}^{d \times n}$ and a vector $y \in \mathbb{R}^d$, the goal is to find coefficients $w \in \mathbb{R}^n$ so that $Xw \approx y$, subject to some desired structure on $w$. In this work we seek $w$ that forms a local reconstruction of $y$ by solving a regularized least squares regression problem. We obtain local solutions by using as a regularization term a locality function that promotes the use of columns of $X$ that are close to $y$. We prove that, for all levels of regularization and under the mild condition that the columns of $X$ have a unique Delaunay triangulation, the number of non-zero entries of the optimal coefficients is upper bounded by $d+1$, thereby providing local sparse solutions when $d \ll n$. Under the same condition we also show that for any $y$ contained in the convex hull of $X$ there exists a regime of the regularization parameter in which the optimal coefficients are supported on the vertices of the Delaunay simplex containing $y$. This interprets the sparsity as structure obtained implicitly from the Delaunay triangulation of $X$. We demonstrate that our locality-regularized problem can be solved in time comparable to other methods that identify the containing Delaunay simplex. (A hedged sketch of a locality-regularized least squares solver appears after this list.)
3. The ascending chain condition (ACC) conjecture for local volumes predicts that the set of local volumes of Kawamata log terminal (klt) singularities x ∈ (X, Δ) satisfies the ACC if the coefficients of Δ belong to a descending chain condition (DCC) set. In this paper, we prove the ACC conjecture for local volumes under the assumption that the ambient germ is analytically bounded. We introduce another related conjecture, which predicts the existence of δ-plt blow-ups of a klt singularity whose local volume has a positive lower bound. We show that the latter conjecture also holds when the ambient germ is analytically bounded. Moreover, we prove that both conjectures hold in dimension 2 as well as for 3-dimensional terminal singularities. (A compact statement of the chain conditions appears after this list.)
4. We study the problem of efficiently estimating the effect of an intervention on a single variable using observational samples. Our goal is to give algorithms with polynomial time and sample complexity in a non-parametric setting. Tian and Pearl (AAAI '02) exactly characterized the class of causal graphs for which causal effects of atomic interventions can be identified from observational data. We make their result quantitative. Suppose 𝒫 is a causal model on a set V of n observable variables with respect to a given causal graph G, and let do(x) be an identifiable intervention on a variable X. We show that, assuming G has bounded in-degree, bounded c-components (of size at most k), and an observational distribution satisfying a strong positivity condition: (i) [Evaluation] There is an algorithm that, with probability 2/3, outputs an evaluator for a distribution P̂ satisfying TV(P(V | do(x)), P̂(V)) < ε, using m = O(n/ε²) samples from P and O(mn) time. The evaluator can return the probability P̂(v) for any assignment v to V in O(n) time. (ii) [Sampling] There is an algorithm that, with probability 2/3, outputs a sampler for a distribution P̂ satisfying TV(P(V | do(x)), P̂(V)) < ε, using m = O(n/ε²) samples from P and O(mn) time. The sampler returns an i.i.d. sample from P̂ with probability 1 in O(n) time. We extend our techniques to estimate P(Y | do(x)) for a subset Y of the variables of interest. We also prove lower bounds on the sample complexity, showing that our dependence on the parameters n and ε is optimal, as is, when k = 1, the dependence on the strong positivity parameter. (A toy plug-in example for one identifiable case appears after this list.)
5. Abstract: Given a Banach space $X$ and a real number $\alpha \ge 1$, we write: (1) $D(X) \le \alpha$ if, for any locally finite metric space $A$, all finite subsets of which admit bilipschitz embeddings into $X$ with distortions $\le C$, the space $A$ itself admits a bilipschitz embedding into $X$ with distortion $\le \alpha \cdot C$; (2) $D(X) = \alpha^{+}$ if, for every $\epsilon > 0$, the condition $D(X) \le \alpha + \epsilon$ holds while $D(X) \le \alpha$ does not; (3) $D(X) \le \alpha^{+}$ if $D(X) = \alpha^{+}$ or $D(X) \le \alpha$. It is known that $D(X)$ is bounded by a universal constant, but the available estimates for this constant are rather large. The following results are proved in this work: (1) $D\big(\big(\bigoplus_{n=1}^{\infty} X_n\big)_p\big) \le 1^{+}$ for every nested family of finite-dimensional Banach spaces $\{X_n\}_{n=1}^{\infty}$ and every $1 \le p \le \infty$; (2) $D\big(\big(\bigoplus_{n=1}^{\infty} \ell_\infty^n\big)_p\big) = 1^{+}$ for $1 < p < \infty$; (3) $D(X) \le 4^{+}$ for every Banach space $X$ with no nontrivial cotype. Statement (3) strengthens the result of Baudier and Lancien (2008).
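For item 1, here is a minimal sketch of PATE's core aggregation step, a noisy majority vote over teacher labels. The Laplace scale and the per-query privacy accounting are illustrative assumptions, not the mechanism analyzed in that paper.

```python
import numpy as np

def pate_noisy_vote(teacher_labels, num_classes, eps):
    """Aggregate teacher predictions with a noisy argmax.

    teacher_labels: 1-D integer array, one predicted class per teacher.
    Changing one teacher's data changes the vote histogram by at most
    2 in L1 norm, so Laplace noise of scale 2/eps gives eps-DP per
    query (an illustrative, not tight, accounting).
    """
    votes = np.bincount(teacher_labels, minlength=num_classes).astype(float)
    noisy_votes = votes + np.random.laplace(scale=2.0 / eps, size=num_classes)
    return int(np.argmax(noisy_votes))
```

A "student" model is then trained on public inputs labeled by such queries, so the private training data is touched only through the noisy aggregate.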
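For item 2, a hedged sketch of locality-regularized least squares. The quadratic penalty $\lambda \sum_i \|x_i - y\|^2 w_i^2$ is one plausible instantiation with a closed-form solution, not the paper's formulation; notably, this quadratic variant does not by itself produce the sparse, Delaunay-supported solutions described above.

```python
import numpy as np

def locality_regularized_ls(X, y, lam):
    """Solve min_w ||X w - y||^2 + lam * sum_i d_i * w_i^2,
    where d_i = ||x_i - y||^2 penalizes coefficients on far-away columns.

    X: (d, n) matrix whose columns are the points x_1, ..., x_n.
    y: (d,) target vector.  Assumed penalty form, not the paper's.
    """
    d2 = np.sum((X - y[:, None]) ** 2, axis=0)  # squared column-to-y distances
    # Normal equations of the weighted-ridge problem.
    A = X.T @ X + lam * np.diag(d2)
    return np.linalg.solve(A, X.T @ y)
```

Columns close to $y$ incur little regularization, so the solution concentrates weight on nearby points, which is the locality behavior the abstract describes.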
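For item 3, the chain conditions can be stated compactly; writing the local volume as $\widehat{\mathrm{vol}}$ follows the common normalized-volume notation and is an assumption about that paper's conventions.

```latex
% A set S of real numbers satisfies the ACC (resp. DCC) if it contains
% no infinite strictly increasing (resp. strictly decreasing) sequence.
\[
  S \text{ satisfies ACC} \iff \nexists\; s_1 < s_2 < s_3 < \cdots \text{ in } S,
  \qquad
  S \text{ satisfies DCC} \iff \nexists\; s_1 > s_2 > s_3 > \cdots \text{ in } S.
\]
% The conjecture: if the coefficients of \Delta lie in a fixed DCC set I,
% then the set of local volumes satisfies the ACC:
\[
  \bigl\{\, \widehat{\mathrm{vol}}(x, X, \Delta) : x \in (X, \Delta) \text{ klt},\;
  \operatorname{coeff}(\Delta) \subseteq I \,\bigr\} \text{ satisfies the ACC.}
\]
```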
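For item 4, a toy plug-in example for one identifiable special case. It assumes the concrete graph Z → X → Y with a confounding edge Z → Y, where the atomic intervention is identified by the adjustment formula P(y | do(x)) = Σ_z P(z) P(y | x, z). The paper's general algorithm handles arbitrary identifiable interventions with finite-sample guarantees; this sketch ignores that machinery entirely.

```python
from collections import Counter

def estimate_do(samples, x_val):
    """Plug-in estimate of P(Y | do(X = x_val)) via the adjustment
    formula sum_z P(z) * P(y | x_val, z), assuming the graph
    Z -> X -> Y with Z -> Y and binary variables.

    samples: iterable of observational (z, x, y) tuples.
    Strong positivity (every (x, z) cell observed) is assumed,
    so the conditional estimates below are well defined.
    """
    samples = list(samples)
    n = len(samples)
    p_hat = Counter()
    for z_val in (0, 1):
        p_z = sum(1 for z, _, _ in samples if z == z_val) / n
        cell = [y for z, x, y in samples if z == z_val and x == x_val]
        for y_val in (0, 1):
            p_hat[y_val] += p_z * (cell.count(y_val) / len(cell))
    return dict(p_hat)
```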