We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $$X^\top X$$, where $$X$$ is the design matrix. All prior private algorithms for this task require either $$d^{3/2}$$ examples, error growing polynomially with the condition number, or exponential time. Our near-optimal accuracy guarantee holds for any dataset with bounded statistical leverage and bounded residuals. Technically, we build on the approach of Brown et al. (2023) for private mean estimation, adding scaled noise to a carefully designed stable nonprivate estimator of the empirical regression vector.
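As a rough illustration of the noise-addition idea, the Python sketch below applies the standard Gaussian mechanism to a nonprivate OLS fit. The function `dp_ols_sketch` and its `sensitivity` argument are hypothetical: the paper's actual estimator derives its stability from the bounded-leverage and bounded-residual conditions rather than from an assumed sensitivity bound.

```python
import numpy as np

def dp_ols_sketch(X, y, epsilon, delta, sensitivity):
    """Illustrative output-perturbation sketch: OLS plus Gaussian noise.

    `sensitivity` is an assumed bound on how much the OLS solution can
    change when one example is replaced; the paper instead obtains such
    stability from bounded leverage and bounded residuals.
    """
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # nonprivate OLS fit
    # Standard Gaussian-mechanism noise scale for (epsilon, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return beta_hat + np.random.normal(0.0, sigma, size=beta_hat.shape)
```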
Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning
This paper studies M-estimators with gradient-Lipschitz loss functions regularized with a convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty when the noise distribution has heavy tails. Our main contributions are three-fold. (i) We provide general formulae for the derivatives of regularized M-estimators $$\hat\beta(y,X)$$, where differentiation is taken with respect to both X and y; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residuals in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select the tuning parameters of regularized M-estimators. The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion does not require knowledge of the noise distribution or of the covariance of the design. Simulated data confirm the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as a proxy for the out-of-sample error. Finally, our results reveal new relationships between the derivatives of $$\hat\beta$$ and the effective degrees of freedom of the M-estimators, which are of independent interest.
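For concreteness, here is a minimal proximal-gradient (ISTA) sketch of the Huber-loss, Elastic-Net-penalized M-estimator used as the running example above. The function name and the default values of `lam1`, `lam2`, and `c` are illustrative choices, not taken from the paper.

```python
import numpy as np

def huber_enet(X, y, lam1=0.1, lam2=0.1, c=1.345, n_iter=500):
    """ISTA sketch of a Huber + Elastic-Net regularized M-estimator."""
    n, d = X.shape
    beta = np.zeros(d)
    # Step size from a Lipschitz bound on the smooth part's gradient.
    L = np.linalg.norm(X, 2) ** 2 / n + lam2
    for _ in range(n_iter):
        r = y - X @ beta
        psi = np.clip(r, -c, c)              # Huber score function
        grad = -X.T @ psi / n + lam2 * beta  # smooth part: Huber + ridge
        z = beta - grad / L
        # Soft-thresholding = proximal operator of the l1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0.0)
    return beta
```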
- PAR ID: 10357819
- Journal Name: Proceedings of Machine Learning Research
- Volume: 178
- ISSN: 2640-3498
- Page Range / eLocation ID: 1912--1947
- Sponsoring Org: National Science Foundation
More Like this
-
Location estimation is one of the most basic questions in parametric statistics. Suppose we have a known distribution density f, and we get n i.i.d. samples from f(x − µ) for some unknown shift µ. The task is to estimate µ to high accuracy with high probability. The maximum likelihood estimator (MLE) is known to be asymptotically optimal as n → ∞, but what is possible for finite n? In this paper, we give two location estimators that are optimal under different criteria: 1) an estimator that has minimax-optimal estimation error subject to succeeding with probability 1 − δ and 2) a confidence interval estimator which, subject to its output interval containing µ with probability at least 1 − δ, has the minimum expected squared interval width among all shift-invariant estimators. The latter construction can be generalized to minimizing the expectation of any loss function on the interval width.
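As a baseline for comparison, the sketch below computes the MLE for the shift, the estimator this paper improves on at finite n. The names `mle_shift` and `log_f` are hypothetical, assuming a vectorized log-density centered so that µ lies near the samples.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mle_shift(samples, log_f):
    """Sketch of the MLE for the shift mu, given a known log-density log_f."""
    neg_ll = lambda mu: -np.sum(log_f(samples - mu))
    # Search a padded window around the data; assumes f is centered so
    # the true shift lies near the observed samples.
    lo, hi = samples.min(), samples.max()
    pad = hi - lo + 1.0
    res = minimize_scalar(neg_ll, bounds=(lo - pad, hi + pad),
                          method="bounded")
    return res.x
```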
-
Estimating the mean of a probability distribution using i.i.d. samples is a classical problem in statistics, wherein finite-sample optimal estimators are sought under various distributional assumptions. In this paper, we consider the problem of mean estimation when independent samples are drawn from $$d$$-dimensional non-identical distributions possessing a common mean. When the distributions are radially symmetric and unimodal, we propose a novel estimator, which is a hybrid of the modal interval, shorth and median estimators and whose performance adapts to the level of heterogeneity in the data. We show that our estimator is near optimal when data are i.i.d. and when the fraction of ‘low-noise’ distributions is as small as $$\varOmega \left (\frac{d \log n}{n}\right )$$, where $$n$$ is the number of samples. We also derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we extend our theory to linear regression. In both the mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points.
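One of the named ingredients, the shorth, is easy to state in one dimension: the midpoint of the shortest interval containing roughly half the points. The sketch below is that 1-d version only, as an illustration; the paper's hybrid estimator works in d dimensions and combines it with modal-interval and median ideas.

```python
import numpy as np

def shorth(x):
    """1-d shorth: midpoint of the shortest interval covering ~half the data."""
    x = np.sort(x)
    k = len(x) // 2 + 1  # smallest window covering just over half the points
    # Width of every window of k consecutive order statistics.
    widths = x[k - 1:] - x[:len(x) - k + 1]
    i = np.argmin(widths)  # index of the shortest such window
    return (x[i] + x[i + k - 1]) / 2.0
```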
-
This paper proposes a novel non-parametric multidimensional convex regression estimator designed to be robust to adversarial perturbations in the empirical measure. We minimize over convex functions the maximum (over Wasserstein perturbations of the empirical measure) of the absolute regression errors. The inner maximization is solved in closed form, resulting in a regularization penalty that involves the norm of the gradient. We show consistency of our estimator and a rate of convergence of order $$\tilde{O}(n^{-1/d})$$, matching the bounds of alternative estimators based on square-loss minimization. In contrast to existing results, our convergence rates hold without imposing compactness on the underlying domain and with no a priori bounds on the underlying convex function or its gradient norm.
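A minimal sketch of the resulting penalized program, assuming the penalty takes the gradient-norm form described above: it solves the standard finite-dimensional convex-regression problem in cvxpy with absolute errors plus a gradient-norm term. The function name and the weight `rho` are illustrative, and the O(n²) convexity constraints limit this sketch to small n.

```python
import cvxpy as cp
import numpy as np

def robust_convex_regression(X, y, rho=0.1):
    """Sketch: convex regression with absolute errors + gradient-norm penalty."""
    n, d = X.shape
    theta = cp.Variable(n)   # fitted values f(x_i)
    g = cp.Variable((n, d))  # subgradients of f at each x_i
    # Convexity: f(x_j) >= f(x_i) + g_i . (x_j - x_i) for all pairs.
    cons = [theta[j] >= theta[i] + g[i] @ (X[j] - X[i])
            for i in range(n) for j in range(n) if i != j]
    obj = cp.sum(cp.abs(y - theta)) + rho * cp.sum(cp.norm(g, 2, axis=1))
    cp.Problem(cp.Minimize(obj), cons).solve()
    return theta.value, g.value
```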