Experimental design is a classical area in statistics that has also found new applications in machine learning. In the combinatorial experimental design problem, the aim is to estimate an unknown $m$-dimensional vector $x$ from linear measurements, where Gaussian noise is introduced in each measurement. The goal is to pick $k$ out of the given $n$ experiments so as to make the most accurate estimate of the unknown parameter $x$. Given a set $S$ of chosen experiments, the maximum-likelihood estimate $\hat{x}$ can be obtained by a least-squares computation. One robust measure of estimation error is the D-optimality criterion, which aims to minimize the generalized variance of the estimator; this corresponds to minimizing the volume of the standard confidence ellipsoid for the estimation error $x - \hat{x}$. The problem gives rise to two natural variants depending on whether repetitions of experiments are allowed or not. The latter variant, while being more general, has also found applications in the geographical placement of sensors. We show a close connection between approximation algorithms for the D-optimal design problem and constructions of approximately $m$-wise positively correlated distributions. This connection allows us to obtain the first approximation algorithms for the D-optimal design problem with …
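To make the objects in this abstract concrete, here is a minimal Python sketch (all names hypothetical): a standard greedy heuristic that picks $k$ rows of the experiment matrix to maximize $\det(A_S^T A_S)$, the determinant the D-criterion targets, together with the least-squares maximum-likelihood estimator for the chosen rows. This illustrates the objective only; it is not the sampling-based approximation algorithm the abstract refers to.

```python
import numpy as np

def greedy_d_optimal(A, k, eps=1e-8):
    """Greedy heuristic for D-optimal subset selection without repetition.

    A: (n, m) matrix, one row per available experiment; k: budget (k >= m).
    Each step adds the row a maximizing a^T M^{-1} a, via the rank-one
    determinant update det(M + a a^T) = det(M) * (1 + a^T M^{-1} a).
    """
    n, m = A.shape
    M = eps * np.eye(m)          # tiny ridge so M is invertible from the start
    S = []
    for _ in range(k):
        Minv = np.linalg.inv(M)
        gains = [a @ Minv @ a if i not in S else -np.inf
                 for i, a in enumerate(A)]
        i_best = int(np.argmax(gains))
        S.append(i_best)
        M += np.outer(A[i_best], A[i_best])
    return S

def mle(A, S, y):
    """Maximum-likelihood estimate under Gaussian noise = least squares
    restricted to the chosen experiments."""
    x_hat, *_ = np.linalg.lstsq(A[S], y, rcond=None)
    return x_hat
```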
Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning
This paper studies M-estimators with gradient-Lipschitz loss functions regularized with a convex penalty in linear models with a Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty when the noise distribution has heavy tails. Our main contributions are threefold. (i) We provide general formulae for the derivatives of regularized M-estimators $\hat\beta(y,X)$, where differentiation is taken with respect to both $X$ and $y$; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residuals in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select tuning parameters of regularized M-estimators. The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion does not require knowledge of the noise distribution or of the covariance of the design. Simulated data confirm the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as …
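The estimator class under study can be written down directly. Below is a minimal sketch using cvxpy of the Huber-loss, Elastic-Net-penalized M-estimator named in the abstract; the threshold `delta` and the penalty weights `lam1`, `lam2` are illustrative placeholders (in practice the paper's adaptive criterion, not reproduced here, would select them).

```python
import cvxpy as cp

def huber_elastic_net(X, y, lam1=0.1, lam2=0.1, delta=1.345):
    """Robust M-estimator: Huber loss + Elastic-Net penalty.

    lam1, lam2, delta are illustrative tuning parameters only; the paper's
    adaptive criterion would be used to choose them.
    """
    n, p = X.shape
    beta = cp.Variable(p)
    # Huber loss is quadratic for small residuals, linear for large ones,
    # which gives robustness to heavy-tailed noise.
    loss = cp.sum(cp.huber(y - X @ beta, M=delta)) / n
    penalty = lam1 * cp.norm1(beta) + lam2 * cp.sum_squares(beta)
    cp.Problem(cp.Minimize(loss + penalty)).solve()
    return beta.value
```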
Publication Date:
NSF-PAR ID: 10357819
Journal Name: Proceedings of Machine Learning Research
Volume: 178
Page Range or eLocation-ID: 1912-1947
ISSN: 2640-3498
Sponsoring Org: National Science Foundation
More Like this


Estimating the mean of a probability distribution using i.i.d. samples is a classical problem in statistics, wherein finite-sample optimal estimators are sought under various distributional assumptions. In this paper, we consider the problem of mean estimation when independent samples are drawn from $d$-dimensional non-identical distributions possessing a common mean. When the distributions are radially symmetric and unimodal, we propose a novel estimator, which is a hybrid of the modal interval, shorth, and median estimators and whose performance adapts to the level of heterogeneity in the data. We show that our estimator is near-optimal when data are i.i.d. and when the fraction of ‘low-noise’ distributions is as small as $\varOmega \left (\frac{d \log n}{n}\right )$, where $n$ is the number of samples. We also derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we extend our theory to linear regression. In both the mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points.
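For orientation, here are one-dimensional versions of two of the three ingredients named above (the third, the median, is simply `np.median`); these are classical definitions, not the paper's $d$-dimensional hybrid or its adaptive combination rule.

```python
import numpy as np

def shorth(x, frac=0.5):
    """Midpoint of the shortest interval containing a `frac` fraction of
    the samples (the classical 'shorth' estimator)."""
    x = np.sort(np.asarray(x))
    k = max(1, int(np.ceil(frac * len(x))))      # points per interval
    widths = x[k - 1:] - x[: len(x) - k + 1]     # width of each k-point window
    i = int(np.argmin(widths))
    return 0.5 * (x[i] + x[i + k - 1])

def modal_interval(x, width):
    """Center of the fixed-width interval containing the most samples."""
    x = np.sort(np.asarray(x))
    # For each left endpoint x[i], count samples in [x[i], x[i] + width].
    counts = np.searchsorted(x, x + width, side="right") - np.arange(len(x))
    i = int(np.argmax(counts))
    return x[i] + 0.5 * width
```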

We consider the high-dimensional linear regression model and assume that a fraction of the responses are contaminated by an adversary with complete knowledge of the data and the underlying distribution. We are interested in the situation when the dense additive noise can be heavy-tailed but the predictors have a sub-Gaussian distribution. We establish minimax lower bounds that depend on the fraction of the contaminated data and the tails of the additive noise. Moreover, we design a modification of the square-root Slope estimator with several desirable features: (a) it is provably robust to adversarial contamination, with performance guarantees that take the form of sub-Gaussian deviation inequalities and match the lower error bounds up to logarithmic factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the noise; and (c) it is computationally tractable as the solution of a convex optimization problem. To analyze the performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that could be of independent interest.
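A minimal cvxpy sketch of a square-root Slope-type objective is given below: the non-squared residual norm is what makes the estimator adaptive to the noise variance, and the sorted-L1 (Slope) penalty is expressed through a telescoping sum of `sum_largest` terms. The weight sequence `lambdas` and its scaling are left to the caller; this is an illustration of the objective, not the paper's calibrated modification.

```python
import cvxpy as cp
import numpy as np

def sqrt_slope(X, y, lambdas):
    """Square-root Slope sketch: minimize ||y - X b||_2 + sum_i lam_i |b|_(i),
    where |b|_(1) >= ... >= |b|_(p) and lambdas is nonincreasing, >= 0.

    Uses the identity  sum_i lam_i |b|_(i)
      = sum_k (lam_k - lam_{k+1}) * (sum of the k largest |b_i|),
    so each term is a nonnegative multiple of the convex sum_largest.
    A common (assumed, not the paper's) choice: lam_i ~ sqrt(log(2p / i)).
    """
    n, p = X.shape
    lam = np.append(np.asarray(lambdas, dtype=float), 0.0)  # lam[p] = 0
    b = cp.Variable(p)
    penalty = sum((lam[k] - lam[k + 1]) * cp.sum_largest(cp.abs(b), k + 1)
                  for k in range(p))
    objective = cp.norm(y - X @ b, 2) + penalty
    cp.Problem(cp.Minimize(objective)).solve()
    return b.value
```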

We consider the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $\beta^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $\beta^*$, but instead adopt a different regularization: in the noiseless setting, we assume $\beta^*$ consists of entries that are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q$-rationality) or irrational numbers taking values in a rationally independent set of bounded cardinality known to the learner, collectively referred to as the mixed-range assumption. Using a novel combination of the PSLQ integer relation detection algorithm and the Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm, we propose a polynomial-time algorithm that provably recovers a $\beta^*\in\mathbb{R}^p$ enjoying the mixed-range assumption from its linear measurements $Y=X\beta^*\in\mathbb{R}^n$ for a large class of distributions for the random entries of $X$, even with one measurement ($n=1$). In the noisy setting, we propose a polynomial-time, lattice-based algorithm that recovers a $\beta^*\in\mathbb{R}^p$ enjoying the $Q$-rationality property from its noisy measurements $Y=X\beta^*+W\in\mathbb{R}^n$, even from a single sample ($n=1$). We further establish that for large $Q$ and normal noise, this algorithm tolerates an information-theoretically optimal level of …
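The lattice-based primitive these algorithms build on can be sketched with the generic LLL embedding for integer relation detection, shown below assuming the `fpylll` library is available. This illustrates the primitive only, not the paper's exact lattice construction: in the noiseless $n=1$ case with $Y = x \cdot \beta^*$ and $\beta^* = p/Q$, the integer vector $(p_1, \dots, p_p, -Q)$ is a relation among $(x_1, \dots, x_p, Y)$.

```python
from fpylll import IntegerMatrix, LLL   # assumes fpylll is installed

def integer_relation(a, scale=10**12):
    """Generic LLL embedding for integer relation detection.

    Looks for a short integer vector m with m . a ~ 0.  Row i of the
    lattice basis is (e_i | scale * a_i); after LLL reduction, a short
    basis vector has a small last entry, i.e. its first d coordinates
    form an approximate integer relation among the a_i.
    """
    d = len(a)
    B = IntegerMatrix(d, d + 1)
    for i in range(d):
        B[i, i] = 1
        B[i, d] = int(round(scale * a[i]))   # scaled real entries
    LLL.reduction(B)                          # reduces B in place
    # First reduced basis vector: candidate relation coefficients.
    return [B[0, j] for j in range(d)]
```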

We provide a computationally and statistically efficient estimator for the classical problem of truncated linear regression, where the dependent variable $y = w^T x + \varepsilon$ and its corresponding vector of covariates $x \in \mathbb{R}^k$ are only revealed if the dependent variable falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden. This problem has remained a challenge since the early works of Tobin (1958); Amemiya (1973); Hausman and Wise (1977); Breen et al. (1996); its applications are abundant, and its history dates back even further to the work of Galton, Pearson, Lee, and Fisher: Galton (1897); Pearson and Lee (1908); Lee (1914); Fisher (1931). While consistent estimators of the regression coefficients have been identified, the error rates are not well understood, especially in high-dimensional settings. Under a “thickness assumption” about the covariance matrix of the covariates in the revealed sample, we provide a computationally efficient estimator for the coefficient vector $w$ from $n$ revealed samples that attains $\ell_2$ error $O(\sqrt{k/n})$, recovering the guarantees of least squares in the standard (untruncated) linear regression setting. Our estimator uses Projected Stochastic Gradient Descent (PSGD) on the negative log-likelihood of the truncated sample, and only needs oracle access to the set $S$, which may otherwise be arbitrary, and in particular may be non-convex. PSGD must be restricted to an appropriately defined convex cone to guarantee that the negative log-likelihood is strongly …
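A minimal sketch of the PSGD idea follows, with unit-variance Gaussian noise. For a revealed pair $(x, y)$, the gradient of the per-sample negative log-likelihood in $w$ is $(\mathbb{E}[z \mid z \sim N(w \cdot x, 1),\, z \in S] - y)\,x$, so rejection-sampling $z$ against the membership oracle `in_S` yields an unbiased stochastic gradient. Projecting onto an L2 ball of radius `radius` stands in for the paper's convex cone (an assumption for illustration); all names are hypothetical.

```python
import numpy as np

def truncated_regression_psgd(X, y, in_S, radius=10.0, steps=20000, lr=0.5,
                              seed=0):
    """Projected SGD sketch for truncated linear regression, N(0,1) noise.

    in_S(z): membership oracle for the (possibly non-convex) set S.
    Stochastic gradient at sample i: (z - y_i) * x_i, where z is drawn
    from N(w . x_i, 1) conditioned on z in S via rejection sampling
    (this may loop for long if S has tiny mass at the current w).
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    w = np.zeros(k)
    for t in range(steps):
        i = rng.integers(n)
        x = X[i]
        while True:                          # rejection sampling from S
            z = rng.normal(w @ x, 1.0)
            if in_S(z):
                break
        w -= lr / np.sqrt(t + 1) * (z - y[i]) * x
        nrm = np.linalg.norm(w)              # project onto the L2 ball
        if nrm > radius:
            w *= radius / nrm
    return w
```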