skip to main content


Title: Computationally and Statistically Efficient Truncated Regression
We provide a computationally and statistically efficient estimator for the classical problem of trun-cated linear regression, where the dependent variabley=wTx+εand its corresponding vector ofcovariatesx∈Rkare only revealed if the dependent variable falls in some subsetS⊆R; otherwisethe existence of the pair(x,y)is hidden. This problem has remained a challenge since the earlyworks of Tobin (1958); Amemiya (1973); Hausman and Wise (1977); Breen et al. (1996), its appli-cations are abundant, and its history dates back even further to the work of Galton, Pearson, Lee,and Fisher Galton (1897); Pearson and Lee (1908); Lee (1914); Fisher (1931). While consistent es-timators of the regression coefficients have been identified, the error rates are not well-understood,especially in high-dimensional settings.Under a “thickness assumption” about the covariance matrix of the covariates in the revealedsample, we provide a computationally efficient estimator for the coefficient vectorwfromnre-vealed samples that attains`2errorO(√k/n), recovering the guarantees of least squares in thestandard (untruncated) linear regression setting. Our estimator uses Projected Stochastic Gradi-ent Descent (PSGD) on the negative log-likelihood of the truncated sample, and only needs ora-cle access to the setS, which may otherwise be arbitrary, and in particular may be non-convex.PSGD must be restricted to an appropriately defined convex cone to guarantee that the negativelog-likelihood is strongly convex, which in turn is established using concentration of matrices onvariables with sub-exponential tails. We perform experiments on simulated data to illustrate theaccuracy of our estimator.As a corollary of our work, we show that SGD provably learns the parameters of single-layerneural networks with noisy Relu activation functions Nair and Hinton (2010); Bengio et al. (2013);Gulcehre et al. (2016), given linearly many, in the number of network parameters, input-outputpairs in the realizable setting.  more » « less
Award ID(s):
1741137
NSF-PAR ID:
10123632
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
XX
ISSN:
2640-3498
Page Range / eLocation ID:
1-31
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We provide a computationally and statistically efficient estimator for the classical problem of trun-cated linear regression, where the dependent variabley=wTx+εand its corresponding vector ofcovariatesx∈Rkare only revealed if the dependent variable falls in some subsetS⊆R; otherwisethe existence of the pair(x,y)is hidden. This problem has remained a challenge since the earlyworks of Tobin (1958); Amemiya (1973); Hausman and Wise (1977); Breen et al. (1996), its appli-cations are abundant, and its history dates back even further to the work of Galton, Pearson, Lee,and Fisher Galton (1897); Pearson and Lee (1908); Lee (1914); Fisher (1931). While consistent es-timators of the regression coefficients have been identified, the error rates are not well-understood,especially in high-dimensional settings.Under a “thickness assumption” about the covariance matrix of the covariates in the revealed sample, we provide a computationally efficient estimator for the coefficient vectorwfromnre-vealed samples that attains`2errorO(√k/n), recovering the guarantees of least squares in thestandard (untruncated) linear regression setting. Our estimator uses Projected Stochastic Gradi-ent Descent (PSGD) on the negative log-likelihood of the truncated sample, and only needs ora-cle access to the setS, which may otherwise be arbitrary, and in particular may be non-convex.PSGD must be restricted to an appropriately defined convex cone to guarantee that the negativelog-likelihood is strongly convex, which in turn is established using concentration of matrices onvariables with sub-exponential tails. We perform experiments on simulated data to illustrate the accuracy of our estimator.As a corollary of our work, we show that SGD provably learns the parameters of single-layerneural networks with noisy Relu activation functions Nair and Hinton (2010); Bengio et al. (2013);Gulcehre et al. (2016), given linearly many, in the number of network parameters, input-outputpairs in the realizable setting. 
    more » « less
  2. null (Ed.)
    https://arxiv.org/abs/2007.14539 As in standard linear regression, in truncated linear regression, we are given access to observations (Ai,yi)i whose dependent variable equals yi=ATi⋅x∗+ηi, where x∗ is some fixed unknown vector of interest and ηi is independent noise; except we are only given an observation if its dependent variable yi lies in some "truncation set" S⊂ℝ. The goal is to recover x∗ under some favorable conditions on the Ai's and the noise distribution. We prove that there exists a computationally and statistically efficient method for recovering k-sparse n-dimensional vectors x∗ from m truncated samples, which attains an optimal ℓ2 reconstruction error of O((klogn)/m‾‾‾‾‾‾‾‾‾‾√). As a corollary, our guarantees imply a computationally efficient and information-theoretically optimal algorithm for compressed sensing with truncation, which may arise from measurement saturation effects. Our result follows from a statistical and computational analysis of the Stochastic Gradient Descent (SGD) algorithm for solving a natural adaptation of the LASSO optimization problem that accommodates truncation. This generalizes the works of both: (1) [Daskalakis et al. 2018], where no regularization is needed due to the low-dimensionality of the data, and (2) [Wainright 2009], where the objective function is simple due to the absence of truncation. In order to deal with both truncation and high-dimensionality at the same time, we develop new techniques that not only generalize the existing ones but we believe are of independent interest. 
    more » « less
  3. We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a d-variate normal (μ,Σ) means a samples is only revealed if it falls in some subset S⊆ℝd; otherwise the samples are hidden and their count in proportion to the revealed samples is also hidden. We show that the mean μ and covariance matrix Σ can be estimated with arbitrary accuracy in polynomial-time, as long as we have oracle access to S, and S has non-trivial measure under the unknown d-variate normal distribution. Additionally we show that without oracle access to S, any non-trivial estimation is impossible. 
    more » « less
  4. We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a d-variate normal N(μ,Σ) means a samples is only revealed if it falls in some subset S⊆Rd; otherwise the samples are hidden and their count in proportion to the revealed samples is also hidden. We show that the mean μ and covariance matrix Σ can be estimated with arbitrary accuracy in polynomial-time, as long as we have oracle access to S, and S has non-trivial measure under the unknown d-variate normal distribution. Additionally we show that without oracle access to S, any non-trivial estimation is impossible. 
    more » « less
  5. We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a d-variate normal (μ,Σ) means a samples is only revealed if it falls in some subset S⊆ℝd; otherwise the samples are hidden and their count in proportion to the revealed samples is also hidden. We show that the mean μ and covariance matrix Σ can be estimated with arbitrary accuracy in polynomial-time, as long as we have oracle access to S, and S has non-trivial measure under the unknown d-variate normal distribution. Additionally we show that without oracle access to S, any non-trivial estimation is impossible. 
    more » « less