Title: Computationally and Statistically Efficient Truncated Regression
Abstract: We provide a computationally and statistically efficient estimator for the classical problem of truncated linear regression, where the dependent variable y = wᵀx + ε and its corresponding vector of covariates x ∈ ℝᵏ are only revealed if the dependent variable falls in some subset S ⊆ ℝ; otherwise the existence of the pair (x, y) is hidden. This problem has remained a challenge since the early works of Tobin (1958), Amemiya (1973), Hausman and Wise (1977), and Breen et al. (1996); its applications are abundant, and its history dates back even further to the work of Galton, Pearson, Lee, and Fisher (Galton (1897); Pearson and Lee (1908); Lee (1914); Fisher (1931)). While consistent estimators of the regression coefficients have been identified, the error rates are not well understood, especially in high-dimensional settings.

Under a "thickness assumption" about the covariance matrix of the covariates in the revealed sample, we provide a computationally efficient estimator for the coefficient vector w from n revealed samples that attains ℓ2 error O(√(k/n)), recovering the guarantees of least squares in the standard (untruncated) linear regression setting. Our estimator uses Projected Stochastic Gradient Descent (PSGD) on the negative log-likelihood of the truncated sample, and only needs oracle access to the set S, which may otherwise be arbitrary, and in particular may be non-convex. PSGD must be restricted to an appropriately defined convex cone to guarantee that the negative log-likelihood is strongly convex, which in turn is established using concentration of matrices on variables with sub-exponential tails. We perform experiments on simulated data to illustrate the accuracy of our estimator.

As a corollary of our work, we show that SGD provably learns the parameters of single-layer neural networks with noisy ReLU activation functions (Nair and Hinton (2010); Bengio et al. (2013); Gulcehre et al. (2016)), given a number of input-output pairs that is linear in the number of network parameters, in the realizable setting.
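To make the PSGD step concrete, here is a minimal Python sketch of the kind of update the abstract describes: for truncated Gaussian noise of unit variance, the stochastic gradient of the negative log-likelihood at a revealed pair (xᵢ, yᵢ) reduces to (ẑ − yᵢ)xᵢ, where ẑ is drawn from the model's Gaussian at xᵢ conditioned on landing in S by rejection sampling against the membership oracle. The function names, the unit noise variance, and the ℓ2-ball projection (a stand-in for the paper's convex cone, whose construction is not reproduced here) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sample_conditioned_on_S(mean, oracle_S, rng, max_tries=1000):
    """Rejection-sample z ~ N(mean, 1) conditioned on z in S, using only
    membership (oracle) access to S."""
    for _ in range(max_tries):
        z = rng.normal(mean, 1.0)
        if oracle_S(z):
            return z
    raise RuntimeError("S has negligible mass under the current model")

def psgd_truncated_regression(X, y, oracle_S, steps=20_000, lr=0.01,
                              radius=10.0, seed=0):
    """Projected SGD on the negative log-likelihood of the truncated sample.
    Per-sample stochastic gradient: (z_hat - y_i) * x_i, with z_hat a
    rejection sample from the model's truncated Gaussian at x_i."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    w = np.zeros(k)
    for _ in range(steps):
        i = rng.integers(n)
        z_hat = sample_conditioned_on_S(X[i] @ w, oracle_S, rng)
        w -= lr * (z_hat - y[i]) * X[i]
        # Stand-in projection: an l2 ball of fixed radius. The paper instead
        # projects onto a convex cone chosen to keep the objective strongly convex.
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w

# Example: pairs (x, y) survive only when y lands in S = (0, infinity).
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])
X_all = rng.normal(size=(20_000, 3))
y_all = X_all @ w_true + rng.normal(size=20_000)
keep = y_all > 0
w_hat = psgd_truncated_regression(X_all[keep], y_all[keep],
                                  oracle_S=lambda z: z > 0)
```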
Award ID(s):
1741137
PAR ID:
10123632
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
XX
ISSN:
2640-3498
Page Range / eLocation ID:
1-31
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the parameters of a multivariate normal distribution from truncated samples. A truncated sample from a d-variate normal N(μ, Σ) is only revealed if it falls in some subset S ⊆ ℝᵈ; otherwise the sample is hidden, and so is the proportion of hidden to revealed samples. We show that the mean μ and covariance matrix Σ can be estimated with arbitrary accuracy in polynomial time, as long as we have oracle access to S and S has non-trivial measure under the unknown d-variate normal distribution. Additionally, we show that without oracle access to S, no non-trivial estimation is possible.
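As a rough illustration of what such an estimator can look like, the sketch below runs SGD on the truncated negative log-likelihood in the simplified case where the covariance is known to be the identity and only the mean μ is estimated: the stochastic gradient at an observed sample x reduces to (ẑ − x), with ẑ rejection-sampled from N(μ, I) conditioned on S through the membership oracle. The identity-covariance simplification, step size, and warm start are assumptions for illustration; the paper's full estimator for (μ, Σ) is not reproduced here.

```python
import numpy as np

def estimate_truncated_mean(samples, oracle_S, steps=20_000, lr=0.01, seed=0):
    """SGD on the negative log-likelihood of truncated-normal samples,
    simplified to covariance fixed at the identity. The stochastic gradient
    at an observed x is (z_hat - x), with z_hat rejection-sampled from
    N(mu, I) conditioned on membership in S."""
    rng = np.random.default_rng(seed)
    n, d = samples.shape
    mu = samples.mean(axis=0)      # warm start at the truncated empirical mean
    for _ in range(steps):
        x = samples[rng.integers(n)]
        for _ in range(1000):      # rejection-sample z ~ N(mu, I) given z in S
            z = rng.normal(mu, 1.0)
            if oracle_S(z):
                break
        else:
            raise RuntimeError("S has negligible mass under the current model")
        mu -= lr * (z - x)
    return mu

# Example: samples from N((1, 1), I) are revealed only in the positive orthant.
rng = np.random.default_rng(2)
raw = rng.normal(loc=[1.0, 1.0], size=(50_000, 2))
revealed = raw[(raw > 0).all(axis=1)]
mu_hat = estimate_truncated_mean(revealed, oracle_S=lambda z: (z > 0).all())
```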
  2. We consider the question of learning the natural parameters of a k-parameter minimal exponential family from i.i.d. samples in a computationally and statistically efficient manner. We focus on the setting where the support as well as the natural parameters are appropriately bounded. While the traditional maximum likelihood estimator for this class of exponential families is consistent, asymptotically normal, and asymptotically efficient, evaluating it is computationally hard. In this work, we propose a computationally efficient estimator that is consistent as well as asymptotically normal under mild conditions. We provide finite-sample guarantees achieving an ℓ2 error of α in the parameter estimation with sample complexity O(poly(k/α)) and computational complexity O(poly(k/α)). To establish these results, we show that, at the population level, our method can be viewed as maximum likelihood estimation of a re-parameterized distribution belonging to the same class of exponential family.
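The computational bottleneck this abstract alludes to is already visible in the MLE's stationarity condition ∇A(η) = (1/n) Σᵢ T(xᵢ): the left-hand side equals E_η[T(x)], which is generally intractable to evaluate. The toy Python sketch below uses a one-parameter family where it happens to be tractable (the exponential distribution) purely to show the moment-matching iteration; the paper's re-parameterized estimator, which avoids this bottleneck in general, is not reproduced here.

```python
import numpy as np

# Toy illustration of exponential-family MLE by moment matching, using the
# exponential distribution p(x; eta) = exp(eta * x - A(eta)) with eta < 0,
# sufficient statistic T(x) = x, and log-partition A(eta) = -log(-eta).
# Here grad A(eta) = -1/eta is closed-form; in general grad A(eta) = E_eta[T(x)]
# is the quantity that makes the vanilla MLE computationally hard.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=5_000)   # true natural parameter: -1/2

eta, lr = -1.0, 0.05
for _ in range(2_000):
    grad_A = -1.0 / eta                      # E_eta[T(x)], tractable here
    eta += lr * (x.mean() - grad_A)          # gradient ascent on log-likelihood
    eta = min(eta, -1e-6)                    # stay inside the natural domain

print(eta, -1.0 / x.mean())                  # iterate vs closed-form MLE
```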