
Title: Inference in high-dimensional linear regression via lattice basis reduction and integer relation detection
We consider the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $\beta^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $\beta^*$, but instead adopt a different regularization: in the noiseless setting, we assume the entries of $\beta^*$ are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q$-rationality), or irrational numbers taking values in a rationally independent set of bounded cardinality that is known to the learner; we refer to these two conditions collectively as the mixed-range assumption. Using a novel combination of the PSLQ integer relation detection algorithm and the Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm, we propose a polynomial-time algorithm which provably recovers a $\beta^*\in\mathbb{R}^p$ satisfying the mixed-range assumption from its linear measurements $Y=X\beta^*\in\mathbb{R}^n$, for a large class of distributions of the random entries of $X$, even with a single measurement ($n=1$). In the noisy setting, we propose a polynomial-time, lattice-based algorithm which recovers a $\beta^*\in\mathbb{R}^p$ satisfying the $Q$-rationality property from its noisy measurements $Y=X\beta^*+W\in\mathbb{R}^n$, again even from a single sample ($n=1$). We further establish that for large $Q$ and Gaussian noise, this algorithm tolerates an information-theoretically optimal level of noise. We then apply these ideas to develop a polynomial-time, single-sample algorithm for the phase retrieval problem. Our methods address the single-sample ($n=1$) regime, where sparsity-based methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) and Basis Pursuit are known to fail. Furthermore, our results reveal algorithmic connections between the high-dimensional linear regression problem and the integer relation detection, randomized subset-sum, and shortest vector problems.
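As a rough illustration of the noiseless single-measurement phenomenon (not the paper's algorithm), note that $Q$-rationality makes the vector $(Y, X_1, \dots, X_p)$ admit the integer relation $(Q, -Q\beta^*_1, \dots, -Q\beta^*_p)$, since $QY - \sum_i Q\beta^*_i X_i = 0$. The toy Python sketch below detects such a relation with mpmath's PSLQ routine under deliberately favorable, made-up parameters (small $p$ and $Q$, exact measurements, high working precision); the paper's algorithms additionally use LLL and handle noise, neither of which is reproduced here.

```python
# Toy sketch (not the paper's algorithm): single-sample recovery of a Q-rational
# beta* from y = <x, beta*> via PSLQ integer relation detection.
# p, Q, the precision, and maxcoeff are illustrative choices.
import random
from mpmath import mp, mpf, pslq

mp.dps = 100                    # high working precision; PSLQ is precision-sensitive
p, Q = 6, 10                    # ambient dimension and common denominator
random.seed(0)

# Measurement vector with ~160 random bits per entry, standing in for a continuous
# distribution; this keeps spurious small-coefficient integer relations unlikely.
x = [mpf(random.getrandbits(160)) / mpf(2) ** 160 for _ in range(p)]
beta = [mpf(random.randint(-Q, Q)) / Q for _ in range(p)]   # Q-rational ground truth
y = sum(xi * bi for xi, bi in zip(x, beta))                 # a single measurement, n = 1

# Q-rationality implies Q*y - sum_i (Q*beta_i) * x_i = 0, so (y, x_1, ..., x_p)
# has the integer relation (Q, -Q*beta_1, ..., -Q*beta_p); PSLQ searches for it.
rel = pslq([y] + x, maxcoeff=10**6, maxsteps=10**4)
assert rel is not None and rel[0] != 0

beta_hat = [-mpf(rel[i + 1]) / rel[0] for i in range(p)]    # read beta off the relation
print("max abs error:", max(abs(bh - b) for bh, b in zip(beta_hat, beta)))
```

Any relation found is determined up to scaling here, so the ratios $-\mathrm{rel}_{i+1}/\mathrm{rel}_0$ recover $\beta^*$ exactly in this idealized noiseless setup.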
Authors:
Award ID(s):
2022448
Publication Date:
NSF-PAR ID:
10343454
Journal Name:
IEEE Transactions on Information Theory
ISSN:
1557-9654
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We study the sparsity of the solutions to systems of linear Diophantine equations with and without non-negativity constraints. The sparsity of a solution vector is the number of its nonzero entries, which is referred to as the $\ell_0$-norm of the vector. Our main results are new improved bounds on the minimal $\ell_0$-norm of solutions to systems $A\boldsymbol{x}=\boldsymbol{b}$, where $A\in\mathbb{Z}^{m\times n}$, $\boldsymbol{b}\in\mathbb{Z}^m$, and $\boldsymbol{x}$ is either a general integer vector (lattice case) or a non-negative integer vector (semigroup case). In certain cases, we give polynomial-time algorithms for computing solutions with $\ell_0$-norm satisfying the obtained bounds. We show that our bounds are tight. Our bounds can be seen as functions naturally generalizing the rank of a matrix over $\mathbb{R}$ to other subdomains such as $\mathbb{Z}$. We show that these new rank-like functions are all NP-hard to compute in general, but polynomial-time computable for a fixed number of variables. (A toy illustration of the $\ell_0$ objective appears after this list.)

  2. We provide a computationally and statistically efficient estimator for the classical problem of truncated linear regression, where the dependent variable $y = w^T x + \varepsilon$ and its corresponding vector of covariates $x \in \mathbb{R}^k$ are only revealed if the dependent variable falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden. This problem has remained a challenge since the early works of Tobin (1958); Amemiya (1973); Hausman and Wise (1977); Breen et al. (1996); its applications are abundant, and its history dates back even further to the work of Galton, Pearson, Lee, and Fisher: Galton (1897); Pearson and Lee (1908); Lee (1914); Fisher (1931). While consistent estimators of the regression coefficients have been identified, the error rates are not well understood, especially in high-dimensional settings. Under a "thickness assumption" about the covariance matrix of the covariates in the revealed sample, we provide a computationally efficient estimator for the coefficient vector $w$ from $n$ revealed samples that attains $\ell_2$ error $O(\sqrt{k/n})$, recovering the guarantees of least squares in the standard (untruncated) linear regression setting. Our estimator uses Projected Stochastic Gradient Descent (PSGD) on the negative log-likelihood of the truncated sample, and only needs oracle access to the set $S$, which may otherwise be arbitrary, and in particular may be non-convex. PSGD must be restricted to an appropriately defined convex cone to guarantee that the negative log-likelihood is strongly convex, which in turn is established using concentration of matrices on variables with sub-exponential tails. We perform experiments on simulated data to illustrate the accuracy of our estimator. As a corollary of our work, we show that SGD provably learns the parameters of single-layer neural networks with noisy ReLU activation functions (Nair and Hinton (2010); Bengio et al. (2013); Gulcehre et al. (2016)), given linearly many, in the number of network parameters, input-output pairs in the realizable setting. (A simplified PSGD sketch appears after this list.)
  3. We consider the problem of computing the partition function $\sum_x e^{f(x)}$, where $f: \{-1, 1\}^n \longrightarrow {\mathbb R}$ is a quadratic or cubic polynomial on the Boolean cube $\{-1, 1\}^n$. In the case of a quadratic polynomial $f$, we show that the partition function can be approximated within relative error $0 < \epsilon < 1$ in quasi-polynomial $n^{O(\ln n - \ln \epsilon)}$ time if the Lipschitz constant of the non-linear part of $f$ with respect to the $\ell^1$ metric on the Boolean cube does not exceed $1-\delta$, for any $\delta>0$ fixed in advance. For a cubic polynomial $f$, we get the same result under a somewhat stronger condition. We apply the method of polynomial interpolation, for which we prove that $\sum_x e^{\tilde{f}(x)} \ne 0$ for complex-valued polynomials $\tilde{f}$ in a neighborhood of a real-valued $f$ satisfying the above-mentioned conditions. The bounds are asymptotically optimal. Results on the zero-free region are interpreted as the absence of a phase transition in the Lee-Yang sense in the corresponding Ising model. The novel feature of the bounds is that they control the total interaction of each vertex but not every single interaction of sets of vertices.
  4. We study a noisy tensor completion problem of broad practical interest, namely, the reconstruction of a low-rank tensor from highly incomplete and randomly corrupted observations of its entries. Whereas a variety of prior work has been dedicated to this problem, prior algorithms either are computationally too expensive for large-scale applications or come with suboptimal statistical guarantees. Focusing on "incoherent" and well-conditioned tensors of a constant canonical polyadic rank, we propose a two-stage nonconvex algorithm, namely (vanilla) gradient descent following a rough initialization, that achieves the best of both worlds. Specifically, the proposed nonconvex algorithm faithfully completes the tensor and retrieves all individual tensor factors within nearly linear time, while at the same time enjoying near-optimal statistical guarantees (i.e., minimal sample complexity and optimal estimation accuracy). The estimation errors are evenly spread out across all entries, thus achieving optimal entrywise ($\ell_\infty$) statistical accuracy. We also discuss how to extend our approach to accommodate asymmetric tensors. The insight conveyed through our analysis of nonconvex optimization might have implications for other tensor estimation problems. (A simplified gradient-descent sketch appears after this list.)
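To make the $\ell_0$ objective of abstract 1 concrete, the toy Python sketch below brute-forces the sparsest integer solution of a tiny system $A\boldsymbol{x}=\boldsymbol{b}$, once over a box of general integer vectors (lattice case) and once over non-negative ones (semigroup case). The matrix, right-hand side, and search box are made up for illustration; the cited paper proves bounds and gives algorithms, it does not use brute force.

```python
# Toy illustration of the minimal l0-norm ("sparsest") integer solution of A x = b.
# A, b, and the search box are illustrative; brute force is NOT the paper's method.
import itertools
import numpy as np

A = np.array([[1, 2, 3, 4],
              [0, 1, 2, 3]])          # A in Z^{2 x 4}
b = np.array([7, 4])                  # b in Z^2
B = 4                                 # search entries in [-B, B] (or [0, B])

def sparsest(allow_negative: bool):
    """Return (number of nonzeros, solution) with minimal l0-norm in the box."""
    best = None
    values = range(-B, B + 1) if allow_negative else range(0, B + 1)
    for cand in itertools.product(values, repeat=A.shape[1]):
        x = np.array(cand)
        if np.array_equal(A @ x, b):
            nnz = int(np.count_nonzero(x))
            if best is None or nnz < best[0]:
                best = (nnz, x)
    return best

print("lattice case   (x in Z^n):", sparsest(allow_negative=True))
print("semigroup case (x >= 0):  ", sparsest(allow_negative=False))
```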
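Abstract 2's estimator runs projected SGD on the negative log-likelihood of the truncated sample, using only oracle access to the survival set $S$. A minimal sketch of that idea follows, assuming unit-variance Gaussian noise and an interval $S$: the per-sample stochastic gradient $(\tilde y - y)\,x$, with $\tilde y$ drawn from $N(w^\top x, 1)$ conditioned on $S$ (here by rejection sampling), is an unbiased estimate of the truncated log-likelihood gradient. The projection cone, step sizes, and all constants from the paper are omitted or replaced by illustrative choices.

```python
# Simplified sketch of SGD on the truncated-Gaussian negative log-likelihood.
# Model: y = w.x + eps, eps ~ N(0,1); the pair (x, y) is revealed only if y lies in S.
# Constants are illustrative; the paper's projection cone and guarantees are omitted.
import numpy as np

rng = np.random.default_rng(1)
k, n = 5, 5000
w_true = rng.normal(size=k) / np.sqrt(k)
S = (0.0, np.inf)                              # truncation set (membership oracle only)

def in_S(t):
    return S[0] <= t <= S[1]

# Generate a truncated sample: keep (x, y) only when y falls in S.
X, Y = [], []
while len(Y) < n:
    x = rng.normal(size=k)
    y = w_true @ x + rng.normal()
    if in_S(y):
        X.append(x)
        Y.append(y)
X, Y = np.array(X), np.array(Y)

def sample_conditional(mu):
    """Rejection-sample from N(mu, 1) conditioned on S, via membership queries."""
    while True:
        t = mu + rng.normal()
        if in_S(t):
            return t

# SGD: the per-sample gradient of the truncated NLL is (y_tilde - y) * x,
# where y_tilde ~ N(w.x, 1) conditioned on S; we also average the iterates.
w, w_sum, T = np.zeros(k), np.zeros(k), 100_000
for t in range(T):
    i = rng.integers(n)
    y_tilde = sample_conditional(w @ X[i])
    w -= 0.1 / np.sqrt(t + 1) * (y_tilde - Y[i]) * X[i]
    w_sum += w

w_avg = w_sum / T
print("||w_avg - w_true|| =", np.linalg.norm(w_avg - w_true))
```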
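Abstract 4's algorithm is gradient descent on the CP factors of a low-rank tensor, fit to the observed entries after an initialization stage. The sketch below, assuming illustrative sizes and a random (rather than spectral) initialization, runs plain gradient descent on the squared loss over the observed entries; it conveys the nonconvex formulation, not the paper's initialization, step sizes, or statistical guarantees.

```python
# Simplified sketch of nonconvex tensor completion: gradient descent on CP factors,
# fit to the observed (masked) entries only. The random init, step size, and sizes
# are illustrative; the paper's spectral initialization and guarantees are not reproduced.
import numpy as np

rng = np.random.default_rng(0)
d, r = 15, 2                                           # side length and CP rank
U0, V0, W0 = (rng.normal(size=(d, r)) for _ in range(3))
T_true = np.einsum('ir,jr,kr->ijk', U0, V0, W0)        # ground-truth low-rank tensor

mask = rng.random((d, d, d)) < 0.3                     # reveal ~30% of the entries
T_obs = mask * (T_true + 0.01 * rng.normal(size=T_true.shape))
m = mask.sum()

U, V, W = (rng.normal(size=(d, r)) for _ in range(3))  # random init (illustrative)
eta = 0.5
for it in range(4000):
    # Residual on observed entries, and exact gradients of the (normalized) squared loss.
    resid = mask * (np.einsum('ir,jr,kr->ijk', U, V, W) - T_obs)
    gU = (2 / m) * np.einsum('ijk,jr,kr->ir', resid, V, W)
    gV = (2 / m) * np.einsum('ijk,ir,kr->jr', resid, U, W)
    gW = (2 / m) * np.einsum('ijk,ir,jr->kr', resid, U, V)
    U, V, W = U - eta * gU, V - eta * gV, W - eta * gW

T_hat = np.einsum('ir,jr,kr->ijk', U, V, W)
print("relative error:", np.linalg.norm(T_hat - T_true) / np.linalg.norm(T_true))
```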