Inference in high-dimensional linear regression via lattice basis reduction and integer relation detection
We consider the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $\beta^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $\beta^*$, but instead adopt a different regularization: In the noiseless setting, we assume $\beta^*$ consists of entries, which are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q-$rationality); or irrational numbers taking values in a rationally independent set of bounded cardinality, known to learner; collectively called as the mixed-range assumption. Using a novel combination of the Partial Sum of Least Squares (PSLQ) integer relation detection, and the Lenstra-Lenstra-Lov\'asz (LLL) lattice basis reduction algorithms, we propose a polynomial-time algorithm which provably recovers a $\beta^*\in\mathbb{R}^p$ enjoying the mixed-range assumption, from its linear measurements $Y=X\beta^*\in\mathbb{R}^n$ for a large class of distributions for the random entries of $X$, even with one measurement ($n=1$). In the noisy setting, we propose a polynomial-time, lattice-based algorithm, which recovers a $\beta^*\in\mathbb{R}^p$ enjoying the $Q-$rationality property, from its noisy measurements $Y=X\beta^*+W\in\mathbb{R}^n$, even from a single sample ($n=1$). We further establish that for large $Q$, and normal noise, this algorithm tolerates information-theoretically optimal level of more »
Authors:
; ;
Award ID(s):
Publication Date:
NSF-PAR ID:
10343454
Journal Name:
IEEE transactions on information theory
ISSN:
1557-9654
We study the sparsity of the solutions to systems of linear Diophantine equations with and without non-negativity constraints. The sparsity of a solution vector is the number of its nonzero entries, which is referred to as the$$\ell _0$$${\ell }_{0}$-norm of the vector. Our main results are new improved bounds on the minimal$$\ell _0$$${\ell }_{0}$-norm of solutions to systems$$A\varvec{x}=\varvec{b}$$$Ax=b$, where$$A\in \mathbb {Z}^{m\times n}$$$A\in {Z}^{m×n}$,$${\varvec{b}}\in \mathbb {Z}^m$$$b\in {Z}^{m}$and$$\varvec{x}$$$x$is either a general integer vector (lattice case) or a non-negative integer vector (semigroup case). In certain cases, we give polynomial time algorithms for computing solutions with$$\ell _0$$${\ell }_{0}$-norm satisfying the obtained bounds. We show that our bounds are tight. Our bounds can be seen as functions naturally generalizing the rank of a matrix over$$\mathbb {R}$$$R$, to other subdomains such as$$\mathbb {Z}$$$Z$. We show that these new rank-like functions are all NP-hard to compute in general, but polynomial-time computable for fixed number of variables.
2. We provide a computationally and statistically efficient estimator for the classical problem of trun-cated linear regression, where the dependent variabley=wTx+εand its corresponding vector ofcovariatesx∈Rkare only revealed if the dependent variable falls in some subsetS⊆R; otherwisethe existence of the pair(x,y)is hidden. This problem has remained a challenge since the earlyworks of Tobin (1958); Amemiya (1973); Hausman and Wise (1977); Breen et al. (1996), its appli-cations are abundant, and its history dates back even further to the work of Galton, Pearson, Lee,and Fisher Galton (1897); Pearson and Lee (1908); Lee (1914); Fisher (1931). While consistent es-timators of the regression coefficients have been identified, the error rates are not well-understood,especially in high-dimensional settings.Under a “thickness assumption” about the covariance matrix of the covariates in the revealed sample, we provide a computationally efficient estimator for the coefficient vectorwfromnre-vealed samples that attains2errorO(√k/n), recovering the guarantees of least squares in thestandard (untruncated) linear regression setting. Our estimator uses Projected Stochastic Gradi-ent Descent (PSGD) on the negative log-likelihood of the truncated sample, and only needs ora-cle access to the setS, which may otherwise be arbitrary, and in particular may be non-convex.PSGD must be restricted to an appropriately defined convex cone to guarantee that the negativelog-likelihood is stronglymore »
3. Abstract We consider the problem of computing the partition function $\sum _x e^{f(x)}$ , where $f: \{-1, 1\}^n \longrightarrow {\mathbb R}$ is a quadratic or cubic polynomial on the Boolean cube $\{-1, 1\}^n$ . In the case of a quadratic polynomial f , we show that the partition function can be approximated within relative error $0 < \epsilon < 1$ in quasi-polynomial $n^{O(\ln n - \ln \epsilon )}$ time if the Lipschitz constant of the non-linear part of f with respect to the $\ell ^1$ metric on the Boolean cube does not exceed $1-\delta$ , for any $\delta>0$ , fixed in advance. For a cubic polynomial f , we get the same result under a somewhat stronger condition. We apply the method of polynomial interpolation, for which we prove that $\sum _x e^{\tilde {f}(x)} \ne 0$ for complex-valued polynomials $\tilde {f}$ in a neighborhood of a real-valued f satisfying the above mentioned conditions. The bounds are asymptotically optimal. Results on the zero-free region are interpreted as the absence of a phase transition in the Lee–Yang sense in the corresponding Ising model. The novel feature of the bounds is that they control the total interaction of each vertex but notmore »