- Award ID(s):
- 1818700
- NSF-PAR ID:
- 10289598
- Date Published:
- Journal Name:
- INFORMS Journal on Computing
- ISSN:
- 1091-9856
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
null (Ed.)The L 0 -regularized least squares problem (a.k.a. best subsets) is central to sparse statistical learning and has attracted significant attention across the wider statistics, machine learning, and optimization communities. Recent work has shown that modern mixed integer optimization (MIO) solvers can be used to address small to moderate instances of this problem. In spite of the usefulness of L 0 -based estimators and generic MIO solvers, there is a steep computational price to pay when compared with popular sparse learning algorithms (e.g., based on L 1 regularization). In this paper, we aim to push the frontiers of computation for a family of L 0 -regularized problems with additional convex penalties. We propose a new hierarchy of necessary optimality conditions for these problems. We develop fast algorithms, based on coordinate descent and local combinatorial optimization, that are guaranteed to converge to solutions satisfying these optimality conditions. From a statistical viewpoint, an interesting story emerges. When the signal strength is high, our combinatorial optimization algorithms have an edge in challenging statistical settings. When the signal is lower, pure L 0 benefits from additional convex regularization. We empirically demonstrate that our family of L 0 -based estimators can outperform the state-of-the-art sparse learning algorithms in terms of a combination of prediction, estimation, and variable selection metrics under various regimes (e.g., different signal strengths, feature correlations, number of samples and features). Our new open-source sparse learning toolkit L0Learn (available on CRAN and GitHub) reaches up to a threefold speedup (with p up to 10 6 ) when compared with competing toolkits such as glmnet and ncvreg.more » « less
-
null (Ed.)Under the linear regression framework, we study the variable selection problem when the underlying model is assumed to have a small number of nonzero coefficients. Non-convex penalties in specic forms are well-studied in the literature for sparse estimation. A recent work, Ahn, Pang, and Xin (2017), has pointed out that nearly all existing non-convex penalties can be represented as difference-of-convex (DC) functions, which are the difference of two convex functions, while itself may not be convex. There is a large existing literature on optimization problems when their objectives and/or constraints involve DC functions. Efficient numerical solutions have been proposed. Under the DC framework, directional-stationary (d-stationary) solutions are considered, and they are usually not unique. In this paper, we show that under some mild conditions, a certain subset of d-stationary solutions in an optimization problem (with a DC objective) has some ideal statistical properties: namely, asymptotic estimation consistency, asymptotic model selection consistency, asymptotic efficiency. Our assumptions are either weaker than or comparable with those conditions that have been adopted in other existing works. This work shows that DC is a nice framework to offer a unied approach to these existing works where non-convex penalties are involved. Our work bridges the communities of optimization and statistics.more » « less
-
Space mission planning and spacecraft design are tightly coupled and need to be considered together for optimal performance; however, this integrated optimization problem results in a large-scale Mixed-Integer Nonlinear Programming (MINLP) problem, which is challenging to solve. In response to this challenge, this paper proposes a new solution approach to this MINLP problem by iterative solving a set of coupled subproblems via the augmented Lagrangian coordination approach following the philosophy of Multi-disciplinary Design Optimization (MDO). The proposed approach leverages the unique structure of the problem that enables its decomposition into a set of coupled subproblems of different types: a Mixed-Integer Quadratic Programming (MIQP) subproblem for mission planning and one or more Nonlinear Programming (NLP) subproblem(s) for spacecraft design. Since specialized MIQP or NLP solvers can be applied to each subproblem, the proposed approach can efficiently solve the otherwise intractable integrated MINLP problem. An automatic and effective method to find an initial solution for this iterative approach is also proposed so that the optimization can be performed without the need for a user-defined initial guess. In the demonstration case study, a human lunar exploration mission sequence is optimized with a subsystem-level parametric spacecraft design model. Compared to the state-of-the-art method, the proposed formulation can obtain a better solution with a shorter computational time even without parallelization. For larger problems, the proposed solution approach can also be easily parallelizable and thus is expected to be further advantageous and scalable.more » « less
-
We consider the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $\beta^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $\beta^*$, but instead adopt a different regularization: In the noiseless setting, we assume $\beta^*$ consists of entries, which are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q-$rationality); or irrational numbers taking values in a rationally independent set of bounded cardinality, known to learner; collectively called as the mixed-range assumption. Using a novel combination of the Partial Sum of Least Squares (PSLQ) integer relation detection, and the Lenstra-Lenstra-Lov\'asz (LLL) lattice basis reduction algorithms, we propose a polynomial-time algorithm which provably recovers a $\beta^*\in\mathbb{R}^p$ enjoying the mixed-range assumption, from its linear measurements $Y=X\beta^*\in\mathbb{R}^n$ for a large class of distributions for the random entries of $X$, even with one measurement ($n=1$). In the noisy setting, we propose a polynomial-time, lattice-based algorithm, which recovers a $\beta^*\in\mathbb{R}^p$ enjoying the $Q-$rationality property, from its noisy measurements $Y=X\beta^*+W\in\mathbb{R}^n$, even from a single sample ($n=1$). We further establish that for large $Q$, and normal noise, this algorithm tolerates information-theoretically optimal level of noise. We then apply these ideas to develop a polynomial-time, single-sample algorithm for the phase retrieval problem. Our methods address the single-sample ($n=1$) regime, where the sparsity-based methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) and the Basis Pursuit are known to fail. Furthermore, our results also reveal algorithmic connections between the high-dimensional linear regression problem, and the integer relation detection, randomized subset-sum, and shortest vector problems.more » « less
-
Abstract This paper studies several solution paths of sparse quadratic minimization problems as a function of the weighing parameter of the bi-objective of estimation loss versus solution sparsity. Three such paths are considered: the “
-path” where the discontinuous$$\ell _0$$ -function provides the exact sparsity count; the “$$\ell _0$$ -path” where the$$\ell _1$$ -function provides a convex surrogate of sparsity count; and the “capped$$\ell _1$$ -path” where the nonconvex nondifferentiable capped$$\ell _1$$ -function aims to enhance the$$\ell _1$$ -approximation. Serving different purposes, each of these three formulations is different from each other, both analytically and computationally. Our results deepen the understanding of (old and new) properties of the associated paths, highlight the pros, cons, and tradeoffs of these sparse optimization models, and provide numerical evidence to support the practical superiority of the capped$$\ell _1$$ -path. Our study of the capped$$\ell _1$$ -path is interesting in its own right as the path pertains to computable directionally stationary (= strongly locally minimizing in this context, as opposed to globally optimal) solutions of a parametric nonconvex nondifferentiable optimization problem. Motivated by classical parametric quadratic programming theory and reinforced by modern statistical learning studies, both casting an exponential perspective in fully describing such solution paths, we also aim to address the question of whether some of them can be fully traced in strongly polynomial time in the problem dimensions. A major conclusion of this paper is that a path of directional stationary solutions of the capped$$\ell _1$$ -regularized problem offers interesting theoretical properties and practical compromise between the$$\ell _1$$ -path and the$$\ell _0$$ -path. Indeed, while the$$\ell _1$$ -path is computationally prohibitive and greatly handicapped by the repeated solution of mixed-integer nonlinear programs, the quality of$$\ell _0$$ -path, in terms of the two criteria—loss and sparsity—in the estimation objective, is inferior to the capped$$\ell _1$$ -path; the latter can be obtained efficiently by a combination of a parametric pivoting-like scheme supplemented by an algorithm that takes advantage of the Z-matrix structure of the loss function.$$\ell _1$$