

Title: A Mixed-Integer Fractional Optimization Approach to Best Subset Selection
We consider the best subset selection problem in linear regression—that is, finding a parsimonious subset of the regression variables that provides the best fit to the data according to some predefined criterion. We are primarily concerned with alternatives to cross-validation methods that do not require data partitioning and involve a range of information criteria extensively studied in the statistical literature. We show that the problem of interest can be modeled using fractional mixed-integer optimization, which can be tackled by leveraging recent advances in modern optimization solvers. The proposed algorithms involve solving a sequence of mixed-integer quadratic optimization problems (or their convexifications) and can be implemented with off-the-shelf solvers. We report encouraging results in our computational experiments, with respect to both the optimization and statistical performance. Summary of Contribution: This paper considers feature selection problems with information criteria. We show that by adopting a fractional optimization perspective (a well-known field in nonlinear optimization and operations research), it is possible to leverage recent advances in mixed-integer quadratic optimization technology to tackle traditional statistical problems long considered intractable. We present extensive computational experiments, with both synthetic and real data, illustrating that the new fractional optimization approach is orders of magnitude faster than existing approaches in the literature.
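To make the "sequence of mixed-integer quadratic problems" idea concrete, here is a minimal sketch of Dinkelbach's classical parametric method for a ratio criterion. The criterion RSS(S) / (n - |S|) (the residual-variance estimate), the function names, and the use of exhaustive enumeration in place of an MIQP solver are all assumptions for illustration; this is not the paper's algorithm, and enumeration is only feasible for a handful of variables.

```python
from itertools import combinations

import numpy as np


def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit on the columns in `subset`."""
    if not subset:
        return float(y @ y)
    Xs = X[:, list(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return float(resid @ resid)


def all_subsets(p):
    """Yield every subset of {0, ..., p-1} as a sorted tuple."""
    for k in range(p + 1):
        yield from combinations(range(p), k)


def dinkelbach_best_subset(X, y, tol=1e-10, max_iter=50):
    """Minimize the ratio RSS(S) / (n - |S|) over column subsets S (Dinkelbach's method).

    Each parametric subproblem min_S RSS(S) - lam * (n - |S|) is solved here by
    exhaustive enumeration, a stand-in for an MIQP solver (assumption).
    """
    n, p = X.shape
    if n <= p:
        raise ValueError("need n > p so that n - |S| stays positive")
    table = {S: rss(X, y, S) for S in all_subsets(p)}  # RSS of every candidate model
    S = tuple(range(p))                                  # start from the full model
    lam = table[S] / (n - len(S))
    for _ in range(max_iter):
        # F(lam) = min_S [RSS(S) - lam * (n - |S|)] is <= 0 at the current lam.
        S_new = min(table, key=lambda T: table[T] - lam * (n - len(T)))
        F = table[S_new] - lam * (n - len(S_new))
        if F > -tol:  # F(lam) == 0 certifies that lam is the optimal ratio
            return S, lam
        S = S_new
        lam = table[S] / (n - len(S))
    return S, lam
```

Each iteration strictly decreases `lam` until the parametric subproblem's optimal value reaches zero, so the loop visits only a few subsets in practice. In a realistic setting, the `min(table, ...)` line is where a MIQP (or its convexification) would be solved.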
Award ID(s):
1818700
PAR ID:
10289598
Author(s) / Creator(s):
;
Date Published:
Journal Name:
INFORMS Journal on Computing
ISSN:
1091-9856
Sponsoring Org:
National Science Foundation
More Like this
  1. The L0-regularized least squares problem (a.k.a. best subsets) is central to sparse statistical learning and has attracted significant attention across the wider statistics, machine learning, and optimization communities. Recent work has shown that modern mixed integer optimization (MIO) solvers can be used to address small to moderate instances of this problem. In spite of the usefulness of L0-based estimators and generic MIO solvers, there is a steep computational price to pay when compared with popular sparse learning algorithms (e.g., those based on L1 regularization). In this paper, we aim to push the frontiers of computation for a family of L0-regularized problems with additional convex penalties. We propose a new hierarchy of necessary optimality conditions for these problems. We develop fast algorithms, based on coordinate descent and local combinatorial optimization, that are guaranteed to converge to solutions satisfying these optimality conditions. From a statistical viewpoint, an interesting story emerges. When the signal strength is high, our combinatorial optimization algorithms have an edge in challenging statistical settings. When the signal is lower, pure L0 benefits from additional convex regularization. We empirically demonstrate that our family of L0-based estimators can outperform state-of-the-art sparse learning algorithms in terms of a combination of prediction, estimation, and variable selection metrics under various regimes (e.g., different signal strengths, feature correlations, and numbers of samples and features). Our new open-source sparse learning toolkit L0Learn (available on CRAN and GitHub) reaches up to a threefold speedup (with p up to 10^6) when compared with competing toolkits such as glmnet and ncvreg.
  2. Bayesian Networks (BNs) represent conditional probability relations among a set of random variables (nodes) in the form of a directed acyclic graph (DAG), and have found diverse applications in knowledge discovery. We study the problem of learning the sparse DAG structure of a BN from continuous observational data. The central problem can be modeled as a mixed-integer program with an objective function composed of a convex quadratic loss function and a regularization penalty subject to linear constraints. The optimal solution to this mathematical program is known to have desirable statistical properties under certain conditions. However, state-of-the-art optimization solvers cannot obtain provably optimal solutions to the existing mathematical formulations for medium-size problems within reasonable computational times. To address this difficulty, we tackle the problem from both computational and statistical perspectives. On the one hand, we propose a concrete early stopping criterion to terminate the branch-and-bound process in order to obtain a near-optimal solution to the mixed-integer program, and establish the consistency of this approximate solution. On the other hand, we improve the existing formulations by replacing the linear "big-M" constraints that represent the relationship between the continuous and binary indicator variables with second-order conic constraints. Our numerical results demonstrate the effectiveness of the proposed approaches.
  3. Under the linear regression framework, we study the variable selection problem when the underlying model is assumed to have a small number of nonzero coefficients. Non-convex penalties of specific forms are well studied in the literature for sparse estimation. A recent work, Ahn, Pang, and Xin (2017), has pointed out that nearly all existing non-convex penalties can be represented as difference-of-convex (DC) functions, that is, differences of two convex functions, which may themselves be non-convex. There is a large literature on optimization problems whose objectives and/or constraints involve DC functions, and efficient numerical solutions have been proposed. Under the DC framework, directional-stationary (d-stationary) solutions are considered, and they are usually not unique. In this paper, we show that under some mild conditions, a certain subset of d-stationary solutions in an optimization problem (with a DC objective) has some ideal statistical properties: namely, asymptotic estimation consistency, asymptotic model selection consistency, and asymptotic efficiency. Our assumptions are either weaker than or comparable with the conditions adopted in other existing works. This work shows that DC is a nice framework offering a unified approach to these existing works where non-convex penalties are involved. Our work bridges the communities of optimization and statistics.
  4. Space mission planning and spacecraft design are tightly coupled and need to be considered together for optimal performance; however, this integrated optimization problem results in a large-scale Mixed-Integer Nonlinear Programming (MINLP) problem, which is challenging to solve. In response to this challenge, this paper proposes a new solution approach to this MINLP problem that iteratively solves a set of coupled subproblems via the augmented Lagrangian coordination approach, following the philosophy of Multidisciplinary Design Optimization (MDO). The proposed approach leverages the unique structure of the problem, which enables its decomposition into a set of coupled subproblems of different types: a Mixed-Integer Quadratic Programming (MIQP) subproblem for mission planning and one or more Nonlinear Programming (NLP) subproblem(s) for spacecraft design. Since specialized MIQP or NLP solvers can be applied to each subproblem, the proposed approach can efficiently solve the otherwise intractable integrated MINLP problem. An automatic and effective method to find an initial solution for this iterative approach is also proposed so that the optimization can be performed without the need for a user-defined initial guess. In the demonstration case study, a human lunar exploration mission sequence is optimized with a subsystem-level parametric spacecraft design model. Compared to the state-of-the-art method, the proposed formulation obtains a better solution in a shorter computational time, even without parallelization. For larger problems, the proposed solution approach is also easily parallelizable and is thus expected to be even more advantageous and scalable.
  5. We study the polyhedral convex hull structure of a mixed-integer set which arises in a class of cardinality-constrained concave submodular minimization problems. This class of problems has an objective function of the form $f(a^Tx)$, where f is a univariate concave function, a is a non-negative vector, and x is a binary vector of appropriate dimension. Such minimization problems frequently appear in applications that involve risk aversion or economies of scale. We propose three classes of strong valid linear inequalities for this convex hull and specify their facet conditions when a has two distinct values. We show how to use these inequalities to obtain valid inequalities for a general a with more than two distinct values. We further provide a complete linear convex hull description for this mixed-integer set when a contains two distinct values and the cardinality constraint upper bound is two. Our computational experiments on the mean-risk optimization problem demonstrate the effectiveness of the proposed inequalities in a branch-and-cut framework.
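For related abstract 1 (L0-regularized least squares with an additional convex penalty), the coordinate-descent component can be sketched as follows for the objective 0.5*||y - X b||^2 + lam0*||b||_0 + lam2*||b||_2^2. This is a minimal illustration, not L0Learn's implementation: the function names and objective scaling are assumptions, columns of X are assumed nonzero, and the local combinatorial search mentioned in the abstract is omitted.

```python
import numpy as np


def l0l2_objective(X, y, beta, lam0, lam2):
    """0.5*||y - X beta||^2 + lam0*||beta||_0 + lam2*||beta||_2^2 (assumed scaling)."""
    r = y - X @ beta
    return 0.5 * float(r @ r) + lam0 * np.count_nonzero(beta) + lam2 * float(beta @ beta)


def l0l2_coordinate_descent(X, y, lam0, lam2=0.0, n_sweeps=100, tol=1e-10):
    """Cyclic coordinate descent; each update exactly minimizes the objective in one coordinate."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X * X, axis=0)      # ||x_j||^2 for each column (assumed > 0)
    r = y.astype(float).copy()          # current residual y - X @ beta
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # z = x_j^T (residual with coordinate j's contribution added back)
            z = X[:, j] @ r + col_sq[j] * beta[j]
            denom = col_sq[j] + 2.0 * lam2
            # Nonzero only if the quadratic decrease z^2 / (2 * denom) exceeds the L0 cost lam0
            new = z / denom if z * z / (2.0 * denom) > lam0 else 0.0
            if new != beta[j]:
                r -= X[:, j] * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta
```

With an orthonormal design the update reduces to hard thresholding of X^T y at sqrt(2*lam0*(1 + 2*lam2)); because every coordinate step is an exact minimization, the objective never increases from one update to the next.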
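For related abstract 2 (learning sparse Bayesian network DAGs), the replacement of big-M linking constraints by second-order conic constraints can be written, under assumed notation (coefficient $\beta_{jk}$ for arc $j \to k$, indicator $z_{jk}$, auxiliary variable $s_{jk}$), in the following perspective-type form. This is a generic sketch of the idea; the exact formulation in that paper may differ.

```latex
\begin{aligned}
&\text{big-M:} && -M z_{jk} \le \beta_{jk} \le M z_{jk}, && z_{jk} \in \{0,1\},\\
&\text{conic:} && \beta_{jk}^2 \le s_{jk}\, z_{jk},\quad s_{jk} \ge 0, && z_{jk} \in \{0,1\}.
\end{aligned}
```

In both forms $z_{jk} = 0$ forces $\beta_{jk} = 0$. The difference shows up in the continuous relaxation: with $z_{jk} \in [0,1]$, the conic constraint implies $s_{jk} \ge \beta_{jk}^2 / z_{jk} \ge \beta_{jk}^2$, so when $s_{jk}$ is charged in the objective (for example, in place of a $\beta_{jk}^2$ term), a small fractional $z_{jk}$ is penalized heavily, whereas the big-M relaxation only requires $z_{jk} \ge |\beta_{jk}|/M$ and does not depend on a good choice of $M$ being known.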
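For related abstract 3 (non-convex penalties as DC functions), a small worked instance may help. Taking the minimax concave penalty (MCP) as an assumed example, MCP(t) = lam*|t| - h(t), where h is a convex, Huber-like function. The sketch below checks that decomposition and runs the basic DC algorithm (DCA), which linearizes the concave part -h at each iterate, on the scalar problem min_t 0.5*(t - y)^2 + MCP(t). The function names and the choice of MCP are illustrative assumptions, not taken from that paper.

```python
import math


def mcp(t, lam, gamma):
    """Minimax concave penalty with lam > 0, gamma > 0 (non-convex in t)."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def h(t, lam, gamma):
    """Convex part: h(t) = lam*|t| - mcp(t), so mcp = g - h with convex g(t) = lam*|t|."""
    a = abs(t)
    if a <= gamma * lam:
        return a * a / (2.0 * gamma)
    return lam * a - 0.5 * gamma * lam * lam


def h_grad(t, lam, gamma):
    """Derivative of h (h is differentiable everywhere)."""
    if abs(t) <= gamma * lam:
        return t / gamma
    return math.copysign(lam, t)


def soft_threshold(v, thr):
    """Proximal operator of thr*|t|."""
    return math.copysign(max(abs(v) - thr, 0.0), v)


def dca_scalar(y, lam, gamma, t0=0.0, iters=100):
    """DCA for min_t 0.5*(t - y)^2 + mcp(t)."""
    t = t0
    for _ in range(iters):
        # Convex subproblem: min_t 0.5*(t - y)^2 + lam*|t| - h_grad(t_k) * t
        t = soft_threshold(y + h_grad(t, lam, gamma), lam)
    return t
```

For example, with lam = 1 and gamma = 3, DCA returns y unchanged for |y| > 3 (no shrinkage bias), 0 for |y| <= 1, and the "firm" threshold value (|y| - 1) / (1 - 1/3) in between.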
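For related abstract 4 (augmented Lagrangian coordination of coupled subproblems), the basic coordination loop can be illustrated on a toy problem: two subproblems, each solved by its own routine, linked by a consistency constraint x == y. This is a simplified single-loop scheme (essentially ADMM) under assumed names and toy objectives; the actual method couples an MIQP and NLP subproblems and may use different penalty and weight updates.

```python
def alternating_coordination(solve_a, solve_b, rho=1.0, iters=200):
    """Alternate between two subproblems linked by the consistency constraint x == y.

    solve_a(y, v, rho) -> argmin_x f_a(x) + v*(x - y) + (rho/2)*(x - y)**2
    solve_b(x, v, rho) -> argmin_y f_b(y) + v*(x - y) + (rho/2)*(x - y)**2
    """
    x = y = v = 0.0
    for _ in range(iters):
        x = solve_a(y, v, rho)   # stand-in for, e.g., the mission-planning subproblem
        y = solve_b(x, v, rho)   # stand-in for, e.g., the spacecraft-design subproblem
        v += rho * (x - y)       # multiplier update on the coupling residual
    return x, y, v


# Hypothetical toy subproblems with closed-form minimizers:
# f_a(x) = (x - 1)^2 and f_b(y) = (y - 3)^2, so the coupled optimum is x = y = 2.
def solve_a(y, v, rho):
    return (2.0 + rho * y - v) / (2.0 + rho)


def solve_b(x, v, rho):
    return (6.0 + v + rho * x) / (2.0 + rho)
```

Calling `alternating_coordination(solve_a, solve_b)` converges to x = y = 2 with multiplier v = -2. The point of the structure is that each subproblem only sees the other's latest value and the multiplier, so each can be handed to a specialized solver.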
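For related abstract 5, the objective $f(a^Tx)$ with concave f and non-negative a is submodular in the binary vector x. The sketch below checks this numerically (diminishing returns) for a small mean-risk objective of the assumed form -mu^T x + Omega*sqrt(sigma2^T x), and solves the cardinality-constrained problem by brute force. The data, names, and exhaustive search are illustrative assumptions; the paper instead strengthens a branch-and-cut formulation with valid inequalities.

```python
from itertools import combinations
from math import sqrt


def mean_risk(S, mu, sigma2, omega):
    """-mu^T x + omega * sqrt(sigma2^T x) for the 0/1 vector x with support S."""
    return -sum(mu[i] for i in S) + omega * sqrt(sum(sigma2[i] for i in S))


def is_submodular(fn, n, tol=1e-12):
    """Check fn(A+e) - fn(A) >= fn(B+e) - fn(B) for all A subset of B and e not in B."""
    ground = range(n)
    for kb in range(n + 1):
        for B in combinations(ground, kb):
            B_set = set(B)
            for ka in range(len(B) + 1):
                for A in combinations(B, ka):
                    A_set = set(A)
                    for e in ground:
                        if e in B_set:
                            continue
                        gain_a = fn(A_set | {e}) - fn(A_set)
                        gain_b = fn(B_set | {e}) - fn(B_set)
                        if gain_a < gain_b - tol:
                            return False
    return True


def best_with_cardinality(fn, n, k):
    """Exhaustively minimize fn over subsets of size at most k."""
    candidates = (S for r in range(k + 1) for S in combinations(range(n), r))
    return min(candidates, key=lambda S: fn(set(S)))
```

For instance, with mu = [1, 2, 3], sigma2 = [1, 4, 9], Omega = 1, and k = 2, the best subset is (1, 2), with value -5 + sqrt(13). Replacing sqrt by a convex function such as squaring breaks the diminishing-returns property, which `is_submodular` detects.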