We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It is well suited to many biological studies that detect associations between multiple traits and multiple predictors, with each trait and each predictor embedded in biological functional groups such as genes, pathways or brain regions. The method effectively removes unimportant groups as well as unimportant individual coefficients within important groups, particularly for large-p-small-n problems, and is flexible in handling various complex group structures such as overlapping, nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study.
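As a rough illustration of the kind of penalty such a method combines (a sketch only, not the authors' exact formulation), the snippet below evaluates a sparse-group-lasso-style penalty for a coefficient matrix whose entries are organized into groups; the group layout, weights and tuning parameters `lam1`/`lam2` are hypothetical.

```python
import numpy as np

def sparse_group_penalty(B, groups, lam1=1.0, lam2=1.0):
    """Sparse-group-lasso-style penalty for a coefficient matrix B.

    B      : (p, q) array of regression coefficients (p predictors, q responses).
    groups : list of index tuples (e.g. from np.ix_) selecting the entries of B
             that form each group; groups may overlap.
    lam1   : weight on the elementwise L1 term (sparsity within groups).
    lam2   : weight on the groupwise L2 term (removal of whole groups).
    """
    l1 = lam1 * np.abs(B).sum()
    l2 = lam2 * sum(np.sqrt(B[g].size) * np.linalg.norm(B[g]) for g in groups)
    return l1 + l2

# Toy example: a 4 x 3 coefficient matrix with two row-wise groups.
B = np.arange(12.0).reshape(4, 3)
groups = [np.ix_([0, 1], [0, 1, 2]),   # predictors 0-1, all responses
          np.ix_([2, 3], [0, 1, 2])]   # predictors 2-3, all responses
print(sparse_group_penalty(B, groups))
```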
This paper applies the Least Absolute Shrinkage and Selection Operator (LASSO) to make rolling one-minute-ahead return forecasts using the entire cross-section of lagged returns as candidate predictors. The LASSO increases both out-of-sample fit and forecast-implied Sharpe ratios. This out-of-sample success comes from identifying predictors that are unexpected, short-lived, and sparse. Although the LASSO uses a statistical rule rather than economic intuition to identify predictors, the predictors it identifies are nevertheless associated with economically meaningful events: the LASSO tends to identify as predictors stocks with news about fundamentals.
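A minimal sketch of the rolling procedure described above, using scikit-learn's `Lasso` on synthetic, standardized minute-level returns; the window length, penalty level `alpha` and target stock are illustrative assumptions rather than the paper's choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
T, N = 500, 50                              # minutes x stocks (synthetic returns)
returns = rng.standard_normal((T, N))

window, target = 120, 0                     # rolling window; stock to forecast
preds, actuals = [], []
for t in range(window, T - 1):
    X = returns[t - window:t]                   # lagged cross-section of returns
    y = returns[t - window + 1:t + 1, target]   # next-minute return of the target
    model = Lasso(alpha=0.05).fit(X, y)         # sparse forecast rule, re-fit each minute
    preds.append(model.predict(returns[t:t + 1])[0])
    actuals.append(returns[t + 1, target])

print("out-of-sample correlation:", np.corrcoef(preds, actuals)[0, 1])
```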
- NSF-PAR ID: 10079585
- Publisher / Repository: Wiley-Blackwell
- Date Published:
- Journal Name: The Journal of Finance
- Volume: 74
- Issue: 1
- ISSN: 0022-1082
- Pages: 449-492
- Sponsoring Org: National Science Foundation
More Like this
- Summary: This paper studies the asymptotic properties of the penalized least squares estimator using an adaptive group Lasso penalty for reduced rank regression. The group Lasso penalty is defined so that the regression coefficients corresponding to each predictor are treated as one group. It is shown that, under certain regularity conditions, the estimator can achieve the minimax optimal rate of convergence. Moreover, variable selection consistency can also be achieved, that is, the relevant predictors can be identified with probability approaching one. In the asymptotic theory, the number of response variables, the number of predictors and the rank are allowed to grow to infinity with the sample size.
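In the notation the summary above suggests (response matrix $Y$, design matrix $X$, coefficient matrix $B$ whose $j$th row $B_{j\cdot}$ collects all coefficients of predictor $j$), a rowwise adaptive group Lasso criterion of this kind can be sketched as

$$\hat{B} = \operatorname*{arg\,min}_{\operatorname{rank}(B)\le r}\; \tfrac{1}{2}\,\lVert Y - XB\rVert_F^2 \;+\; \lambda \sum_{j=1}^{p} w_j \,\lVert B_{j\cdot}\rVert_2,$$

where the data-driven weights $w_j$ come from an initial estimator; the exact weight construction and the handling of the rank constraint are assumptions made here for illustration, not the paper's precise definitions.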
- Summary: We propose the elastic net, a new regularization and variable selection method. Real-world data and a simulation study show that the elastic net often outperforms the lasso while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p≫n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much as the LARS algorithm does for the lasso.
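A brief sketch of the grouping effect described above, using scikit-learn (the data, `alpha` and `l1_ratio` values are illustrative assumptions): with two nearly identical predictors, the lasso tends to keep one and drop the other, while the elastic net tends to assign both similar coefficients.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(1)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)     # strongly correlated with x1
x3 = rng.standard_normal(n)                 # irrelevant predictor
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + 3 * x2 + 0.5 * rng.standard_normal(n)

print("lasso       coefficients:", Lasso(alpha=0.5).fit(X, y).coef_)
print("elastic net coefficients:", ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y).coef_)
```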
- Abstract: A generic out-of-sample error estimate is proposed for $M$-estimators regularized with a convex penalty in high-dimensional linear regression where $(\boldsymbol{X},\boldsymbol{y})$ is observed and the dimension $p$ and sample size $n$ are of the same order. The out-of-sample error estimate enjoys a relative error of order $n^{-1/2}$ in a linear model with Gaussian covariates and independent noise, either non-asymptotically when $p/n\le \gamma$ or asymptotically in the high-dimensional asymptotic regime $p/n\to \gamma^{\prime}\in (0,\infty)$. General differentiable loss functions $\rho$ are allowed provided that the derivative of the loss is 1-Lipschitz; this includes the least-squares loss as well as robust losses such as the Huber loss and its smoothed versions. The validity of the out-of-sample error estimate holds either under a strong convexity assumption, or for the L1-penalized Huber M-estimator and the Lasso under a sparsity assumption and a bound on the number of contaminated observations. For the square loss and in the absence of corruption in the response, the results additionally yield $n^{-1/2}$-consistent estimates of the noise variance and of the generalization error. This generalizes, to arbitrary convex penalty and arbitrary covariance, estimates that were previously known for the Lasso.
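To make the class of estimators concrete, here is a small sketch of one member mentioned above, an L1-penalized Huber M-estimator, written with cvxpy; the data, Huber threshold `M` and penalty level `lam` are arbitrary placeholders, and the out-of-sample error estimate itself is not reproduced here.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 120                              # sample size and dimension of the same order
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0                          # sparse signal
y = X @ beta_true + rng.standard_normal(n)
y[:5] += 20.0                                # a few contaminated responses

beta = cp.Variable(p)
lam = 0.5
# Huber loss on the residuals (robust to the contaminated observations)
# plus an L1 penalty on the coefficients.
objective = cp.sum(cp.huber(y - X @ beta, M=1.0)) + lam * cp.norm1(beta)
cp.Problem(cp.Minimize(objective)).solve()
print("nonzero coefficients:", int(np.sum(np.abs(beta.value) > 1e-6)))
```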
- Summary: We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui and his colleagues have proposed ‘SAFE’ rules, based on univariate inner products between each predictor and the outcome, which guarantee that a coefficient will be 0 in the solution vector. This provides a reduction in the number of variables that need to be entered into the optimization. We propose strong rules that are very simple and yet screen out far more predictors than the SAFE rules. This great practical improvement comes at a price: the strong rules are not foolproof and can mistakenly discard active predictors, i.e. predictors that have non-zero coefficients in the solution. We therefore combine them with simple checks of the Karush–Kuhn–Tucker conditions to ensure that the exact solution to the convex problem is delivered. Of course, any (approximate) screening method can be combined with the Karush–Kuhn–Tucker conditions to ensure the exact solution; the strength of the strong rules lies in the fact that, in practice, they discard a very large number of the inactive predictors and almost never commit mistakes. We also derive conditions under which they are foolproof. Strong rules provide substantial savings in computational time for a variety of statistical optimization problems.
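A rough numpy sketch of the flavor of these rules, under the common lasso scaling $\tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$ with unit-norm columns (the exact constants and bookkeeping in the paper may differ): the sequential strong rule screens predictors using the residual at the previous value of $\lambda$, the reduced problem is solved, and a Karush-Kuhn-Tucker check on the discarded predictors certifies (or repairs) the result.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 100, 2000
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)               # unit-norm columns
y = X[:, :3] @ np.array([5.0, -4.0, 3.0]) + 0.1 * rng.standard_normal(n)

lam_max = np.abs(X.T @ y).max()              # smallest lambda giving the all-zero solution
lam = 0.9 * lam_max
beta_prev = np.zeros(p)                      # exact lasso solution at lam_prev = lam_max

# Sequential strong rule: discard j if |x_j' r(lam_prev)| < 2*lam - lam_prev.
scores = np.abs(X.T @ (y - X @ beta_prev))
keep = scores >= 2 * lam - lam_max
print(f"kept {keep.sum()} of {p} predictors")

# Solve the lasso on the kept predictors only (sklearn's alpha equals lambda / n here).
beta_hat = np.zeros(p)
beta_hat[keep] = Lasso(alpha=lam / n, fit_intercept=False).fit(X[:, keep], y).coef_

# KKT check: every discarded predictor must satisfy |x_j'(y - X beta_hat)| <= lam;
# any violator would be added back and the reduced problem re-solved.
violations = (~keep) & (np.abs(X.T @ (y - X @ beta_hat)) > lam + 1e-8)
print("KKT violations among discarded predictors:", int(violations.sum()))
```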