Ensembles of decision trees are a useful tool for obtaining flexible estimates of regression functions. Examples of these methods include gradient-boosted decision trees, random forests and Bayesian classification and regression trees. Two potential shortcomings of tree ensembles are their lack of smoothness and their vulnerability to the curse of dimensionality. We show that these issues can be overcome by instead considering sparsity inducing soft decision trees in which the decisions are treated as probabilistic. We implement this in the context of the Bayesian additive regression trees framework and illustrate its promising performance through testing on benchmark data sets. We provide strong theoretical support for our methodology by showing that the posterior distribution concentrates at the minimax rate (up to a logarithmic factor) for sparse functions and functions with additive structures in the high dimensional regime where the dimensionality of the covariate space is allowed to grow nearly exponentially in the sample size. Our method also adapts to the unknown smoothness and sparsity levels, and can be implemented by making minimal modifications to existing Bayesian additive regression tree algorithms.
- Award ID(s):
- 1854655
- PAR ID:
- 10340883
- Date Published:
- Journal Name:
- Advances in neural information processing systems
- Volume:
- 34
- ISSN:
- 1049-5258
- Page Range / eLocation ID:
- 598-609
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Summary -
Abstract This paper demonstrates the advantages of sharing information about unknown features of covariates across multiple model components in various nonparametric regression problems including multivariate, heteroscedastic, and semicontinuous responses. In this paper, we present a methodology which allows for information to be shared nonparametrically across various model components using Bayesian sum‐of‐tree models. Our simulation results demonstrate that sharing of information across related model components is often very beneficial, particularly in sparse high‐dimensional problems in which variable selection must be conducted. We illustrate our methodology by analyzing medical expenditure data from the Medical Expenditure Panel Survey (MEPS). To facilitate the Bayesian nonparametric regression analysis, we develop two novel models for analyzing the MEPS data using Bayesian additive regression trees—a heteroskedastic log‐normal hurdle model with a “shrink‐toward‐homoskedasticity” prior and a gamma hurdle model.
-
We consider the problem of nonparametric regression in the high-dimensional setting in which P≫N. We study the use of overlapping group structures to improve prediction and variable selection. These structures arise commonly when analyzing DNA microarray data, where genes can naturally be grouped according to genetic pathways. We incorporate overlapping group structure into a Bayesian additive regression trees model using a prior constructed so that, if a variable from some group is used to construct a split, this increases the probability that subsequent splits will use predictors from the same group. We refer to our model as an overlapping group Bayesian additive regression trees (OG-BART) model, and our prior on the splits an overlapping group Dirichlet (OG-Dirichlet) prior. Like the sparse group lasso, our prior encourages sparsity both within and between groups. We study the correlation structure of the prior, illustrate the proposed methodology on simulated data, and apply the methodology to gene expression data to learn which genetic pathways are predictive of breast cancer tumor metastasis.more » « less
-
Many scientific problems can be formulated as sparse regression, i.e., regression onto a set of parameters when there is a desire or expectation that some of the parameters are exactly zero or do not substantially contribute. This includes many problems in signal and image processing, system identification, optimization, and parameter estimation methods such as Gaussian process regression. Sparsity facilitates exploring high-dimensional spaces while finding parsimonious and interpretable solutions. In the present work, we illustrate some of the important ways in which sparse regression appears in plasma physics and point out recent contributions and remaining challenges to solving these problems in this field. A brief review is provided for the optimization problem and the state-of-the-art solvers, especially for constrained and high-dimensional sparse regression.
-
Abstract In this paper, we propose a sparse Bayesian procedure with global and local (GL) shrinkage priors for the problems of variable selection and classification in high‐dimensional logistic regression models. In particular, we consider two types of GL shrinkage priors for the regression coefficients, the horseshoe (HS) prior and the normal‐gamma (NG) prior, and then specify a correlated prior for the binary vector to distinguish models with the same size. The GL priors are then combined with mixture representations of logistic distribution to construct a hierarchical Bayes model that allows efficient implementation of a Markov chain Monte Carlo (MCMC) to generate samples from posterior distribution. We carry out simulations to compare the finite sample performances of the proposed Bayesian method with the existing Bayesian methods in terms of the accuracy of variable selection and prediction. Finally, two real‐data applications are provided for illustrative purposes.