

Title: Tree-weighting for multi-study ensemble learners
Multi-study learning uses multiple training studies, trains a classifier separately on each study, and then forms an ensemble with weights that reward members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as the single-study learner, we compare two strategies: weighting each forest as a whole to form the ensemble, or extracting the individual trees trained by each Random Forest and weighting them directly. We consider weighting approaches that reward cross-study replicability within the training set. We find that incorporating multiple layers of ensembling in the training process increases the robustness of the resulting predictor. Furthermore, we explore how the ensembling weights correspond to the internal structure of the trees, shedding light on which features are important in determining the relationship between the Random Forests algorithm and the true outcome model. Finally, we apply our approach to genomic datasets and show that our method improves upon the basic multi-study learning paradigm.
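The cross-study weighting scheme described above can be sketched in a few lines. This is an illustration only: the stump learner stands in for a Random Forest, the synthetic "studies" are made up, and the weighting rule (mean accuracy on the other training studies) is just one of several replicability-rewarding rules, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical "studies": same signal, shifted feature distributions.
studies = []
for shift in (0.0, 0.3, 0.6):
    X = rng.normal(shift, 1.0, size=(200, 3))
    y = (X[:, 0] > shift).astype(int)
    studies.append((X, y))

class Stump:
    """Toy single-study learner (a stand-in for a Random Forest)."""
    def fit(self, X, y):
        # Threshold feature 0 at the midpoint of the class-conditional means.
        self.t = (X[y == 0, 0].mean() + X[y == 1, 0].mean()) / 2
        return self
    def predict(self, X):
        return (X[:, 0] > self.t).astype(int)

learners = [Stump().fit(X, y) for X, y in studies]

# Weight each learner by its mean accuracy on the *other* training studies,
# rewarding cross-study replicability rather than in-study fit.
raw = []
for i, m in enumerate(learners):
    accs = [(m.predict(X) == y).mean() for j, (X, y) in enumerate(studies) if j != i]
    raw.append(np.mean(accs))
weights = np.array(raw) / np.sum(raw)

def ensemble_predict(X):
    # Weighted vote across the study-specific learners.
    score = sum(w * m.predict(X) for w, m in zip(weights, learners))
    return (score > 0.5).astype(int)
```

The same weighting loop applies whether the ensemble members are whole forests or the individual trees extracted from them; only the `learners` list changes.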
Award ID(s):
1810829
PAR ID:
10105531
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Pacific Symposium on Biocomputing 2020
Page Range / eLocation ID:
451-462
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT

    The aim of this paper is to systematically investigate merging and ensembling methods for spatially varying coefficient mixed effects models (SVCMEM) in order to carry out integrative learning of neuroimaging data obtained from multiple biomedical studies. The "merged" approach involves training a single learning model using a comprehensive dataset that encompasses information from all the studies. Conversely, the "ensemble" approach involves creating a weighted average of distinct learning models, each developed from an individual study. We systematically investigate the prediction accuracy of the merged and ensemble learners in the presence of different degrees of interstudy heterogeneity. Additionally, we establish asymptotic guidelines for deciding when to employ each model, along with deriving optimal weights for the ensemble learner. To validate our theoretical results, we perform extensive simulation studies. The proposed methodology is also applied to 3 large-scale neuroimaging studies.
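A minimal numerical sketch of the merged-versus-ensemble contrast, with ordinary least squares standing in for SVCMEM and equal ensemble weights rather than the optimal weights the paper derives; the study generator below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ols(X, y):
    # Least-squares coefficients (stand-in for a study-specific learner).
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Hypothetical studies: a shared coefficient vector plus study-level
# perturbations, i.e. interstudy heterogeneity.
beta = np.array([1.0, -2.0])
studies = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    b = beta + rng.normal(0, 0.5, size=2)      # heterogeneous true coefficients
    y = X @ b + rng.normal(0, 0.1, size=100)
    studies.append((X, y))

# "Merged": one model trained on the pooled data from all studies.
X_all = np.vstack([X for X, _ in studies])
y_all = np.concatenate([y for _, y in studies])
beta_merged = fit_ols(X_all, y_all)

# "Ensemble": average of the study-specific fits (equal weights here;
# the paper derives optimal, generally unequal, weights).
beta_ensemble = np.mean([fit_ols(X, y) for X, y in studies], axis=0)
```

Which of the two generalizes better depends on the degree of heterogeneity in the study-level perturbations, which is exactly the trade-off the asymptotic guidelines address.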

     
  2. Ensemble-based change detection can improve map accuracies by combining information from multiple datasets. There is a growing literature investigating ensemble inputs and applications for forest disturbance detection and mapping. However, few studies have evaluated ensemble methods other than Random Forest classifiers, which rely on uninterpretable "black box" algorithms with hundreds of parameters. Additionally, most ensemble-based disturbance maps do not utilize independently and systematically collected field-based forest inventory measurements. Here, we compared three approaches for combining change detection results generated from multi-spectral Landsat time series with forest inventory measurements to map forest harvest events at an annual time step. We found that seven-parameter degenerate decision tree ensembles performed at least as well as 500-tree Random Forest ensembles trained and tested on the same LandTrendr segmentation results, and both supervised decision tree methods consistently outperformed the top-performing voting approach (majority vote). Comparisons with an existing national forest disturbance dataset indicated notable improvements in accuracy that demonstrate the value of developing locally calibrated, process-specific disturbance datasets like the harvest event maps developed in this study. Furthermore, by using multi-date forest inventory measurements, we are able to establish a lower bound of 30% basal area removal on detectable harvests, providing biophysical context for our harvest event maps. Our results suggest that simple interpretable decision trees applied to multi-spectral temporal segmentation outputs can be as effective as more complex machine learning approaches for characterizing forest harvest events ranging from partial clearing to clear cuts, with important implications for locally accurate mapping of forest harvests and other types of disturbances.
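The majority-vote baseline that the supervised decision tree methods outperform can be illustrated in a few lines; the per-pixel detector outputs below are made up:

```python
import numpy as np

# Hypothetical per-pixel change flags from three detectors (one row each).
votes = np.array([[1, 0, 1, 1],
                  [1, 0, 0, 1],
                  [0, 0, 1, 1]])

# Majority voting: flag a change wherever more than half the detectors agree.
majority = (votes.sum(axis=0) > votes.shape[0] / 2).astype(int)
# majority -> [1, 0, 1, 1]
```

A supervised combiner replaces this fixed rule with a learned one, e.g. a small decision tree trained on inventory-verified harvest labels.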
  3. Random forests use ensembles of decision trees to boost accuracy for machine learning tasks. However, large ensembles slow down inference on platforms that process each tree in an ensemble individually. We present Bolt, a platform that restructures whole random forests, not just individual trees, to speed up inference. Conceptually, Bolt maps every path in each tree to a lookup table which, if the cache were large enough, would allow inference with just one memory access. When the size of the lookup table exceeds cache capacity, Bolt employs a novel combination of lossless compression, parameter selection, and bloom filters to shrink the table while preserving fast inference. We compared inference speed in Bolt to three state-of-the-art platforms: Python Scikit-Learn, Ranger, and Forest Packing. We evaluated these platforms using datasets with vision, natural language processing, and categorical applications. We observed that on ensembles of shallow decision trees Bolt can run 2-14X faster than competing platforms and that Bolt's speedups persist as the number of decision trees in an ensemble increases.
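The path-to-lookup-table idea can be sketched conceptually. The tree encoding and key layout below are illustrative, not Bolt's actual representation: each root-to-leaf path is keyed by the sequence of branch decisions, so prediction reduces to building the key and doing one table lookup.

```python
# A tiny decision tree: (feature, threshold, left_child, right_child);
# leaves are class labels. This encoding is for illustration only.
tree = (0, 0.5,
        (1, 0.3, "A", "B"),
        "C")

def enumerate_paths(node, bits=()):
    """Yield (branch-decision tuple, leaf label) for every root-to-leaf path."""
    if not isinstance(node, tuple):
        yield bits, node
        return
    _, _, left, right = node
    yield from enumerate_paths(left, bits + (0,))
    yield from enumerate_paths(right, bits + (1,))

# Precompute the lookup table once, offline.
table = dict(enumerate_paths(tree))

def predict(x):
    # Evaluate the node tests along the path to build the key,
    # then answer with a single table lookup.
    node, bits = tree, ()
    while isinstance(node, tuple):
        f, t, left, right = node
        go_right = x[f] > t
        bits += (int(go_right),)
        node = right if go_right else left
    return table[bits]
```

In a real system the per-node tests are cheap bit operations and the table entry is fetched in one access when it fits in cache; the compression and bloom-filter machinery in the abstract handles the case when it does not.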
  4. It is increasingly common to encounter prediction tasks in the biomedical sciences for which multiple datasets are available for model training. Common approaches such as pooling datasets before model fitting can produce poor out‐of‐study prediction performance when datasets are heterogeneous. Theoretical and applied work has shown multistudy ensembling to be a viable alternative that leverages the variability across datasets in a manner that promotes model generalizability. Multistudy ensembling uses a two‐stage stacking strategy which fits study‐specific models and estimates ensemble weights separately. This approach ignores, however, the ensemble properties at the model‐fitting stage, potentially resulting in performance losses. Motivated by challenges in the estimation of COVID‐attributable mortality, we propose optimal ensemble construction, an approach to multistudy stacking whereby we jointly estimate ensemble weights and parameters associated with study‐specific models. We prove that limiting cases of our approach yield existing methods such as multistudy stacking and pooling datasets before model fitting. We propose an efficient block coordinate descent algorithm to optimize the loss function. We use our method to perform multicountry COVID‐19 baseline mortality prediction. We show that when little data is available for a country before the onset of the pandemic, leveraging data from other countries can substantially improve prediction accuracy. We further compare and characterize the method's performance in data‐driven simulations and other numerical experiments. Our method remains competitive with or outperforms multistudy stacking and other earlier methods in the COVID‐19 data application and in a range of simulation settings.
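A toy sketch of jointly estimating ensemble weights and model parameters by block coordinate descent, with linear least-squares models standing in for the study-specific learners. This illustrates the alternating block structure only; it is not the paper's algorithm, loss, or data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical studies: a shared linear signal plus study-specific perturbations.
studies = []
for _ in range(3):
    X = rng.normal(size=(80, 2))
    b = np.array([1.0, -1.0]) + rng.normal(0, 0.3, size=2)
    y = X @ b + rng.normal(0, 0.1, size=80)
    studies.append((X, y))

K = len(studies)
X_all = np.vstack([X for X, _ in studies])
y_all = np.concatenate([y for _, y in studies])

# Start from study-specific least-squares fits, as plain multistudy stacking would.
betas = np.array([np.linalg.lstsq(X, y, rcond=None)[0] for X, y in studies])
w = np.full(K, 1.0 / K)

def ensemble_loss():
    """Squared error of the weighted ensemble over all pooled observations."""
    pred = sum(w[k] * (X_all @ betas[k]) for k in range(K))
    return float(((pred - y_all) ** 2).sum())

loss_before = ensemble_loss()

# Block coordinate descent: alternate exact minimization over the weight
# block and over each coefficient block, holding the other blocks fixed.
for _ in range(20):
    # Weight block: least squares of y on the stacked per-model predictions.
    P = np.column_stack([X_all @ betas[k] for k in range(K)])
    w = np.linalg.lstsq(P, y_all, rcond=None)[0]
    # Coefficient blocks: refit model k against the residual left by the others.
    for k in range(K):
        if abs(w[k]) > 1e-8:
            resid = y_all - sum(w[j] * (X_all @ betas[j]) for j in range(K) if j != k)
            betas[k] = np.linalg.lstsq(w[k] * X_all, resid, rcond=None)[0]

loss_after = ensemble_loss()
```

Because each block update is an exact minimization given the other blocks, the ensemble loss is nonincreasing across iterations; freezing the coefficient blocks at their initial study-specific fits would recover ordinary multistudy stacking.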

     
  5. Administrative errors in unemployment insurance (UI) decisions give rise to a public values conflict between efficiency and efficacy. We analyze whether artificial intelligence (AI) – in particular, methods in machine learning (ML) – can be used to detect administrative errors in UI claims decisions, both in terms of accuracy and normative tradeoffs. We use 16 years of US Department of Labor audit and policy data on UI claims to analyze the accuracy of 7 different random forest and deep learning models. We further test weighting schemas and synthetic data approaches to correcting imbalances in the training data. A random forest model using gradient descent boosting is more accurate, along several measures, and preferable in terms of public values, than every deep learning model tested. Adjusting model weights produces significant recall improvements for low-n outcomes, at the expense of precision. Synthetic data produces attenuated improvements and drawbacks relative to weights. 
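One standard class-weighting schema of the kind tested above can be sketched as follows; the labels are made up and the inverse-frequency rule is illustrative, not necessarily the exact schema the study used:

```python
import numpy as np

# Toy imbalanced labels: the rare class would be swamped in unweighted training.
y = np.array([0] * 95 + [1] * 5)

# Inverse-frequency weighting: weight each class inversely to its frequency
# so that both classes contribute equal total mass to the training loss.
classes, counts = np.unique(y, return_counts=True)
class_weight = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}
sample_weight = np.array([class_weight[c] for c in y])
```

Upweighting the rare class this way is what drives the recall gains on low-n outcomes described above, and the precision cost follows directly: the model is pushed to flag more candidates from the rare class, including false positives.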