Merging versus Ensembling in Multi-Study Machine Learning: Theoretical Insight from Random Effects

Guan, Zoe; Parmigiani, Giovanni; Patil, Prasad

A critical decision point when training predictors using multiple studies is whether these studies should be combined or treated separately. We compare two multi-study learning approaches in the presence of potential heterogeneity in predictor-outcome relationships across datasets. We consider 1) merging all of the datasets and training a single learner, and 2) cross-study learning, which involves training a separate learner on each dataset and combining the resulting predictions. In a linear regression setting, we show analytically and confirm via simulation that merging yields lower prediction error than cross-study learning when the predictor-outcome relationships are relatively homogeneous across studies. However, as heterogeneity increases, there exists a transition point beyond which cross-study learning outperforms merging. We provide analytic expressions for the transition point in various scenarios and study asymptotic properties.
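The merging-versus-ensembling trade-off described above can be illustrated with a small Monte Carlo sketch. It assumes a random-effects linear model in which each study's coefficient vector equals a shared mean plus independent Gaussian deviations with standard deviation sigma_beta, standard-normal features, ordinary least squares as the learner, and an equal-weight average as the cross-study combiner. These choices, and the helper names draw_study, ols and compare, are hypothetical illustrations, not the authors' exact simulation design or weighting scheme.

```python
import numpy as np


def draw_study(rng, n, beta_mean, sigma_beta, noise_sd=1.0):
    """Draw one study (assumed model): beta_k = beta_mean + N(0, sigma_beta^2 I)."""
    p = beta_mean.size
    beta_k = beta_mean + sigma_beta * rng.standard_normal(p)  # study-specific random effect
    X = rng.standard_normal((n, p))
    y = X @ beta_k + noise_sd * rng.standard_normal(n)
    return X, y


def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def compare(sigma_beta, n_train_studies=5, n_per_study=20, p=5,
            n_test_studies=20, n_test=200, reps=500, seed=0):
    """Average test MSE of merging vs. an equal-weight cross-study ensemble."""
    rng = np.random.default_rng(seed)
    beta_mean = np.ones(p)  # hypothetical shared mean coefficient vector
    mse_merge, mse_ens = 0.0, 0.0
    for _ in range(reps):
        train = [draw_study(rng, n_per_study, beta_mean, sigma_beta)
                 for _ in range(n_train_studies)]
        # 1) Merging: stack all studies and fit a single OLS model.
        X_all = np.vstack([X for X, _ in train])
        y_all = np.concatenate([y for _, y in train])
        b_merge = ols(X_all, y_all)
        # 2) Cross-study learning: fit each study separately, then average the
        #    predictions with equal weights (for linear models this is the same
        #    as averaging the coefficient vectors).
        b_ens = np.mean([ols(X, y) for X, y in train], axis=0)
        # Evaluate both fits on the same unseen studies from the same model.
        for _ in range(n_test_studies):
            X_t, y_t = draw_study(rng, n_test, beta_mean, sigma_beta)
            mse_merge += np.mean((y_t - X_t @ b_merge) ** 2)
            mse_ens += np.mean((y_t - X_t @ b_ens) ** 2)
    total = reps * n_test_studies
    return mse_merge / total, mse_ens / total


if __name__ == "__main__":
    for s in [0.0, 0.1, 0.25, 0.5, 1.0, 1.5]:
        m, e = compare(s)
        winner = "merge" if m < e else "ensemble"
        print(f"sigma_beta={s:4.2f}  merge={m:.3f}  ensemble={e:.3f}  -> {winner}")
```

Under these assumptions, merging should win at sigma_beta = 0, where the pooled OLS fit uses the data most efficiently, and the equal-weight ensemble should win once sigma_beta is large: pooling implicitly weights studies by their design matrices rather than equally, so it averages the study-specific random effects less evenly. The crossover printed by this sketch depends on the assumed sample sizes and seed and is not the paper's analytic transition point.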

Award ID(s): 1810829

PAR ID: 10105511

Author(s) / Creator(s): Guan, Zoe; Parmigiani, Giovanni; Patil, Prasad

Date Published: 2019-05-17

Journal Name: arXiv.org

ISSN: 2331-8422

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
The DOI is not currently available.