GeM-LR: Discovering predictive biomarkers for small datasets in vaccine studies

Lin, Lin; Spreng, Rachel L; Seaton, Kelly E; Dennison, S Moses; Dahora, Lindsay C; Schuster, Daniel J; Sawant, Sheetal; Gilbert, Peter B; Fong, Youyi; Kisalu, Neville; Pollard, Andrew J; Tomaras, Georgia D; Li, Jia

doi:10.1371/journal.pcbi.1012581

Despite substantial progress in vaccine research, the level of protection provided by vaccination can vary considerably across individuals. Understanding immunologic variation across individuals in response to vaccination is therefore important for developing next-generation efficacious vaccines. Accurate outcome prediction and identification of predictive biomarkers would represent a significant step towards this goal. Moreover, small datasets are prevalent in early phase vaccine clinical trials, which creates both the need for and the challenge of building a robust and explainable prediction model that can reveal heterogeneity in small datasets. We propose a new model named Generative Mixture of Logistic Regression (GeM-LR), which combines characteristics of both a generative and a discriminative model. In addition, we propose a set of model selection strategies to enhance the robustness and interpretability of the model. GeM-LR extends a linear classifier to a non-linear classifier without losing interpretability and supports predictive clustering, which characterizes data heterogeneity in connection with the outcome variable. We demonstrate the strengths and utility of GeM-LR by applying it to data from several studies. GeM-LR achieves better prediction results than other popular methods while providing interpretations at different levels.
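To make the idea of combining a generative and a discriminative component more concrete, the following is a minimal sketch of one plausible reading of the abstract, not the authors' implementation. It assumes that the covariates follow a Gaussian mixture (here with diagonal covariances) and that each mixture component has its own logistic regression, so that the predicted probability is the cluster-posterior-weighted average of the per-cluster logistic outputs. All function names, parameter names, and numeric values below are hypothetical and chosen only for illustration; model fitting and the model selection strategies mentioned above are not shown.

```python
import math

def gaussian_pdf_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance, evaluated at x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def responsibilities(x, weights, means, variances):
    """Posterior probability of each cluster given x (generative part)."""
    joint = [w * gaussian_pdf_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

def predict_proba(x, weights, means, variances, coefs, intercepts):
    """P(y = 1 | x) as a posterior-weighted mix of per-cluster logistic models."""
    resp = responsibilities(x, weights, means, variances)
    prob = 0.0
    for r, coef, b in zip(resp, coefs, intercepts):
        linear = sum(c * xi for c, xi in zip(coef, x)) + b
        prob += r * sigmoid(linear)  # discriminative part, one model per cluster
    return prob

# Hypothetical two-cluster model on a single biomarker (illustrative values only).
WEIGHTS = [0.5, 0.5]
MEANS = [[-2.0], [2.0]]
VARIANCES = [[1.0], [1.0]]
COEFS = [[1.0], [-1.0]]      # the biomarker's effect differs by cluster
INTERCEPTS = [0.0, 0.0]

if __name__ == "__main__":
    for value in (-2.0, 0.0, 2.0):
        p = predict_proba([value], WEIGHTS, MEANS, VARIANCES, COEFS, INTERCEPTS)
        print(f"x = {value:+.1f}  P(y=1|x) = {p:.3f}")
```

In this toy setting the predicted probability is highest near x = 0 and lower at both extremes, which a single logistic regression cannot represent because it is monotone in x. Each cluster's coefficients remain readable on their own, which is one way to interpret the claim that the classifier becomes non-linear without losing interpretability.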