Outcome-guided disease subtyping by generative model and weighted joint likelihood in transcriptomic applications

Li, Yujia; Liu, Peng; Wang, Wenjia; Zong, Wei; Fang, Yusi; Ren, Zhao; Tang, Lu; Celedón, Juan C; Oesterreich, Steffi; Tseng, George C

doi:10.1214/23-AOAS1865

With advances in high-throughput technology, molecular disease subtyping using high-dimensional omics data has been recognized as an effective approach for identifying subtypes of complex diseases with distinct disease mechanisms and prognoses. Conventional cluster analysis takes omics data as input and generates patient clusters with similar gene expression patterns. The omics data, however, usually contain multifaceted cluster structures that can be defined by different sets of genes. If the gene set associated with irrelevant clinical variables (e.g., sex or age) dominates the clustering process, the resulting clusters may not capture clinically meaningful disease subtypes. This motivates the clustering framework developed in this paper, which is guided by a prespecified disease outcome such as lung function measurement or survival. We propose two outcome-guided disease subtyping methods for omics data, one based on a generative model and one based on a weighted joint likelihood. Both methods connect an outcome association model and a disease subtyping model through a latent variable of cluster labels. Compared with the generative model, the weighted joint likelihood contains a data-driven weight parameter that balances the likelihood contributions from outcome association and gene cluster separation; this improves generalizability in independent validation but requires heavier computation. Extensive simulations and two real applications, in lung disease and triple-negative breast cancer, demonstrate the superior performance of the outcome-guided clustering methods in terms of disease subtyping accuracy, gene selection and outcome association. Unlike existing clustering methods, the outcome-guided disease subtyping framework creates a new precision medicine paradigm that directly identifies patient subgroups with clinical association.
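The abstract describes both methods only at a high level. As a rough illustration of how a latent cluster label can tie an outcome model to a gene-expression clustering model, the following is a toy EM sketch in Python. It is not the authors' method: the Gaussian gene and outcome models, the unit variances, the farthest-point initialization, the function names, and the way the hypothetical weight `w` scales only the outcome log-density are all assumptions made for illustration. The paper chooses its weight in a data-driven way, which this sketch does not attempt.

```python
import numpy as np


def log_normal(v, mean, var):
    """Log density of independent N(mean, var) coordinates, summed over the last axis."""
    return -0.5 * np.sum((v - mean) ** 2 / var + np.log(2 * np.pi * var), axis=-1)


def weighted_em(X, y, K, w=1.0, n_iter=100, seed=0):
    """Toy EM for an outcome-guided Gaussian mixture (hypothetical sketch).

    A latent label Z links a gene model X | Z=k ~ N(mu_k, I) and an
    outcome model y | Z=k ~ N(beta_k, 1). The outcome log-density is
    multiplied by the hypothetical weight w; w=1 gives the unweighted
    joint likelihood and w=0 ignores the outcome when assigning clusters.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Farthest-point initialization of the K cluster centers in gene space
    idx = [int(rng.integers(n))]
    for _ in range(K - 1):
        d = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1), axis=1)
        idx.append(int(np.argmax(d)))
    mu = X[idx]                  # (K, p) gene-expression cluster means
    beta = y[idx]                # (K,) cluster-specific outcome means
    pi = np.full(K, 1.0 / K)     # mixing proportions
    for _ in range(n_iter):
        # E-step: posterior membership probabilities under the weighted likelihood
        log_r = (np.log(pi)[None, :]
                 + log_normal(X[:, None, :], mu[None, :, :], 1.0)
                 + w * log_normal(y[:, None, None], beta[None, :, None], 1.0))
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of proportions and means
        nk = np.maximum(r.sum(axis=0), 1e-12)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        beta = (r.T @ y) / nk
    return r, mu, beta, pi
```

With `w = 0` the cluster assignments are driven by the genes alone; increasing `w` lets the outcome pull the latent labels toward outcome-associated structure. A data-driven choice of `w`, which the abstract notes adds computing cost, is not implemented here.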

Award ID(s): 2113568

PAR ID: 10608002

Publisher / Repository: Institute of Mathematical Statistics

Date Published: 2024-09-01

Journal Name: The Annals of Applied Statistics

Volume: 18

Issue: 3

ISSN: 1932-6157

Page Range / eLocation ID: 1947–1964

Subject(s) / Keyword(s): Disease subtyping; omics data; high-dimensional cluster analysis; generative model; weighted joint likelihood.

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text (Accepted Manuscript 1.0, journal article): https://doi.org/10.1214/23-AOAS1865