Variable Selection for Sparse Data with Applications to Vaginal Microbiome and Gene Expression Data

Dousti Mousavi, Niloufar; Yang, Jie; Aldirawi, Hani

doi:10.3390/genes14020403

Sparse data with a high proportion of zeros arise in various disciplines. Modeling sparse high-dimensional data is a challenging and growing research area. In this paper, we provide statistical methods and tools for analyzing sparse data in a fairly general and complex context. We use two real scientific applications as illustrations: a longitudinal vaginal microbiome dataset and a high-dimensional gene expression dataset. We recommend zero-inflated model selection and significance tests to identify the time intervals in which the pregnant and non-pregnant groups of women differ significantly in terms of Lactobacillus species. We apply the same techniques to select the best 50 genes out of 2426 in a sparse gene expression dataset. The classification based on our selected genes achieves 100% prediction accuracy. Furthermore, the first four principal components based on the selected genes explain as much as 83% of the model variability.
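As a rough illustration of what zero-inflated model selection can look like, the sketch below fits a plain Poisson model and a zero-inflated Poisson (ZIP) model to a sparse count vector and compares them by AIC. This is an assumption-laden toy, not the authors' procedure: this record does not state which model families or selection criteria the paper uses, and the function names (`fit_zip`, `aic`), the EM fitting routine, and the synthetic counts are all hypothetical.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in counts)

def zip_loglik(counts, pi, lam):
    """Log-likelihood of a zero-inflated Poisson: an extra point mass pi at zero."""
    total = 0.0
    for y in counts:
        pois = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
        if y == 0:
            total += math.log(pi + (1 - pi) * pois)
        else:
            total += math.log((1 - pi) * pois)
    return total

def fit_zip(counts, iters=500):
    """Fit (pi, lam) by a basic EM loop; assumes at least one positive count."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    pi = 0.5
    lam = (total / n) / (1 - pi)
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        expected_structural = n_zero * z
        # M-step: update the mixing weight and the Poisson mean
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam

def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * loglik

# Hypothetical sparse sample: 70 zeros and 30 positive counts
counts = [0] * 70 + [3, 4, 5, 2, 6, 4, 3, 5, 4, 4] * 3

lam_pois = sum(counts) / len(counts)      # Poisson maximum-likelihood estimate
pi_zip, lam_zip = fit_zip(counts)

aic_pois = aic(poisson_loglik(counts, lam_pois), 1)
aic_zip = aic(zip_loglik(counts, pi_zip, lam_zip), 2)
best_model = "ZIP" if aic_zip < aic_pois else "Poisson"
```

With this many excess zeros the ZIP model should be preferred. In practice one would more likely rely on an established statistics package and consider further candidates (for example negative binomial or hurdle variants) rather than a hand-written EM loop.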

Award ID(s): 1924859

PAR ID: 10478990

Author(s) / Creator(s): Dousti Mousavi, Niloufar; Yang, Jie; Aldirawi, Hani

Publisher / Repository: MDPI

Date Published: 2023-02-01

Journal Name: Genes

Volume: 14

Issue: 2

ISSN: 2073-4425

Page Range / eLocation ID: 403

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.3390/genes14020403
