Categorical Data Analysis for High-Dimensional Sparse Gene Expression Data

Dousti Mousavi, Niloufar; Aldirawi, Hani; Yang, Jie

doi:10.3390/biotech12030052

Categorical data analysis becomes challenging when high-dimensional sparse covariates are involved, as is often the case for omics data. We introduce a statistical procedure based on multinomial logistic regression for such scenarios, comprising variable screening, model selection, order selection for response categories, and variable selection. We apply the procedure to high-dimensional gene expression data covering 801 patients, 2426 genes, and five types of cancerous tumors. As a result, we recommend three finalized models: one with 74 genes, which achieves extremely low cross-entropy loss and a zero prediction error rate under five-fold cross-validation, and two others, with 31 and 4 genes respectively, which are recommended as prognostic multi-gene signatures.
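The two evaluation criteria named in the abstract, cross-entropy loss and prediction error rate under five-fold cross-validation, can be illustrated with a minimal sketch. This is not the authors' procedure or code: it skips variable screening, model selection, and order selection entirely, fits a plain softmax (baseline-free multinomial logistic) model by gradient descent, and runs on small synthetic data. All function names, the learning rate, the epoch count, and the toy data generator are assumptions made for illustration only.

```python
import math
import random

def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, xi):
    """Class probabilities for one sample; last entry of each row is the intercept."""
    scores = [sum(w * x for w, x in zip(row[:-1], xi)) + row[-1] for row in W]
    return softmax(scores)

def fit_softmax(X, y, n_classes, lr=0.2, epochs=200):
    """Fit multinomial logistic regression by batch gradient descent (illustrative)."""
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            p = predict_proba(W, xi)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == yi else 0.0)
                for j in range(n_features):
                    grad[k][j] += err * xi[j]
                grad[k][n_features] += err
        for k in range(n_classes):
            for j in range(n_features + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def cross_validate(X, y, n_classes, n_folds=5, seed=0):
    """Return (mean cross-entropy loss, error rate) over held-out samples."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    total_loss, total_err = 0.0, 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        W = fit_softmax([X[i] for i in train], [y[i] for i in train], n_classes)
        for i in held_out:
            p = predict_proba(W, X[i])
            total_loss -= math.log(max(p[y[i]], 1e-12))
            predicted = max(range(n_classes), key=lambda k: p[k])
            total_err += int(predicted != y[i])
    return total_loss / len(X), total_err / len(X)

def make_toy_data(n_per_class=30, n_classes=3, n_features=4, seed=1):
    """Synthetic stand-in for expression data: class k shifts feature k upward."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 0.5) for _ in range(n_features)]
            row[k] += 3.0
            X.append(row)
            y.append(k)
    return X, y

X, y = make_toy_data()
cv_loss, cv_error = cross_validate(X, y, n_classes=3)
print(f"5-fold CV cross-entropy: {cv_loss:.4f}, error rate: {cv_error:.4f}")
```

Because every sample is held out exactly once, both numbers are averages over predictions made by models that never saw the sample being scored. Cross-entropy penalizes confident wrong probabilities, whereas the error rate only counts misclassified samples, which is why a model can reach zero error while its loss is still positive.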

Award ID(s): 1924859

PAR ID: 10479221

Author(s) / Creator(s): Dousti Mousavi, Niloufar; Aldirawi, Hani; Yang, Jie

Publisher / Repository: MDPI

Date Published: 2023-09-01

Journal Name: BioTech

Volume: 12

Issue: 3

ISSN: 2673-6284

Page Range / eLocation ID: 52

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript: 1.0
Journal Article: https://doi.org/10.3390/biotech12030052