NSF PAR Search | NSF Public Access Repository


An R package AZIAD for analysing zero-inflated and zero-altered data

https://doi.org/10.1080/00949655.2023.2207020

Dousti Mousavi, Niloufar; Aldirawi, Hani; Yang, Jie (November 2023, Journal of Statistical Computation and Simulation)

Categorical Data Analysis for High-Dimensional Sparse Gene Expression Data

https://doi.org/10.3390/biotech12030052

Dousti Mousavi, Niloufar; Aldirawi, Hani; Yang, Jie (September 2023, BioTech)

Categorical data analysis becomes challenging when high-dimensional sparse covariates are involved, which is often the case for omics data. We introduce a statistical procedure based on multinomial logistic regression for such scenarios, including variable screening, model selection, order selection for response categories, and variable selection. We apply the procedure to high-dimensional gene expression data with 801 patients, 2426 genes, and five types of cancerous tumors. As a result, we recommend three final models: one with 74 genes, which achieves extremely low cross-entropy loss and a zero prediction error rate under five-fold cross-validation, and two others with 31 and 4 genes, respectively, which are recommended as prognostic multi-gene signatures.
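The two evaluation criteria named in this abstract, cross-entropy loss and prediction error rate under five-fold cross-validation, can be illustrated with a minimal Python sketch. It assumes a model has already produced predicted category probabilities for each patient; the fitting step (multinomial logistic regression with screening and selection) is not shown. The helper names `kfold_indices`, `cross_entropy`, and `error_rate` are hypothetical, and the toy probabilities are made up for illustration, not taken from the paper.

```python
import math
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_entropy(prob_rows, labels, eps=1e-15):
    """Mean negative log-probability assigned to each case's true category."""
    total = sum(math.log(max(p[y], eps)) for p, y in zip(prob_rows, labels))
    return -total / len(labels)


def error_rate(prob_rows, labels):
    """Fraction of cases whose highest-probability category is not the true one."""
    wrong = sum(
        1 for p, y in zip(prob_rows, labels)
        if max(range(len(p)), key=p.__getitem__) != y
    )
    return wrong / len(labels)


# Hypothetical predicted probabilities for three cases over five tumor types.
probs = [
    [0.90, 0.04, 0.03, 0.02, 0.01],
    [0.05, 0.80, 0.05, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.05, 0.05],
]
labels = [0, 1, 2]

folds = kfold_indices(801, k=5)
print([len(f) for f in folds])                 # [161, 160, 160, 160, 160]
print(round(cross_entropy(probs, labels), 4))  # 0.2284
print(error_rate(probs, labels))               # 0.0
```

In a full cross-validation loop, each fold would be held out once, the model refitted on the other four, and the two criteria averaged over the held-out folds.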
Smoothing regression and impact measures for accidents of traffic flows

https://doi.org/10.1080/02664763.2023.2175799

Yu, Zhou; Yang, Jie; Huang, Hsin-Hsiung (February 2023, Journal of Applied Statistics)

Variable Selection for Sparse Data with Applications to Vaginal Microbiome and Gene Expression Data

https://doi.org/10.3390/genes14020403

Dousti Mousavi, Niloufar; Yang, Jie; Aldirawi, Hani (February 2023, Genes)

Sparse data with a high proportion of zeros arise in various disciplines. Modeling sparse high-dimensional data is a challenging and growing research area. In this paper, we provide statistical methods and tools for analyzing sparse data in a fairly general and complex context. We use two real scientific applications as illustrations: a longitudinal vaginal microbiome dataset and a high-dimensional gene expression dataset. We recommend zero-inflated model selection and significance tests to identify the time intervals during which the pregnant and non-pregnant groups of women differ significantly in terms of Lactobacillus species. We apply the same techniques to select the best 50 genes out of the 2426 genes in the sparse gene expression data. Classification based on the selected genes achieves 100% prediction accuracy. Furthermore, the first four principal components of the selected genes explain as much as 83% of the model variability.
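The zero-inflated modeling this abstract recommends can be illustrated by fitting a zero-inflated Poisson (ZIP) distribution, which mixes structural zeros (probability `pi`) with an ordinary Poisson count (mean `lam`). The following is a minimal Python sketch using a standard EM algorithm; it is not the authors' procedure and does not use or reproduce the R packages (AZIAD, iZID) listed on this page. The function names and the toy data are hypothetical, and the data are assumed to contain at least one positive count.

```python
import math


def zip_logpmf(y, pi, lam):
    """Log-probability of count y under a zero-inflated Poisson with
    zero-inflation probability pi and Poisson mean lam."""
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)


def poisson_loglik(data):
    """Log-likelihood of an ordinary Poisson at its MLE (the sample mean)."""
    lam = sum(data) / len(data)
    return sum(-lam + (y * math.log(lam) if y else 0.0) - math.lgamma(y + 1)
               for y in data)


def fit_zip_em(data, pi=0.5, iters=500):
    """Fit a zero-inflated Poisson by EM; returns (pi, lam, log-likelihood)."""
    n = len(data)
    n_zero = sum(1 for y in data if y == 0)
    total = sum(data)
    lam = max(total / n, 1e-8)
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is a structural zero.
        p_struct = pi / (pi + (1 - pi) * math.exp(-lam))
        expected_struct = p_struct * n_zero
        # M-step: update the mixing weight and the Poisson mean.
        pi = expected_struct / n
        lam = total / (n - expected_struct)
    return pi, lam, sum(zip_logpmf(y, pi, lam) for y in data)


# Hypothetical sparse counts: 60 zeros and 40 positive counts.
data = [0] * 60 + [1, 2, 3, 2, 4, 1, 3, 2, 5, 3] * 4
pi_hat, lam_hat, zip_ll = fit_zip_em(data)
print(round(pi_hat, 3), round(lam_hat, 3))
print(zip_ll > poisson_loglik(data))  # True
```

Comparing the two log-likelihoods only indicates that the extra zero component fits these toy counts better. A formal test of zero inflation has to account for `pi = 0` lying on the boundary of the parameter space, and the significance tests used in the paper are not reproduced here.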
Modeling Sparse Data Using MLE with Applications to Microbiome Data

https://doi.org/10.1007/s42519-021-00230-y

Aldirawi, Hani; Yang, Jie (March 2022, Journal of Statistical Theory and Practice)

Score-matching representative approach for big data analysis with generalized linear models

https://doi.org/10.1214/21-EJS1965

Li, Keren; Yang, Jie (January 2022, Electronic Journal of Statistics)

Affine-transformation invariant clustering models

https://doi.org/10.1186/s40488-020-00111-y

Huang, Hsin-Hsiung; Yang, Jie (December 2020, Journal of Statistical Distributions and Applications)
We develop a cluster process that is invariant with respect to unknown affine transformations of the feature space and does not require the number of clusters to be known in advance. Specifically, the proposed method can identify clusters invariant under (I) orthogonal transformations, (II) scaling-coordinate orthogonal transformations, and (III) arbitrary nonsingular linear transformations, corresponding to models I, II, and III, respectively, and it represents clusters with the proposed heatmap of the similarity matrix. The proposed Metropolis-Hastings algorithm yields an irreducible and aperiodic Markov chain and identifies clusters efficiently and reasonably well across various applications. Both synthetic and real data examples show that the proposed method can be applied widely in many fields, especially for finding the number of clusters and identifying clusters of samples of interest in aerial photography and genomic data.
Identifying zero-inflated distributions with a new R package iZID

https://doi.org/10.4310/CIS.2020.v20.n1.a2

Wang, Lei; Aldirawi, Hani; Yang, Jie (January 2020, Communications in Information and Systems)
