skip to main content


Title: A review of multivariate distributions for count data derived from the Poisson distribution: A review of multivariate distributions for count data
NSF-PAR ID:
10053814
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Wiley Interdisciplinary Reviews: Computational Statistics
Volume:
9
Issue:
3
ISSN:
1939-5108
Page Range / eLocation ID:
e1398
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The Dirichlet-multinomial (DM) distribution plays a fundamental role in modern statistical methodology development and application. Recently, the DM distribution and its variants have been used extensively to model multivariate count data generated by high-throughput sequencing technology in omics research due to its ability to accommodate the compositional structure of the data as well as overdispersion. A major limitation of the DM distribution is that it is unable to handle excess zeros typically found in practice which may bias inference. To fill this gap, we propose a novel Bayesian zero-inflated DM model for multivariate compositional count data with excess zeros. We then extend our approach to regression settings and embed sparsity-inducing priors to perform variable selection for high-dimensional covariate spaces. Throughout, modeling decisions are made to boost scalability without sacrificing interpretability or imposing limiting assumptions. Extensive simulations and an application to a human gut microbiome dataset are presented to compare the performance of the proposed method to existing approaches. We provide an accompanying R package with a user-friendly vignette to apply our method to other datasets.

     
    more » « less