NSF PAR Search | NSF Public Access Repository

Microbiome Subcommunity Learning with Logistic-Tree Normal Latent Dirichlet Allocation

https://doi.org/10.1111/biom.13772

LeBlanc, Patrick; Ma, Li (October 2022, Biometrics)

Mixed-membership (MM) models such as latent Dirichlet allocation (LDA) have been applied to microbiome compositional data to identify latent subcommunities of microbial species. These subcommunities are informative for understanding the biological interplay of microbes and for predicting health outcomes. However, microbiome compositions typically display substantial cross-sample heterogeneity in subcommunity composition (that is, variability across samples in the proportions of microbes within shared subcommunities), which is not accounted for in prior analyses. As a result, LDA can produce inference that is highly sensitive to the specified number of subcommunities and often divides a single subcommunity into multiple artificial ones. To address this limitation, we incorporate the logistic-tree normal (LTN) model into LDA to form a new MM model. This model allows cross-sample variation in the composition of each subcommunity around some “centroid” composition that defines the subcommunity. Incorporation of auxiliary Pólya-Gamma variables enables a computationally efficient collapsed blocked Gibbs sampler to carry out Bayesian inference under this model. By accounting for such heterogeneity, our new model restores the robustness of the inference to the specification of the number of subcommunities and allows meaningful subcommunities to be identified.
Spatial adaptation by Bayesian unsupervised trees

Liu, Linxi; Ma, Li (December 2024, Proceedings of Machine Learning Research)
Agrawal, Shipra; Roth, Aaron (Ed.)
Tree-based methods are popular nonparametric tools for capturing spatial heterogeneity and making predictions in multivariate problems. In unsupervised learning, trees and their ensembles have also been applied to a wide range of statistical inference tasks, such as multi-resolution sketching of distributional variations, localization of high-density regions, and design of efficient data compression schemes. In this paper, we study the spatial adaptation property of Bayesian tree-based methods in the unsupervised setting, with a focus on the density estimation problem. We characterize spatial heterogeneity of the underlying density function using anisotropic Besov spaces, region-wise anisotropic Besov spaces, and two novel function classes that extend them. For two types of prior distributions on trees commonly used in unsupervised learning, the optional Pólya tree (Wong and Ma, 2010) and the Dirichlet prior (Lu et al., 2013), we calculate posterior concentration rates when the density function exhibits different types of heterogeneity. Specifically, we show that the posterior concentration rate for trees is near minimax over the anisotropic Besov space. The rate is adaptive in the sense that achieving it requires no prior knowledge of the parameters of the Besov space.
Free, publicly-accessible full text available December 1, 2025
Spatial properties of Bayesian unsupervised trees

Liu, Linxi; Ma, Li (September 2024, Proceedings of Machine Learning Research)
Agrawal, Shipra; Roth, Aaron (Ed.)
Tree-based methods are popular nonparametric tools for capturing spatial heterogeneity and making predictions in multivariate problems. In unsupervised learning, trees and their ensembles have also been applied to a wide range of statistical inference tasks, such as multi-resolution sketching of distributional variations, localization of high-density regions, and design of efficient data compression schemes. In this paper, we study the spatial adaptation property of Bayesian tree-based methods in the unsupervised setting, with a focus on the density estimation problem. We characterize spatial heterogeneity of the underlying density function using anisotropic Besov spaces, region-wise anisotropic Besov spaces, and two novel function classes that extend them. For two types of prior distributions on trees commonly used in unsupervised learning, the optional Pólya tree (Wong and Ma, 2010) and the Dirichlet prior (Lu et al., 2013), we calculate posterior concentration rates when the density function exhibits different types of heterogeneity. Specifically, we show that the posterior concentration rate for trees is near minimax over the anisotropic Besov space. The rate is adaptive in the sense that achieving it requires no prior knowledge of the parameters of the Besov space.
Full Text Available
Spatial properties of Bayesian unsupervised trees

Liu, Linxi; Ma, Li (July 2024, Proceedings of Machine Learning Research)
Agrawal, Shipra; Roth, Aaron (Ed.)
Full Text Available
Coarsened Mixtures of Hierarchical Skew Normal Kernels for Flow and Mass Cytometry Analyses

https://doi.org/10.1214/22-BA1356

Gorsky, Shai; Chan, Cliburn; Ma, Li (June 2024, Bayesian Analysis)

Full Text Available
Efficient in-situ image and video compression through probabilistic image representation

https://doi.org/10.1016/j.sigpro.2023.109268

Liu, Rongjie; Li, Meng; Ma, Li (February 2024, Signal Processing)

Full Text Available
Hidden Markov Pólya Trees for High-Dimensional Distributions

https://doi.org/10.1080/01621459.2022.2105223

Awaya, Naoki; Ma, Li (January 2024, Journal of the American Statistical Association)

Full Text Available
Learning Asymmetric and Local Features in Multi-Dimensional Data Through Wavelets With Recursive Partitioning

https://doi.org/10.1109/TPAMI.2021.3110403

Li, Meng; Ma, Li (November 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence)

Full Text Available
Dirichlet-tree multinomial mixtures for clustering microbiome compositions

https://doi.org/10.1214/21-AOAS1552

Mao, Jialiang; Ma, Li (September 2022, The Annals of Applied Statistics)

Full Text Available
Rejoinder: ‘Multi-scale Fisher’s independence test for multivariate dependence’

https://doi.org/10.1093/biomet/asac034

Gorsky, S; Ma, L (August 2022, Biometrika)

Full Text Available
