NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


Unsupervised Tree Boosting for Learning Probability Distributions

Awaya, Naoki; Ma, Li (July 2024, Journal of Machine Learning Research)

Full Text Available
Coarsened Mixtures of Hierarchical Skew Normal Kernels for Flow and Mass Cytometry Analyses

https://doi.org/10.1214/22-BA1356

Gorsky, Shai; Chan, Cliburn; Ma, Li (June 2024, Bayesian Analysis)

Full Text Available
Hidden Markov Pólya Trees for High-Dimensional Distributions

https://doi.org/10.1080/01621459.2022.2105223

Awaya, Naoki; Ma, Li (January 2024, Journal of the American Statistical Association)

Full Text Available
A Tree Perspective on Stick-Breaking Models in Covariate-Dependent Mixtures

https://doi.org/10.1214/24-BA1462

Horiguchi, Akira; Chan, Cliburn; Ma, Li (January 2024, Bayesian Analysis)

Full Text Available
Learning Asymmetric and Local Features in Multi-Dimensional Data Through Wavelets With Recursive Partitioning

https://doi.org/10.1109/TPAMI.2021.3110403

Li, Meng; Ma, Li (November 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence)

Full Text Available
Rejoinder: ‘Multi-scale Fisher’s independence test for multivariate dependence’

https://doi.org/10.1093/biomet/asac034

Gorsky, Shai; Ma, Li (August 2022, Biometrika)

Full Text Available
Multi-scale Fisher’s independence test for multivariate dependence

https://doi.org/10.1093/biomet/asac013

Gorsky, Shai; Ma, Li (February 2022, Biometrika)

Summary: Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample size, which makes them difficult to apply to massive samples. Moreover, resampling is usually necessary to evaluate the statistical significance of the resulting test statistics at finite sample sizes, further worsening the computational burden. We introduce a scalable, resampling-free approach to testing the independence between two random vectors by breaking the task down into simple univariate tests of independence on a collection of $2\times 2$ contingency tables constructed through sequential coarse-to-fine discretization of the sample. This transforms the inference task into a multiple testing problem that can be completed with almost linear complexity with respect to the sample size. To address increasing dimensionality, we introduce a coarse-to-fine sequential adaptive procedure that exploits the spatial features of dependency structures. We derive a finite-sample theory that guarantees the inferential validity of our adaptive procedure at any given sample size. We show that our approach can achieve strong control of the level of the testing procedure at any sample size without resampling or asymptotic approximation, and we establish its large-sample consistency. Through an extensive simulation study, we demonstrate its substantial computational advantage over existing approaches while achieving robust statistical power under various dependency scenarios, and we illustrate how its divide-and-conquer nature can be exploited not just to test independence but also to learn the nature of the underlying dependency. Finally, we demonstrate the use of our method by analysing a dataset from a flow cytometry experiment.
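To make the coarse-to-fine idea in the summary concrete, here is a minimal Python sketch under several stated assumptions; it is not the authors' implementation. It handles only the scalar case (one-dimensional X and Y), maps each margin to (0, 1) by ranks, and recursively halves the unit square into a dyadic quadtree, running a two-sided Fisher's exact test on the 2x2 table of counts in every cell. The paper's adaptive procedure and its multiple-testing step are replaced here by a fixed-depth tree and a plain Bonferroni correction. All function names (`fisher_exact_two_sided`, `to_unit_ranks`, `coarse_to_fine_pvalues`, `multiscale_independence_test`) and parameters (`max_depth`, `alpha`) are hypothetical.

```python
import math
import random


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # P(top-left cell = k) for every table consistent with the margins.
    probs = {k: math.comb(row1, k) * math.comb(n - row1, col1 - k) / denom
             for k in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so floating-point ties count as "as extreme".
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def to_unit_ranks(values):
    """Map a sample into (0, 1) by rank (a marginal empirical-CDF transform).

    Ties are broken by input order, which is a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / n
    return u


def coarse_to_fine_pvalues(x, y, max_depth=3):
    """Fisher p-values from 2x2 tables on a dyadic quadtree over (0, 1)^2.

    A cell at depth r is the product of an x-interval and a y-interval of
    length 2**-r; halving both sides yields a 2x2 table of point counts.
    Tables are formed at depths 0, ..., max_depth - 1.
    """
    u, v = to_unit_ranks(x), to_unit_ranks(y)
    pvalues = []
    # Stack entries: (depth, x-interval index, y-interval index, point indices).
    stack = [(0, 0, 0, list(range(len(u))))]
    while stack:
        depth, ix, iy, idx = stack.pop()
        if depth >= max_depth or len(idx) < 2:
            continue
        half = 2.0 ** -(depth + 1)
        # Child position along each axis: 0 = lower half, 1 = upper half.
        children = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
        for i in idx:
            cx = int(u[i] // half) - 2 * ix
            cy = int(v[i] // half) - 2 * iy
            children[(cx, cy)].append(i)
        # Rows index the x-half, columns index the y-half.
        table = [len(children[key]) for key in ((0, 0), (0, 1), (1, 0), (1, 1))]
        pvalues.append(fisher_exact_two_sided(*table))
        for (cx, cy), members in children.items():
            stack.append((depth + 1, 2 * ix + cx, 2 * iy + cy, members))
    return pvalues


def multiscale_independence_test(x, y, max_depth=3, alpha=0.05):
    """Combine the table p-values into one decision via Bonferroni.

    Bonferroni is only a simple stand-in for the paper's multiple-testing step.
    """
    pvalues = coarse_to_fine_pvalues(x, y, max_depth)
    adjusted = min(1.0, min(pvalues, default=1.0) * max(1, len(pvalues)))
    return adjusted <= alpha, adjusted


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.random() for _ in range(500)]
    y_dep = [xi + 0.1 * rng.gauss(0, 1) for xi in x]  # depends on x
    y_ind = [rng.random() for _ in range(500)]        # independent of x
    print(multiscale_independence_test(x, y_dep))
    print(multiscale_independence_test(x, y_ind))
```

Each cell contributes one small, cheap 2x2 test and each point is routed to a single child per level, so the work per level grows roughly linearly with the sample size, which mirrors the divide-and-conquer structure the summary describes. The cell whose table yields the smallest p-value also indicates where in the (rank-transformed) space the dependence is concentrated.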
Full Text Available
