NSF PAR Search | NSF Public Access Repository

Graphical Model Inference with Erosely Measured Data

https://doi.org/10.1080/01621459.2023.2256503

Zheng, Lili; Allen, Genevera I (July 2024, Journal of the American Statistical Association)

Full Text Available
Interpretable Machine Learning for Discovery: Statistical Challenges and Opportunities

https://doi.org/10.1146/annurev-statistics-040120-030919

Allen, Genevera I; Gan, Luqin; Zheng, Lili (April 2024, Annual Review of Statistics and Its Application)

New technologies have led to vast troves of large and complex data sets across many scientific domains and industries. People routinely use machine learning techniques not only to process, visualize, and make predictions from these big data, but also to make data-driven discoveries. These discoveries are often made using interpretable machine learning, or machine learning models and techniques that yield human-understandable insights. In this article, we discuss and review the field of interpretable machine learning, focusing especially on the techniques, as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using interpretable machine learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation both from a practical perspective, reviewing approaches based on data-splitting and stability, as well as from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven discoveries.
Full Text Available
Optimal High-Order Tensor SVD via Tensor-Train Orthogonal Iteration

https://doi.org/10.1109/TIT.2022.3152733

Zhou, Yuchen; Zhang, Anru R.; Zheng, Lili; Wang, Yazhen (June 2022, IEEE Transactions on Information Theory)

Full Text Available
Context-dependent Networks in Multivariate Time Series: Models, Methods, and Risk Bounds in High Dimensions

Zheng, Lili; Raskutti, Garvesh; Willett, Rebecca; Mark, Benjamin (August 2021, Journal of Machine Learning Research)

Full Text Available
Nonparanormal graph quilting with applications to calcium imaging

https://doi.org/10.1002/sta4.623

Chang, Andersen; Zheng, Lili; Dasarathy, Gautam; Allen, Genevera I. (September 2023, Stat)

Probabilistic graphical models have become an important unsupervised learning tool for detecting network structures for a variety of problems, including the estimation of functional neuronal connectivity from two‐photon calcium imaging data. However, in the context of calcium imaging, technological limitations only allow for partially overlapping layers of neurons in a brain region of interest to be jointly recorded. In this case, graph estimation for the full data requires inference for edge selection when many pairs of neurons have no simultaneous observations. This leads to the graph quilting problem, which seeks to estimate a graph in the presence of block‐missingness in the empirical covariance matrix. Solutions for the graph quilting problem have previously been studied for Gaussian graphical models; however, neural activity data from calcium imaging are often non‐Gaussian, thereby requiring a more flexible modelling approach. Thus, in our work, we study two approaches for nonparanormal graph quilting based on the Gaussian copula graphical model, namely, a maximum likelihood procedure and a low rank‐based framework. We provide theoretical guarantees on edge recovery for the former approach under similar conditions to those previously developed for the Gaussian setting, and we investigate the empirical performance of both methods using simulations as well as real calcium imaging data. Our approaches yield more scientifically meaningful functional connectivity estimates compared to existing Gaussian graph quilting methods for this calcium imaging data set.