NSF PAR Search | NSF Public Access Repository


Leveraging independence in high-dimensional mixed linear regression

https://doi.org/10.1093/biomtc/ujae103

Wang, Ning; Deng, Kai; Mai, Qing; Zhang, Xin (September 2024, Biometrics)

Abstract We address the challenge of estimating regression coefficients and selecting relevant predictors in the context of mixed linear regression in high dimensions, where the number of predictors greatly exceeds the sample size. Recent advancements in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and also achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.
Bayesian Regression Analysis of Skewed Tensor Responses

https://doi.org/10.1111/biom.13743

Lee, Inkoo; Sinha, Debajyoti; Mai, Qing; Zhang, Xin; Bandyopadhyay, Dipankar (August 2022, Biometrics)

Abstract Tensor regression analysis is finding vast emerging applications in a variety of clinical settings, including neuroimaging, genomics, and dental medicine. The motivation for this paper is a study of periodontal disease (PD) with an order-3 tensor response: multiple biomarkers measured at prespecified tooth–sites within each tooth, for each participant. A careful investigation would reveal considerable skewness in the responses, in addition to response missingness. To mitigate the shortcomings of existing analysis tools, we propose a new Bayesian tensor response regression method that facilitates interpretation of covariate effects on both marginal and joint distributions of highly skewed tensor responses, and accommodates missing-at-random responses under a closure property of our tensor model. Furthermore, we present a prudent evaluation of the overall covariate effects while identifying their possible variations on only a sparse subset of the tensor components. Our method promises Markov chain Monte Carlo (MCMC) tools that are readily implementable. We illustrate substantial advantages of our proposal over existing methods via simulation studies and application to a real data set derived from a clinical study of PD. The R package BSTN, available on GitHub, implements our model.
Tensor envelope mixture model for simultaneous clustering and multiway dimension reduction

https://doi.org/10.1111/biom.13486

Deng, Kai; Zhang, Xin (May 2021, Biometrics)

Abstract In the form of multidimensional arrays, tensor data have become increasingly prevalent in modern scientific studies and biomedical applications such as computational biology, brain imaging analysis, and process monitoring systems. These data are intrinsically heterogeneous with complex dependencies and structure. Therefore, ad‐hoc dimension reduction methods on tensor data may lack statistical efficiency and can obscure essential findings. Model‐based clustering is a cornerstone of multivariate statistics and unsupervised learning; however, existing methods and algorithms are not designed for tensor‐variate samples. In this article, we propose a tensor envelope mixture model (TEMM) for simultaneous clustering and multiway dimension reduction of tensor data. TEMM incorporates tensor‐structure‐preserving dimension reduction into mixture modeling and drastically reduces the number of free parameters and estimative variability. An expectation‐maximization‐type algorithm is developed to obtain likelihood‐based estimators of the cluster means and covariances, which are jointly parameterized and constrained onto a series of lower dimensional subspaces known as the tensor envelopes. We demonstrate the encouraging empirical performance of the proposed method in extensive simulation studies and a real data application in comparison with existing vector and tensor clustering methods.
Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction

https://doi.org/10.5705/ss.202022.0112

Mai, Qing; Shao, Xiaofeng; Wang, Runmin; Zhang, Xin (January 2025, Statistica Sinica)

Free, publicly-accessible full text available January 1, 2026
The Tucker Low-Rank Classification Model for Tensor Data

https://doi.org/10.5705/ss.202022.0007

Li, Junge; Mai, Qing; Zhang, Xin (January 2025, Statistica Sinica)

Free, publicly-accessible full text available January 1, 2026
Statistical analysis for a penalized EM algorithm in high-dimensional mixture linear regression model

Wang, N; Zhang, X; Mai, Q (July 2024, Journal of Machine Learning Research)

The expectation-maximization (EM) algorithm and its variants are widely used in statistics. In high-dimensional mixture linear regression, the model is assumed to be a finite mixture of linear regressions and the number of predictors is much larger than the sample size. The standard EM algorithm, which attempts to find the maximum likelihood estimator, becomes infeasible for such models. We devise a group lasso penalized EM algorithm and study its statistical properties. Existing theoretical results for regularized EM algorithms often rely on dividing the sample into many independent batches and employing a fresh batch in each iteration of the algorithm. Our algorithm and theoretical analysis do not require sample-splitting and can be extended to multivariate response cases. The proposed methods also show encouraging performance in numerical studies.
Full Text Available
Robust and covariance-assisted tensor response regression

https://doi.org/10.4310/SII.2024.v17.n2.a10

Wang, Ning; Zhang, Xin (February 2024, Statistics and Its Interface)

Full Text Available
Subspace Estimation with Automatic Dimension and Variable Selection in Sufficient Dimension Reduction

https://doi.org/10.1080/01621459.2022.2118601

Zeng, Jing; Mai, Qing; Zhang, Xin (January 2024, Journal of the American Statistical Association)

Full Text Available
Parsimonious Tensor Discriminant Analysis

https://doi.org/10.5705/ss.202020.0496

Wang, Ning; Wang, Wenjing; Zhang, Xin (January 2024, Statistica Sinica)

Full Text Available
Generalized Liquid Association Analysis for Multimodal Data Integration

https://doi.org/10.1080/01621459.2021.2024437

Li, Lexin; Zeng, Jing; Zhang, Xin (July 2023, Journal of the American Statistical Association)

Full Text Available
