NSF PAR Search | NSF Public Access Repository


Inverse moment methods for sufficient forecasting using high-dimensional predictors

https://doi.org/10.1093/biomet/asab037

Luo, Wei; Xue, Lingzhou; Yao, Jiawei; Yu, Xiufan (June 2021, Biometrika)

Summary: We consider forecasting a single time series using a large number of predictors in the presence of a possible nonlinear forecast function. Assuming that the predictors affect the response through the latent factors, we propose to first conduct factor analysis and then apply sufficient dimension reduction on the estimated factors to derive the reduced data for subsequent forecasting. Using directional regression and the inverse third-moment method in the stage of sufficient dimension reduction, the proposed methods can capture the nonmonotone effect of factors on the response. We also allow a diverging number of factors and only impose general regularity conditions on the distribution of factors, avoiding the undesired time reversibility of the factors by the latter. These make the proposed methods fundamentally more applicable than the sufficient forecasting method of Fan et al. (2017). The proposed methods are demonstrated both in simulation studies and an empirical study of forecasting monthly macroeconomic data from 1959 to 2016. Also, our theory contributes to the literature of sufficient dimension reduction, as it includes an invariance result, a path to perform sufficient dimension reduction under the high-dimensional setting without assuming sparsity, and the corresponding order-determination procedure.
Full Text Available
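
The two-stage idea in the summary above (estimate factors, then apply sufficient dimension reduction to them) can be sketched roughly as follows. This is an illustrative sketch and not the authors' implementation: it assumes principal-component factor estimation and the slice-based directional regression kernel of Li and Wang (2007), it omits the inverse third-moment method and the order-determination procedure, and the function names, slice count, and simulated data are all hypothetical choices.

```python
import numpy as np


def estimate_factors(X, n_factors):
    """Principal-component estimate of latent factors from a T x p panel.

    The returned columns have sample mean zero and identity sample
    covariance (with 1/T normalisation), so they need no further
    standardisation before the dimension-reduction step.
    """
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return np.sqrt(X.shape[0]) * U[:, :n_factors]


def directional_regression(Z, y, n_slices, n_directions):
    """Slice-based directional regression on standardised predictors Z.

    Builds the kernel 2 E[E(ZZ'|Y)^2] + 2 E[E(Z|Y)E(Z'|Y)]^2
    + 2 E[E(Z'|Y)E(Z|Y)] E[E(Z|Y)E(Z'|Y)] - 2 I, with conditional
    moments replaced by within-slice averages, and returns its leading
    eigenvectors.
    """
    T, K = Z.shape
    slices = np.array_split(np.argsort(y), n_slices)
    A = np.zeros((K, K))  # estimates E[E(ZZ'|Y)^2]
    B = np.zeros((K, K))  # estimates E[E(Z|Y)E(Z'|Y)]
    c = 0.0               # estimates E[E(Z'|Y)E(Z|Y)]
    for idx in slices:
        weight = len(idx) / T
        Zs = Z[idx]
        mean = Zs.mean(axis=0)
        second = Zs.T @ Zs / len(idx)
        A += weight * (second @ second)
        B += weight * np.outer(mean, mean)
        c += weight * float(mean @ mean)
    kernel = 2 * A + 2 * (B @ B) + 2 * c * B - 2 * np.eye(K)
    _, eigvecs = np.linalg.eigh(kernel)       # ascending eigenvalues
    return eigvecs[:, ::-1][:, :n_directions]  # leading directions first


# Simulated example (hypothetical): the response depends on the first
# factor only through its square, a nonmonotone effect. In a forecasting
# application, row t of y would hold the next-period value y_{t+1}.
rng = np.random.default_rng(0)
T, p, K = 400, 50, 3
F_true = rng.standard_normal((T, K))
loadings = rng.standard_normal((p, K))
X = F_true @ loadings.T + 0.5 * rng.standard_normal((T, p))
y = F_true[:, 0] ** 2 + 0.1 * rng.standard_normal(T)

factors = estimate_factors(X, n_factors=K)
directions = directional_regression(factors, y, n_slices=10, n_directions=1)
index = (factors @ directions)[:, 0]  # reduced predictor for a later regression
```

The one-dimensional `index` would then be passed to whatever nonparametric regression is used for the final forecast; that step is left out here.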
Compositional knockoff filter for high‐dimensional regression analysis of microbiome data

https://doi.org/10.1111/biom.13336

Srinivasan, Arun; Xue, Lingzhou; Zhan, Xiang (July 2020, Biometrics)

Abstract: A critical task in microbiome data analysis is to explore the association between a scalar response of interest and a large number of microbial taxa that are summarized as compositional data at different taxonomic levels. Motivated by fine‐mapping of the microbiome, we propose a two‐step compositional knockoff filter to provide the effective finite‐sample false discovery rate (FDR) control in high‐dimensional linear log‐contrast regression analysis of microbiome compositional data. In the first step, we propose a new compositional screening procedure to remove insignificant microbial taxa while retaining the essential sum‐to‐zero constraint. In the second step, we extend the knockoff filter to identify the significant microbial taxa in the sparse regression model for compositional data. Thereby, a subset of the microbes is selected from the high‐dimensional microbial taxa as related to the response under a prespecified FDR threshold. We study the theoretical properties of the proposed two‐step procedure, including both sure screening and effective false discovery control. We demonstrate these properties in numerical simulation studies to compare our methods to some existing ones and show power gain of the new method while controlling the nominal FDR. The potential usefulness of the proposed method is also illustrated with application to an inflammatory bowel disease data set to identify microbial taxa that influence host gene expressions.
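
The linear log-contrast model with its sum-to-zero constraint, mentioned in the abstract above, can be illustrated with a small low-dimensional sketch. This is not the compositional knockoff filter or the screening step; it only shows, under simplifying assumptions (no intercept, more samples than taxa, noise-free data), one way to fit `y = log(X) @ beta` with `sum(beta) == 0`. The function name and the simulated data are hypothetical.

```python
import numpy as np


def fit_log_contrast(X, y):
    """Least-squares fit of y = log(X) @ beta subject to sum(beta) == 0.

    Each row of log(X) is centred (the centred log-ratio transform), so
    every row is orthogonal to the all-ones vector. The minimum-norm
    least-squares solution lies in the row space of that matrix and
    therefore sums to zero.
    """
    Z = np.log(X)
    Z_clr = Z - Z.mean(axis=1, keepdims=True)
    # rcond discards the numerically zero singular value along the
    # all-ones direction, giving the minimum-norm solution.
    beta, *_ = np.linalg.lstsq(Z_clr, y, rcond=1e-10)
    return beta


# Simulated compositions (rows sum to one) and a coefficient vector
# that satisfies the sum-to-zero constraint.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(5), size=50)
beta_true = np.array([1.0, -0.5, 0.0, -0.5, 0.0])
y = np.log(X) @ beta_true

beta_hat = fit_log_contrast(X, y)
```

Because the true coefficients sum to zero, `log(X) @ beta_true` equals the centred matrix times `beta_true`, which is why the fit can recover them in this noise-free setting.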
A latent variable mixture model for composition-on-composition regression with application to chemical recycling

https://doi.org/10.1214/24-AOAS1935

Rios, Nicholas; Xue, Lingzhou; Zhan, Xiang (December 2024, The Annals of Applied Statistics)

Full Text Available
Model-Based Co-Clustering in Customer Targeting Utilizing Large-Scale Online Product Rating Networks

https://doi.org/10.1080/07350015.2024.2395423

Chen, Qian; Agarwal, Amal; Fong, Duncan K H; DeSarbo, Wayne S; Xue, Lingzhou (October 2024, Journal of Business & Economic Statistics)

Full Text Available
Robust High-Dimensional Regression with Coefficient Thresholding and Its Application to Imaging Data Analysis

https://doi.org/10.1080/01621459.2022.2142590

Liu, Bingyuan; Zhang, Qi; Xue, Lingzhou; Song, Peter X-K; Kang, Jian (January 2024, Journal of the American Statistical Association)

Full Text Available
Fisher’s Combined Probability Test for High-Dimensional Covariance Matrices

https://doi.org/10.1080/01621459.2022.2126781

Yu, Xiufan; Li, Danning; Xue, Lingzhou (January 2024, Journal of the American Statistical Association)

Full Text Available
Robust Shape Matrix Estimation for High-Dimensional Compositional Data with Application to Microbial Inter-Taxa Analysis

https://doi.org/10.5705/ss.202021.0147

Li, Danning; Srinivasan, Arun; Xue, Lingzhou; Zhan, Xiang (January 2024, Statistica Sinica)

Full Text Available
Power-Enhanced Simultaneous Test of High-Dimensional Mean Vectors and Covariance Matrices with Application to Gene-Set Testing

https://doi.org/10.1080/01621459.2022.2061354

Yu, Xiufan; Li, Danning; Xue, Lingzhou; Li, Runze (October 2023, Journal of the American Statistical Association)

Full Text Available
Robust Covariance Matrix Estimation for High-Dimensional Compositional Data with Application to Sales Data Analysis

https://doi.org/10.1080/07350015.2022.2106990

Li, Danning; Srinivasan, Arun; Chen, Qian; Xue, Lingzhou (October 2023, Journal of Business & Economic Statistics)

Full Text Available
A Manifold Proximal Linear Method for Sparse Spectral Clustering with Application to Single-Cell RNA Sequencing Data Analysis

https://doi.org/10.1287/ijoo.2021.0064

Wang, Zhongruo; Liu, Bingyuan; Chen, Shixiang; Ma, Shiqian; Xue, Lingzhou; Zhao, Hongyu (April 2022, INFORMS Journal on Optimization)

Spectral clustering is one of the fundamental unsupervised learning methods and is widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity to the spectral clustering, and it improves the interpretability of the model. One widely adopted model for SSC in the literature is an optimization problem over the Stiefel manifold with nonsmooth and nonconvex objective. Such an optimization problem is very challenging to solve. Existing methods usually solve its convex relaxation or need to smooth its nonsmooth objective using certain smoothing techniques. Therefore, they were not targeting solving the original formulation of SSC. In this paper, we propose a manifold proximal linear method (ManPL) that solves the original SSC formulation without twisting the model. We also extend the algorithm to solve multiple-kernel SSC problems, for which an alternating ManPL algorithm is proposed. Convergence and iteration complexity results of the proposed methods are established. We demonstrate the advantage of our proposed methods over existing methods via clustering of several data sets, including University of California Irvine and single-cell RNA sequencing data sets.
Full Text Available
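
For readers unfamiliar with the baseline that the abstract above builds on, ordinary (non-sparse) spectral clustering for two groups can be sketched as below. This is not ManPL or sparse spectral clustering; it is a minimal illustration that assumes a Gaussian affinity, the symmetric normalised Laplacian, and a split by the sign of the second eigenvector. The function name, bandwidth, and toy points are hypothetical.

```python
import numpy as np


def spectral_bipartition(points, bandwidth=1.0):
    """Split points into two groups using the normalised graph Laplacian."""
    n = points.shape[0]
    # Gaussian affinity between every pair of points, no self-loops.
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalised Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue changes sign between the two groups.
    _, eigvecs = np.linalg.eigh(L)
    return (eigvecs[:, 1] > 0).astype(int)


# Two small groups of points on a grid, separated along the x-axis.
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [4.0, 0.0], [4.0, 1.0], [5.0, 0.0], [5.0, 1.0],
])
labels = spectral_bipartition(points, bandwidth=1.0)
```

Which group receives label 0 and which receives label 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.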
