Abstract Factor analysis is a widely used statistical tool in many scientific disciplines, such as psychology, economics, and sociology. As observations linked by networks become increasingly common, incorporating network structures into factor analysis remains an open problem. In this paper, we focus on high-dimensional factor analysis involving network-connected observations, and propose a generalized factor model with latent factors that account for both the network structure and the dependence structure among high-dimensional variables. These latent factors can be shared by the high-dimensional variables and the network, or apply exclusively to either of them. We develop a computationally efficient estimation procedure and establish asymptotic inferential theories. Notably, we show that by borrowing information from the network, the proposed estimator of the factor loading matrix achieves optimal asymptotic variance under much milder identifiability constraints than those in the existing literature. Furthermore, we develop a hypothesis testing procedure to tackle the challenge of discerning the structure of the shared and individual latent factors. The finite sample performance of the proposed method is demonstrated through simulation studies and a real-world dataset involving a statistician co-authorship network.
Tensor factor adjustment for image classification with pervasive noises
Abstract This paper studies a tensor factor model that augments samples from multiple classes. The nuisance common patterns shared across classes are characterised by pervasive noises, and the patterns that distinguish different classes are represented by class‐specific components. Additionally, the pervasive component is modelled by the product of a low‐rank tensor latent factor and several factor loading matrices. This augmented tensor factor model can be expanded to a series of matrix variate tensor factor models and estimated using principal component analysis. The ranks of latent factors are estimated using a modified eigen‐ratio method. The proposed estimators have fast convergence rates and enjoy the blessing of dimensionality. The proposed factor model is applied to address the challenge of overlapping classes in image classification through a factor adjustment procedure. The procedure is shown to be powerful through synthetic experiments and an application to COVID‐19 pneumonia diagnosis from frontal chest X‐ray images.
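The rank-estimation step mentioned in the abstract can be illustrated with a minimal sketch. It uses the plain eigen-ratio rule on the covariance of a mode unfolding, not the modified version the paper proposes, and the helper names `unfold` and `eigen_ratio_rank`, the synthetic dimensions, and the noise level are all assumptions made for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: rows index the chosen mode, columns everything else."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def eigen_ratio_rank(samples, mode, max_rank):
    """Plain eigen-ratio rank estimate along one mode (hypothetical helper).

    samples has shape (n, d1, ..., dK); `mode` counts from 0 over d1..dK.
    """
    d = samples.shape[mode + 1]
    cov = np.zeros((d, d))
    n_cols = 0
    for x in samples:
        m = unfold(x, mode)
        cov += m @ m.T          # accumulate the mode-wise second moment
        n_cols += m.shape[1]
    cov /= n_cols
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # sorted in descending order
    # The estimated rank is where consecutive eigenvalues drop the most.
    ratios = eigvals[:max_rank] / eigvals[1:max_rank + 1]
    return int(np.argmax(ratios)) + 1, eigvals


# Synthetic matrix-variate data: X_i = A F_i B^T + noise, with ranks (3, 2).
rng = np.random.default_rng(0)
n, d1, d2, r1, r2 = 200, 20, 15, 3, 2
A = np.linalg.qr(rng.standard_normal((d1, r1)))[0] * np.sqrt(d1)
B = np.linalg.qr(rng.standard_normal((d2, r2)))[0] * np.sqrt(d2)
F = rng.standard_normal((n, r1, r2))
X = np.einsum("ia,nab,jb->nij", A, F, B) + 0.5 * rng.standard_normal((n, d1, d2))

rank_mode0, _ = eigen_ratio_rank(X, mode=0, max_rank=6)
rank_mode1, _ = eigen_ratio_rank(X, mode=1, max_rank=6)
```

With pervasive (strong) factors as simulated here, the signal eigenvalues grow with the dimension while the noise eigenvalues stay bounded, so the largest ratio sits at the true rank.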
- PAR ID: 10523131
- Publisher / Repository: Wiley Online Library
- Date Published:
- Journal Name: Stat
- Volume: 13
- Issue: 3
- ISSN: 2049-1573
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
This paper studies the prediction task of tensor-on-tensor regression in which both covariates and responses are multi-dimensional arrays (a.k.a. tensors) across time with arbitrary tensor order and data dimension. Existing methods either focus on linear models without accounting for possibly nonlinear relationships between covariates and responses, or directly employ black-box deep learning algorithms that fail to utilize the inherent tensor structure. In this work, we propose a Factor Augmented Tensor-on-Tensor Neural Network (FATTNN) that integrates tensor factor models into deep neural networks. We begin by summarizing and extracting useful predictive information (represented by the "factor tensor") from the complex structured tensor covariates, and then proceed with the prediction task using the estimated factor tensor as input to a temporal convolutional neural network. The proposed methods effectively handle nonlinearity between complex data structures, and improve over traditional statistical models and conventional deep learning approaches in both prediction accuracy and computational cost. By leveraging tensor factor models, our proposed methods exploit the underlying latent factor structure to enhance the prediction and, at the same time, drastically reduce the data dimensionality, which speeds up the computation. The empirical performance of our proposed methods is demonstrated via simulation studies and real-world applications to three public datasets. Numerical results show that our proposed algorithms achieve substantial increases in prediction accuracy and significant reductions in computational time compared to benchmark methods.
Abstract Observations in various applications are frequently represented as a time series of multidimensional arrays, called tensor time series, preserving the inherent multidimensional structure. In this paper, we present a factor model approach, in a form similar to tensor CANDECOMP/PARAFAC (CP) decomposition, to the analysis of high-dimensional dynamic tensor time series. As the loading vectors are uniquely defined but not necessarily orthogonal, it is significantly different from the existing tensor factor models based on Tucker-type tensor decomposition. The model structure allows for a set of uncorrelated one-dimensional latent dynamic factor processes, making it much more convenient to study the underlying dynamics of the time series. A new high-order projection estimator is proposed for such a factor model, utilizing the special structure and the idea of the higher order orthogonal iteration procedures commonly used in Tucker-type tensor factor models and general tensor CP decomposition procedures. Theoretical investigation provides statistical error bounds for the proposed methods, which show the significant advantage of utilizing the special model structure. A simulation study is conducted to further demonstrate the finite sample properties of the estimators. A real data application is used to illustrate the model and its interpretations.
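To make the CP-type structure concrete, the following minimal sketch simulates a matrix-valued time series whose signal is a sum of rank-one terms, each driven by its own one-dimensional factor process. The function name `simulate_cp_factor_series`, the AR(1) factor dynamics, and all dimensions and weights are assumptions chosen for illustration; the exact model and estimator in the paper may differ.

```python
import numpy as np


def simulate_cp_factor_series(a, b, factors, weights, noise_sd, rng):
    """Hypothetical helper: X_t = sum_r weights[r] * factors[t, r] * outer(a[:, r], b[:, r]) + noise."""
    signal = np.einsum("r,tr,ir,jr->tij", weights, factors, a, b)
    return signal + noise_sd * rng.standard_normal(signal.shape)


rng = np.random.default_rng(1)
T, d1, d2, R = 50, 8, 6, 2

# Unit-norm loading vectors; they are NOT forced to be orthogonal.
a = rng.standard_normal((d1, R))
a /= np.linalg.norm(a, axis=0)
b = rng.standard_normal((d2, R))
b /= np.linalg.norm(b, axis=0)

# Independent AR(1) factor processes, one per rank-one component (an assumption).
phi = 0.6
factors = np.zeros((T, R))
for t in range(1, T):
    factors[t] = phi * factors[t - 1] + rng.standard_normal(R)

weights = np.array([5.0, 3.0])
series = simulate_cp_factor_series(a, b, factors, weights, 0.0, rng)
```

Because each component has a scalar factor process of its own, the dynamics can be studied one series at a time, which is the convenience the abstract points to relative to Tucker-type models with a core factor tensor.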
Summary We propose and investigate an additive regression model for symmetric positive-definite matrix-valued responses and multiple scalar predictors. The model exploits the Abelian group structure inherited from either the log-Cholesky or the log-Euclidean framework for symmetric positive-definite matrices and naturally extends to general Abelian Lie groups. The proposed additive model is shown to connect to an additive model on a tangent space. This connection not only entails an efficient algorithm to estimate the component functions, but also allows one to generalize the proposed additive model to general Riemannian manifolds. Optimal asymptotic convergence rates and normality of the estimated component functions are established, and numerical studies show that the proposed model enjoys good numerical performance and is not subject to the curse of dimensionality when there are multiple predictors. The practical merits of the proposed model are demonstrated through an analysis of brain diffusion tensor imaging data.
Abstract We propose a combined model, which integrates the latent factor model and a sparse graphical model, for network data. Neither a latent factor model nor a sparse graphical model alone may be sufficient to capture the structure of the data. The proposed model has a latent (i.e., factor analysis) model to represent the main trends (a.k.a. factors), and a sparse graphical component that captures the remaining ad‐hoc dependence. Model selection and parameter estimation are carried out simultaneously via a penalized likelihood approach. The convexity of the objective function allows us to develop an efficient algorithm, while the penalty terms push towards low‐dimensional latent components and a sparse graphical structure. The effectiveness of our model is demonstrated via simulation studies, and the model is also applied to four real datasets: Zachary's Karate club data, Krebs' U.S. political book dataset (http://www.orgnet.com), the U.S. political blog dataset, and a citation network of statisticians, showing meaningful performance in practical situations.