In the form of multidimensional arrays, tensor data have become increasingly prevalent in modern scientific studies and biomedical applications such as computational biology, brain imaging analysis, and process monitoring system. These data are intrinsically heterogeneous with complex dependencies and structure. Therefore, ad‐hoc dimension reduction methods on tensor data may lack statistical efficiency and can obscure essential findings. Model‐based clustering is a cornerstone of multivariate statistics and unsupervised learning; however, existing methods and algorithms are not designed for tensor‐variate samples. In this article, we propose a tensor envelope mixture model (TEMM) for simultaneous clustering and multiway dimension reduction of tensor data. TEMM incorporates tensor‐structure‐preserving dimension reduction into mixture modeling and drastically reduces the number of free parameters and estimative variability. An expectation‐maximization‐type algorithm is developed to obtain likelihood‐based estimators of the cluster means and covariances, which are jointly parameterized and constrained onto a series of lower dimensional subspaces known as the tensor envelopes. We demonstrate the encouraging empirical performance of the proposed method in extensive simulation studies and a real data application in comparison with existing vector and tensor clustering methods.
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract -
Abstract It is increasingly interesting to model the relationship between two sets of high-dimensional measurements with potentially high correlations. Canonical correlation analysis (CCA) is a classical tool that explores the dependency of two multivariate random variables and extracts canonical pairs of highly correlated linear combinations. Driven by applications in genomics, text mining, and imaging research, among others, many recent studies generalize CCA to high-dimensional settings. However, most of them either rely on strong assumptions on covariance matrices, or do not produce nested solutions. We propose a new sparse CCA (SCCA) method that recasts high-dimensional CCA as an iterative penalized least squares problem. Thanks to the new iterative penalized least squares formulation, our method directly estimates the sparse CCA directions with efficient algorithms. Therefore, in contrast to some existing methods, the new SCCA does not impose any sparsity assumptions on the covariance matrices. The proposed SCCA is also very flexible in the sense that it can be easily combined with properly chosen penalty functions to perform structured variable selection and incorporate prior information. Moreover, our proposal of SCCA produces nested solutions and thus provides great convenient in practice. Theoretical results show that SCCA can consistently estimate the true canonical pairs with an overwhelming probability in ultra-high dimensions. Numerical results also demonstrate the competitive performance of SCCA.
-
Summary Envelopes have been proposed in recent years as a nascent methodology for sufficient dimension reduction and efficient parameter estimation in multivariate linear models. We extend the classical definition of envelopes in Cook et al. (2010) to incorporate a nonlinear conditional mean function and a heteroscedastic error. Given any two random vectors ${X}\in\mathbb{R}^{p}$ and ${Y}\in\mathbb{R}^{r}$, we propose two new model-free envelopes, called the martingale difference divergence envelope and the central mean envelope, and study their relationships to the standard envelope in the context of response reduction in multivariate linear models. The martingale difference divergence envelope effectively captures the nonlinearity in the conditional mean without imposing any parametric structure or requiring any tuning in estimation. Heteroscedasticity, or nonconstant conditional covariance of ${Y}\mid{X}$, is further detected by the central mean envelope based on a slicing scheme for the data. We reveal the nested structure of different envelopes: (i) the central mean envelope contains the martingale difference divergence envelope, with equality when ${Y}\mid{X}$ has a constant conditional covariance; and (ii) the martingale difference divergence envelope contains the standard envelope, with equality when ${Y}\mid{X}$ has a linear conditional mean. We develop an estimation procedure that first obtains the martingale difference divergence envelope and then estimates the additional envelope components in the central mean envelope. We establish consistency in envelope estimation of the martingale difference divergence envelope and central mean envelope without stringent model assumptions. Simulations and real-data analysis demonstrate the advantages of the martingale difference divergence envelope and the central mean envelope over the standard envelope in dimension reduction.more » « less
-
Sufficient dimension reduction (SDR) is a very useful concept for exploratory analysis and data visualization in regression, especially when the number of covariates is large. Many SDR methods have been proposed for regression with a continuous response, where the central subspace (CS) is the target of estimation. Various conditions, such as the linearity condition and the constant covariance condition, are imposed so that these methods can estimate at least a portion of the CS. In this paper we study SDR for regression and discriminant analysis with categorical response. Motivated by the exploratory analysis and data visualization aspects of SDR, we propose a new geometric framework to reformulate the SDR problem in terms of manifold optimization and introduce a new concept called Maximum Separation Subspace (MASES). The MASES naturally preserves the “sufficiency” in SDR without imposing additional conditions on the predictor distribution, and directly inspires a semi-parametric estimator. Numerical studies show MASES exhibits superior performance as compared with competing SDR methods in specific settings.more » « less