Sparse semiparametric canonical correlation analysis for data of mixed types

Yoon, Grace; Carroll, Raymond J; Gaynanova, Irina

doi:10.1093/biomet/asaa007

Citation Details

Summary: Canonical correlation analysis investigates linear relationships between two sets of variables, but it often works poorly on modern datasets because of high dimensionality and mixed data types such as continuous, binary and zero-inflated. To overcome these challenges, we propose a semiparametric approach to sparse canonical correlation analysis based on the Gaussian copula. The main result of this paper is a truncated latent Gaussian copula model for data with excess zeros, which allows us to derive a rank-based estimator of the latent correlation matrix for mixed variable types without estimation of marginal transformation functions. The resulting canonical correlation analysis method works well in high-dimensional settings, as demonstrated via numerical studies, and when applied to the analysis of association between gene expression and microRNA data from breast cancer patients.
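To illustrate what a rank-based estimate of a latent correlation looks like, here is a minimal sketch for the simplest case only: two continuous variables under a Gaussian copula, where the latent correlation r and Kendall's tau satisfy r = sin(pi/2 * tau). This is not the paper's truncated-variable estimator, which handles excess zeros and is not reproduced here. The function names `kendall_tau` and `latent_corr_continuous`, the simulated data, and the choice of transformations are all assumptions made for illustration.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a: mean sign agreement over all pairs (assumes no ties)."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x[i] - x[j]
            sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y[i] - y[j]
            total += sx * sy
    return total / (n * (n - 1) / 2)


def latent_corr_continuous(x, y):
    """Continuous-continuous case only: invert tau = (2/pi) * asin(r)."""
    return math.sin(math.pi / 2 * kendall_tau(x, y))


# Hypothetical simulation: latent bivariate normal with correlation rho,
# observed only through unknown strictly increasing transformations.
random.seed(0)
rho = 0.6
n = 800
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [rho * a + math.sqrt(1 - rho**2) * random.gauss(0, 1) for a in z1]

x = [math.exp(a) for a in z1]  # observed variable 1 (monotone transform)
y = [b**3 for b in z2]         # observed variable 2 (monotone transform)

r_hat = latent_corr_continuous(x, y)        # from transformed data
r_latent = latent_corr_continuous(z1, z2)   # from the latent data itself
print(r_hat, r_latent)
```

Because the estimate depends on the data only through ranks, it is unchanged by the monotone transformations, which is why the marginal transformation functions never need to be estimated in this setting.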

Award ID(s): 1712943 1934904

PAR ID: 10160912

Author(s) / Creator(s): Yoon, Grace; Carroll, Raymond J; Gaynanova, Irina

Date Published: 2020-04-15

Journal Name: Biometrika

ISSN: 0006-3444

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1093/biomet/asaa007
