skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Estimation of multiple networks with common structures in heterogeneous subgroups
Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models have been highly popular. However, they still have limitations due to the homogeneous distribution assumption and the fact that they are only applicable to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the Gaussian graphical model (GGM) into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups, making the proposed estimator significantly advancing from the existing heterogeneity network analysis based on the regularized likelihood of GGM directly and enjoying scale-invariant, tuning-insensitive, and optimization convexity properties. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the prominent performance of the proposed approach in both subgroup and network identifications.  more » « less
Award ID(s):
2209685
PAR ID:
10512834
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
Journal of Multivariate Analysis
Volume:
202
Issue:
C
ISSN:
0047-259X
Page Range / eLocation ID:
105298
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Heterogeneity is a hallmark of cancer, diabetes, cardiovascular diseases, and many other complex diseases. This study has been partly motivated by the unsupervised heterogeneity analysis for complex diseases based on molecular and imaging data, for which, network‐based analysis, by accommodating the interconnections among variables, can be more informative than that limited to mean, variance, and other simple distributional properties. In the literature, there has been very limited research on network‐based heterogeneity analysis, and a common limitation shared by the existing techniques is that the number of subgroups needs to be specified a priori or in an ad hoc manner. In this article, we develop a penalized fusion approach for heterogeneity analysis based on the Gaussian graphical model. It applies penalization to the mean and precision matrix parameters to generate regularized and interpretable estimates. More importantly, a fusion penalty is imposed to “automatedly” determine the number of subgroups and generate more concise, reliable, and interpretable estimation. Consistency properties are rigorously established, and an effective computational algorithm is developed. The heterogeneity analysis of non‐small‐cell lung cancer based on single‐cell gene expression data of the Wnt pathway and that of lung adenocarcinoma based on histopathological imaging data not only demonstrate the practical applicability of the proposed approach but also lead to interesting new findings. 
    more » « less
  2. Wren, Jonathan (Ed.)
    Abstract Summary Heterogeneity is a hallmark of many complex human diseases, and unsupervised heterogeneity analysis has been extensively conducted using high-throughput molecular measurements and histopathological imaging features. ‘Classic’ heterogeneity analysis has been based on simple statistics such as mean, variance and correlation. Network-based analysis takes interconnections as well as individual variable properties into consideration and can be more informative. Several Gaussian graphical model (GGM)-based heterogeneity analysis techniques have been developed, but friendly and portable software is still lacking. To facilitate more extensive usage, we develop the R package HeteroGGM, which conducts GGM-based heterogeneity analysis using the advanced penaliztaion techniques, can provide informative summary and graphical presentation, and is efficient and friendly. Availabilityand implementation The package is available at https://CRAN.R-project.org/package=HeteroGGM. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  3. null (Ed.)
    In the study of gene expression data, network analysis has played a uniquely important role. To accommodate the high dimensionality and low sample size and generate interpretable results, regularized estimation is usually conducted in the construction of gene expression Gaussian Graphical Models (GGM). Here we use GeO‐GGM to represent gene‐expression‐only GGM. Gene expressions are regulated by regulators. gene‐expression‐regulator GGMs (GeR‐GGMs), which accommodate gene expressions as well as their regulators, have been constructed accordingly. In practical data analysis, with a “lack of information” caused by the large number of model parameters, limited sample size, and weak signals, the construction of both GeO‐GGMs and GeR‐GGMs is often unsatisfactory. In this article, we recognize that with the regulation between gene expressions and regulators, the sparsity structures of a GeO‐GGM and its GeR‐GGM counterpart can satisfy a hierarchy. Accordingly, we propose a joint estimation which reinforces the hierarchical structure and use the construction of a GeO‐GGM to assist that of its GeR‐GGM counterpart and vice versa. Consistency properties are rigorously established, and an effective computational algorithm is developed. In simulation, the assisted construction outperforms the separation construction of GeO‐GGM and GeR‐GGM. Two The Cancer Genome Atlas data sets are analyzed, leading to findings different from the direct competitors. 
    more » « less
  4. The manuscript considers multivariate functional data analysis with a known graphical model among the functional variables representing their conditional relationships (e.g., brain region-level fMRI data with a prespecified connectivity graph among brain regions). Functional Gaussian graphical models (GGM) used for analyzing multivariate functional data customarily estimate an unknown graphical model, and cannot preserve knowledge of a given graph. We propose a method for multivariate functional analysis that exactly conforms to a given inter-variable graph. We first show the equivalence between partially separable functional GGM and graphical Gaussian processes (GP), proposed recently for constructing optimal multivariate covariance functions that retain a given graphical model. The theoretical connection helps to design a new algorithm that leverages Dempster’s covariance selection for obtaining the maximum likelihood estimate of the covariance function for multivariate functional data under graphical constraints. We also show that the finite term truncation of functional GGM basis expansion used in practice is equivalent to a low-rank graphical GP, which is known to oversmooth marginal distributions. To remedy this, we extend our algorithm to better preserve marginal distributions while respecting the graph and retaining computational scalability. The benefits of the proposed algorithms are illustrated using empirical experiments and a neuroimaging application. 
    more » « less
  5. We consider the problem of estimating differences in two Gaussian graphical models (GGMs) which are known to have similar structure. The GGM structure is encoded in its precision (inverse covariance) matrix. In many applications one is interested in estimating the difference in two precision matrices to characterize underlying changes in conditional dependencies of two sets of data. Existing methods for differential graph estimation are based on single-attribute models where one associates a scalar random variable with each node. In multi-attribute graphical models, each node represents a random vector. In this paper, we analyze a group lasso penalized D-trace loss function approach for differential graph learning from multi-attribute data. An alternating direction method of multipliers (ADMM) algorithm is presented to optimize the objective function. Theoretical analysis establishing consistency in support recovery and estimation in high-dimensional settings is provided. We illustrate our approach using a numerical example where the multi-attribute approach is shown to outperform a single-attribute approach. 
    more » « less