skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Assisted estimation of gene expression graphical models
In the study of gene expression data, network analysis has played a uniquely important role. To accommodate the high dimensionality and low sample size and generate interpretable results, regularized estimation is usually conducted in the construction of gene expression Gaussian Graphical Models (GGM). Here we use GeO‐GGM to represent gene‐expression‐only GGM. Gene expressions are regulated by regulators. gene‐expression‐regulator GGMs (GeR‐GGMs), which accommodate gene expressions as well as their regulators, have been constructed accordingly. In practical data analysis, with a “lack of information” caused by the large number of model parameters, limited sample size, and weak signals, the construction of both GeO‐GGMs and GeR‐GGMs is often unsatisfactory. In this article, we recognize that with the regulation between gene expressions and regulators, the sparsity structures of a GeO‐GGM and its GeR‐GGM counterpart can satisfy a hierarchy. Accordingly, we propose a joint estimation which reinforces the hierarchical structure and use the construction of a GeO‐GGM to assist that of its GeR‐GGM counterpart and vice versa. Consistency properties are rigorously established, and an effective computational algorithm is developed. In simulation, the assisted construction outperforms the separation construction of GeO‐GGM and GeR‐GGM. Two The Cancer Genome Atlas data sets are analyzed, leading to findings different from the direct competitors.  more » « less
Award ID(s):
1916251
PAR ID:
10225366
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Genetic Epidemiology
ISSN:
0741-0395
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary Cancer is molecularly heterogeneous, with seemingly similar patients having different molecular landscapes and accordingly different clinical behaviors. In recent studies, gene expression networks have been shown as more effective/informative for cancer heterogeneity analysis than some simpler measures. Gene interconnections can be classified as “direct” and “indirect,” where the latter can be caused by shared genomic regulators (such as transcription factors, microRNAs, and other regulatory molecules) and other mechanisms. It has been suggested that incorporating the regulators of gene expressions in network analysis and focusing on the direct interconnections can lead to a deeper understanding of the more essential gene interconnections. Such analysis can be seriously challenged by the large number of parameters (jointly caused by network analysis, incorporation of regulators, and heterogeneity) and often weak signals. To effectively tackle this problem, we propose incorporating prior information contained in the published literature. A key challenge is that such prior information can be partial or even wrong. We develop a two-step procedure that can flexibly accommodate different levels of prior information quality. Simulation demonstrates the effectiveness of the proposed approach and its superiority over relevant competitors. In the analysis of a breast cancer dataset, findings different from the alternatives are made, and the identified sample subgroups have important clinical differences. 
    more » « less
  2. null (Ed.)
    Abstract Gene expression data have played an essential role in many biomedical studies. When the number of genes is large and sample size is limited, there is a ‘lack of information’ problem, leading to low-quality findings. To tackle this problem, both horizontal and vertical data integrations have been developed, where vertical integration methods collectively analyze data on gene expressions as well as their regulators (such as mutations, DNA methylation and miRNAs). In this article, we conduct a selective review of vertical data integration methods for gene expression data. The reviewed methods cover both marginal and joint analysis and supervised and unsupervised analysis. The main goal is to provide a sketch of the vertical data integration paradigm without digging into too many technical details. We also briefly discuss potential pitfalls, directions for future developments and application notes. 
    more » « less
  3. We consider the problem of estimating differences in two multi-attribute Gaussian graphical models (GGMs) which are known to have similar structure, using a penalized D-trace loss function with non-convex penalties. The GGM structure is encoded in its precision (inverse covariance) matrix. Existing methods for multi-attribute differential graph estimation are based on a group lasso penalized loss function. In this paper, we consider a penalized D-trace loss function with non-convex [log-sum and smoothly clipped absolute deviation (SCAD)] penalties. Two proximal gradient descent methods are presented to optimize the objective function. Theoretical analysis establishing sufficient conditions for consistency in support recovery, convexity and estimation in high-dimensional settings is provided. We illustrate our approaches with numerical examples based on synthetic and real data. 
    more » « less
  4. We consider the problem of estimating differences in two multi-attribute Gaussian graphical models (GGMs) which are known to have similar structure, using a penalized D-trace loss function with nonconvex penalties. The GGM structure is encoded in its precision (inverse covariance) matrix. Existing methods for multi-attribute differential graph estimation are based on a group lasso penalized loss function. In this paper, we consider a penalized D-trace loss function with nonconvex (log-sum and smoothly clipped absolute deviation (SCAD)) penalties. Two proximal gradient descent methods are presented to optimize the objective function. Theoretical analysis establishing local consistency in support recovery, local convexity and estimation in high-dimensional settings is provided. We illustrate our approach with a numerical example. 
    more » « less
  5. We consider the problem of estimating differences in two Gaussian graphical models (GGMs) which are known to have similar structure. The GGM structure is encoded in its precision (inverse covariance) matrix. In many applications one is interested in estimating the difference in two precision matrices to characterize underlying changes in conditional dependencies of two sets of data. Existing methods for differential graph estimation are based on single-attribute (SA) models where one associates a scalar random variable with each node. In multi-attribute (MA) graphical models, each node represents a random vector. In this paper, we analyze a group lasso penalized D-trace loss function approach for differential graph learning from multi-attribute data. An alternating direction method of multipliers (ADMM) algorithm is presented to optimize the objective function. Theoretical analysis establishing consistency in support recovery and estimation in high-dimensional settings is provided. Numerical results based on synthetic as well as real data are presented. 
    more » « less