Title: PertInInt: An Integrative, Analytical Approach to Rapidly Uncover Cancer Driver Genes with Perturbed Interactions and Functionalities
A major challenge in cancer genomics is to identify genes with functional roles in cancer and uncover their mechanisms of action. We introduce an integrative framework that identifies cancer-relevant genes by pinpointing those whose interaction or other functional sites are enriched in somatic mutations across tumors. We derive analytical calculations that enable us to avoid time-prohibitive permutation-based significance tests, making it computationally feasible to simultaneously consider multiple measures of protein site functionality. Our accompanying software, PertInInt, combines knowledge about sites participating in interactions with DNA, RNA, peptides, ions, or small molecules with domain, evolutionary conservation, and gene-level mutation data. When applied to 10,037 tumor samples, PertInInt uncovers both known and newly predicted cancer genes, while additionally revealing what types of interactions or other functionalities are disrupted. PertInInt’s analysis demonstrates that somatic mutations are frequently enriched in interaction sites and domains and implicates interaction perturbation as a pervasive cancer-driving event.
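The abstract contrasts analytical significance calculations with permutation-based tests. The sketch below illustrates that general idea only; it is not the PertInInt implementation. It assumes a simple null model in which each of a protein's mutations lands uniformly and independently on any position, and it scores a protein by summing a per-position functionality weight over its mutated positions. The function names, the binary weight track, and the example positions are all hypothetical.

```python
import random
from statistics import mean, pvariance


def analytical_zscore(weights, mutated_positions):
    """z-score of the summed weights at mutated positions.

    Assumed null model (an illustration, not the paper's exact model):
    each mutation falls uniformly and independently on one position, so
    the sum has mean n * mean(w) and variance n * pvariance(w).
    """
    n = len(mutated_positions)
    variance = pvariance(weights)
    if n == 0 or variance == 0:
        return 0.0
    observed = sum(weights[p] for p in mutated_positions)
    return (observed - n * mean(weights)) / (n * variance) ** 0.5


def permutation_zscore(weights, mutated_positions, trials=2000, seed=0):
    """Same statistic, estimated by repeatedly re-placing the mutations."""
    rng = random.Random(seed)
    n = len(mutated_positions)
    length = len(weights)
    observed = sum(weights[p] for p in mutated_positions)
    simulated = [
        sum(weights[rng.randrange(length)] for _ in range(n))
        for _ in range(trials)
    ]
    return (observed - mean(simulated)) / pvariance(simulated) ** 0.5


# Hypothetical 10-residue protein; positions 2 and 3 are interaction sites.
site_weights = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mutations = [2, 3, 2, 5]  # four somatic mutations, three in interaction sites

z_exact = analytical_zscore(site_weights, mutations)
z_simulated = permutation_zscore(site_weights, mutations)
```

The closed-form version needs one pass over the weights, whereas the simulated version costs one pass per trial. Under these assumptions, that difference is what makes it feasible to score many weight tracks (interaction sites, domains, conservation) for every protein.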
Award ID(s):
2006125
NSF-PAR ID:
10220034
Journal Name:
Cell systems
Volume:
11
ISSN:
2405-4712
Page Range / eLocation ID:
1 - 12
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Replication Protein A (RPA) is a single-strand DNA binding protein that plays a key role in the replication and repair of DNA. RPA is a heterotrimer made of 3 subunits: RPA1, RPA2, and RPA3. Germline pathogenic variants affecting RPA1 were recently described in patients with Telomere Biology Disorders (TBD), also known as dyskeratosis congenita or short telomere syndrome. Premature telomere shortening is a hallmark of TBD and results in bone marrow failure and predisposition to hematologic malignancies. Building on the finding that somatic mutations in RPA subunit genes occur in ~1% of cancers, we hypothesized that germline RPA alterations might be enriched in human cancers. Because germline RPA1 mutations are linked to early onset TBD with predisposition to myelodysplastic syndromes, we interrogated pediatric cancer cohorts to define the prevalence and spectrum of rare/novel and putative damaging germline RPA1, RPA2, and RPA3 variants. In this study of 5,993 children with cancer, 75 (1.25%) harbored heterozygous rare (non-cancer population allele frequency (AF) < 0.1%) variants in the RPA heterotrimer genes, of which 51 cases (0.85%) had ultra-rare (AF < 0.005%) or novel variants. Compared with Genome Aggregation Database (gnomAD) non-cancer controls, there was significant enrichment of ultra-rare and novel RPA1, but not RPA2 or RPA3, germline variants in our cohort (adjusted p-value < 0.05). Taken together, these findings suggest that germline putative damaging variants affecting RPA1 are found in excess in children with cancer, warranting further investigation into the functional role of these variants in oncogenesis.
  2. Abstract

    Motivation

    The analysis of high-dimensional ‘omics data is often informed by the use of biological interaction networks. For example, protein–protein interaction networks have been used to analyze gene expression data, to prioritize germline variants, and to identify somatic driver mutations in cancer. In these and other applications, the underlying computational problem is to identify altered subnetworks containing genes that are both highly altered in an ‘omics dataset and are topologically close (e.g. connected) on an interaction network.

    Results

    We introduce Hierarchical HotNet, an algorithm that finds a hierarchy of altered subnetworks. Hierarchical HotNet assesses the statistical significance of the resulting subnetworks over a range of biological scales and explicitly controls for ascertainment bias in the network. We evaluate the performance of Hierarchical HotNet and several other algorithms that identify altered subnetworks on the problem of predicting cancer genes and significantly mutated subnetworks. On somatic mutation data from The Cancer Genome Atlas, Hierarchical HotNet outperforms other methods and identifies significantly mutated subnetworks containing both well-known cancer genes and candidate cancer genes that are rarely mutated in the cohort. Hierarchical HotNet is a robust algorithm for identifying altered subnetworks across different ‘omics datasets.

    Availability and implementation

    http://github.com/raphael-group/hierarchical-hotnet.

    Supplementary information

    Supplementary materials are available at Bioinformatics online.
  3. Data-driven discovery of cancer driver genes, including tumor suppressor genes (TSGs) and oncogenes (OGs), is imperative for cancer prevention, diagnosis, and treatment. Although epigenetic alterations are important for tumor initiation and progression, most known driver genes were identified based on genetic alterations alone. Here, we developed an algorithm, DORGE (Discovery of Oncogenes and tumor suppressoR genes using Genetic and Epigenetic features), to identify TSGs and OGs by integrating comprehensive genetic and epigenetic data. DORGE identified histone modifications as strong predictors for TSGs, and it found missense mutations, super enhancers, and methylation differences as strong predictors for OGs. We extensively validated DORGE-predicted cancer driver genes using independent functional genomics data. We also found that DORGE-predicted dual-functional genes (both TSGs and OGs) are enriched at hubs in protein-protein interaction and drug-gene networks. Overall, our study has deepened the understanding of epigenetic mechanisms in tumorigenesis and revealed previously undetected cancer driver genes.
  4. Abstract

    AGAMOUS-Like 18 (AGL18) is a MADS domain transcription factor (TF) that is structurally related to AGL15. Here we show that, like AGL15, AGL18 can promote somatic embryogenesis (SE) when ectopically expressed in Arabidopsis (Arabidopsis thaliana). Based on loss-of-function mutants, AGL15 and AGL18 have redundant functions in developmental processes such as SE. To understand the nature of this redundancy, we undertook a number of studies to look at the interaction between these factors. We studied the genome-wide direct targets of AGL18 to characterize its roles at the molecular level using chromatin immunoprecipitation (ChIP)-SEQ combined with RNA-SEQ. The results demonstrated that AGL18 binds to thousands of sites in the genome. Comparison of ChIP-SEQ data for AGL15 and AGL18 revealed substantial numbers of genes bound by both AGL15 and AGL18, but there were also differences. Gene ontology analysis revealed that target genes were enriched for seed, embryo, and reproductive development as well as hormone and stress responses. The results also demonstrated that AGL15 and AGL18 interact in a complex regulatory loop, where AGL15 inhibited transcript accumulation of AGL18, while AGL18 increased AGL15 transcript accumulation. Co-immunoprecipitation revealed an interaction between AGL18 and AGL15 in somatic embryo tissue. The binding and expression analyses revealed a complex crosstalk and interactions among embryo TFs and their target genes. In addition, our study also revealed that phosphorylation of AGL18 and AGL15 was crucial for the promotion of SE.

     
  5. Directional association measured by functional dependency can answer important questions on relationships between variables, for example, in discovery of molecular interactions in biological systems. However, when one has no prior information about the functional form of a directional association, there is no widely established statistical procedure to detect such an association. To address this issue, here we introduce an exact functional test for directional association by examining the strength of functional dependency. It is effective in promoting functional patterns by reducing statistical power on dependent non-functional patterns. We designed an algorithm to carry out the test using a fast branch-and-bound strategy, which achieved a substantial speedup over brute-force enumeration. On data from an epidemiological study of liver cancer, the test identified the hepatitis status of a subject as the most influential risk factor among others for the cancer phenotype. On human lung cancer transcriptome data, the test selected 1068 transcription start sites of putative noncoding RNAs directionally associated with the presence or absence of lung cancer, stronger than 95 percent of the transcription start sites of 694 curated cancer genes. These predictions include non-monotonic interaction patterns, to which other routine tests were insensitive. Complementing symmetric (non-directional) association methods such as Fisher’s exact test, the exact functional test is a unique exact statistical test for evaluating evidence for causal relationships.