skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: PertInInt: An Integrative, Analytical Approach to Rapidly Uncover Cancer Driver Genes with Perturbed Interactions and Functionalities
A major challenge in cancer genomics is to identify genes with functional roles in cancer and uncover their mechanisms of action. We introduce an integrative framework that identifies cancer-relevant genes by pinpointing those whose interaction or other functional sites are enriched in somatic mutations across tumors. We derive analytical calculations that enable us to avoid time-prohibitive permutation-based significance tests, making it computationally feasible to simultaneously consider multiple measures of protein site functionality. Our accompanying software, PertInInt, combines knowledge about sites participating in interactions with DNA, RNA, peptides, ions, or small molecules with domain, evolutionary conservation, and gene-level mutation data. When applied to 10,037 tumor samples, PertInInt uncovers both known and newly predicted cancer genes, while additionally revealing what types of interactions or other functionalities are disrupted. PertInInt’s analysis demonstrates that somatic mutations are frequently enriched in interaction sites and domains and implicates interaction perturbation as a pervasive cancer-driving event.  more » « less
Award ID(s):
2006125
PAR ID:
10220034
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Cell systems
Volume:
11
ISSN:
2405-4712
Page Range / eLocation ID:
1 - 12
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Data-driven discovery of cancer driver genes, including tumor suppressor genes (TSGs) and oncogenes (OGs), is imperative for cancer prevention, diagnosis, and treatment. Although epigenetic alterations are important for tumor initiation and progression, most known driver genes were identified based on genetic alterations alone. Here, we developed an algorithm, DORGE (Discovery of Oncogenes and tumor suppressoR genes using Genetic and Epigenetic features), to identify TSGs and OGs by integrating comprehensive genetic and epigenetic data. DORGE identified histone modifications as strong predictors for TSGs, and it found missense mutations, super enhancers, and methylation differences as strong predictors for OGs. We extensively validated DORGE-predicted cancer driver genes using independent functional genomics data. We also found that DORGE-predicted dual-functional genes (both TSGs and OGs) are enriched at hubs in protein-protein interaction and drug-gene networks. Overall, our study has deepened the understanding of epigenetic mechanisms in tumorigenesis and revealed previously undetected cancer driver genes. 
    more » « less
  2. Replication Protein A (RPA) is single-strand DNA binding protein that plays a key role in the replication and repair of DNA. RPA is a heterotrimer made of 3 subunits – RPA1, RPA2, and RPA3. Germline pathogenic variants affectingRPA1were recently described in patients with Telomere Biology Disorders (TBD), also known as dyskeratosis congenita or short telomere syndrome. Premature telomere shortening is a hallmark of TBD and results in bone marrow failure and predisposition to hematologic malignancies. Building on the finding that somatic mutations in RPA subunit genes occur in ~1% of cancers, we hypothesized that germline RPA alterations might be enriched in human cancers. Because germlineRPA1mutations are linked to early onset TBD with predisposition to myelodysplastic syndromes, we interrogated pediatric cancer cohorts to define the prevalence and spectrum of rare/novel and putative damaging germlineRPA1,RPA2, andRPA3variants. In this study of 5,993 children with cancer, 75 (1.25%) harbored heterozygous rare (non-cancer population allele frequency (AF) < 0.1%) variants in the RPA heterotrimer genes, of which 51 cases (0.85%) had ultra-rare (AF < 0.005%) or novel variants. Compared with Genome Aggregation Database (gnomAD) non-cancer controls, there was significant enrichment of ultra-rare and novelRPA1, but notRPA2orRPA3, germline variants in our cohort (adjusted p-value < 0.05). Taken together, these findings suggest that germline putative damaging variants affectingRPA1are found in excess in children with cancer, warranting further investigation into the functional role of these variants in oncogenesis. 
    more » « less
  3. Cancer is a complex disease associated with abnormal DNA mutations. Not all tumors are cancerous and not all cancers are the same. Correct cancer type diagnosis can indicate the most effective drug therapy and increase survival rate. At the molecular level, it has been shown that cancer type classification can be carried out from the analysis of somatic point mutation. However, the high dimensionality and sparsity of genomic mutation data, coupled with its small sample size has been a hindrance in accurate classification of cancer. We address these problems by introducing a novel classification method called mClass that accounts for the sparsity of the data. mClass is a feature selection method that ranks genes based on their similarity across samples and employs their normalized mutual information to determine the set of genes that provide optimal classification accuracy. Experimental results on TCGA datasets show that mClass significantly improves testing accuracy compared to DeepGene, which is the state-of-the-art in cancer-type classification based on somatic mutation data. In addition, when compared with other cancer gene prediction tools, the set of genes selected by mClass contains the highest number of genes in top 100 genes listed in the Cancer Gene Census. mClass is available at https://github.com/mdahasan/mClass. 
    more » « less
  4. The rate and spectrum of somatic mutations can diverge from that of germline mutations. This is because somatic tissues experience different mutagenic processes than germline tissues. Here, we use nanorate sequencing (NanoSeq) to identify somatic mutations in Arabidopsis shoots with high sensitivity. We report a somatic mutation rate of 3.6x10^-8 mutations/bp, ~4-5x the germline mutation rate. Somatic mutations displayed elevated signatures consistent with oxidative damage, UV damage, and transcription-coupled nucleotide excision repair. Both somatic and germline mutations were enriched in transposable elements and depleted in genes, but this depletion was greater in germline mutations. Somatic mutation rate correlated with proximity to the centromere, DNA methylation, chromatin accessibility, and gene/TE content, properties which were also largely true of germline mutations. We note DNA methylation and chromatin accessibility have different predicted effects on mutation rate for genic and non-genic regions; DNA methylation associates with a greater increase in mutation rate when in non-genic regions, and accessible chromatin associates with a lower mutation rate in non-genic regions but a higher mutation rate in genic regions. Together, these results characterize key differences and similarities in the genomic distribution of somatic and germline mutations. 
    more » « less
  5. Abstract The discovery of cancer driver mutations is a fundamental goal in cancer research. While many cancer driver mutations have been discovered in the protein-coding genome, research into potential cancer drivers in the non-coding regions showed limited success so far. Here, we present a novel comprehensive framework Dr.Nod for detection of non-coding cis-regulatory candidate driver mutations that are associated with dysregulated gene expression using tissue-matched enhancer-gene annotations. Applying the framework to data from over 1500 tumours across eight tissues revealed a 4.4-fold enrichment of candidate driver mutations in regulatory regions of known cancer driver genes. An overarching conclusion that emerges is that the non-coding driver mutations contribute to cancer by significantly altering transcription factor binding sites, leading to upregulation of tissue-matched oncogenes and down-regulation of tumour-suppressor genes. Interestingly, more than half of the detected cancer-promoting non-coding regulatory driver mutations are over 20 kb distant from the cancer-associated genes they regulate. Our results show the importance of tissue-matched enhancer-gene maps, functional impact of mutations, and complex background mutagenesis model for the prediction of non-coding regulatory drivers. In conclusion, our study demonstrates that non-coding mutations in enhancers play a previously underappreciated role in cancer and dysregulation of clinically relevant target genes. 
    more » « less