skip to main content


Search for: All records

Award ID contains: 2042518

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    The discovery of cancer driver mutations is a fundamental goal in cancer research. While many cancer driver mutations have been discovered in the protein-coding genome, research into potential cancer drivers in the non-coding regions showed limited success so far. Here, we present a novel comprehensive framework Dr.Nod for detection of non-coding cis-regulatory candidate driver mutations that are associated with dysregulated gene expression using tissue-matched enhancer-gene annotations. Applying the framework to data from over 1500 tumours across eight tissues revealed a 4.4-fold enrichment of candidate driver mutations in regulatory regions of known cancer driver genes. An overarching conclusion that emerges is that the non-coding driver mutations contribute to cancer by significantly altering transcription factor binding sites, leading to upregulation of tissue-matched oncogenes and down-regulation of tumour-suppressor genes. Interestingly, more than half of the detected cancer-promoting non-coding regulatory driver mutations are over 20 kb distant from the cancer-associated genes they regulate. Our results show the importance of tissue-matched enhancer-gene maps, functional impact of mutations, and complex background mutagenesis model for the prediction of non-coding regulatory drivers. In conclusion, our study demonstrates that non-coding mutations in enhancers play a previously underappreciated role in cancer and dysregulation of clinically relevant target genes.

     
    more » « less
  2. Abstract

    The early detection of neurodevelopmental disorders (NDDs) can significantly improve patient outcomes. The differential burden of non-synonymous de novo mutation among NDD cases and controls indicates that de novo coding variation can be used to identify a subset of samples that will likely display an NDD phenotype. Thus, we have developed an approach for the accurate prediction of NDDs with very low false positive rate (FPR) using de novo coding variation for a small subset of cases. We use a shallow neural network that integrates de novo likely gene-disruptive and missense variants, measures of gene constraint, and conservation information to predict a small subset of NDD cases at very low FPR and prioritizes NDD risk genes for future clinical study.

     
    more » « less
  3. Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging due to the diploid and highly repetitive structure of the human genome, and by the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS)—a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines and effectively leverages mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome. 
    more » « less
  4. Stamatakis, Alexandros (Ed.)
    Abstract Motivation Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and diagnosing rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can dive into studying repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants). Results We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome (‘samples-specific’ strings). We have developed a novel, accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach will give us the ability to perform comparative genome analysis without the need to map the reads and is not hindered by shortcomings of the reference genome and mapping algorithms. We show that the proposed approach is capable of accurately finding sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data). Availability and implementation Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong. Supplementary information Supplementary data are available at Bioinformatics Advances online. 
    more » « less