skip to main content


Title: TIGER: inferring DNA replication timing from whole-genome sequence data
Abstract Motivation Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges. Results We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and Implementation TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information Supplementary data are available at Bioinformatics online  more » « less
Award ID(s):
1921341
NSF-PAR ID:
10230340
Author(s) / Creator(s):
; ;
Editor(s):
Robinson, Peter
Date Published:
Journal Name:
Bioinformatics
ISSN:
1367-4803
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. DNA is replicated according to a defined spatiotemporal program that is linked to both gene regulation and genome stability. The evolutionary forces that have shaped replication timing programs in eukaryotic species are largely unknown. Here, we studied the molecular causes and consequences of replication timing evolution across 94 humans, 95 chimpanzees, and 23 rhesus macaques. Replication timing differences recapitulated the species’ phylogenetic tree, suggesting continuous evolution of the DNA replication timing program in primates. Hundreds of genomic regions had significant replication timing variation between humans and chimpanzees, of which 66 showed advances in replication origin firing in humans, while 57 were delayed. Genes overlapping these regions displayed correlated changes in expression levels and chromatin structure. Many human–chimpanzee variants also exhibited interindividual replication timing variation, pointing to ongoing evolution of replication timing at these loci. Association of replication timing variation with genetic variation revealed that DNA sequence evolution can explain replication timing variation between species. Taken together, DNA replication timing shows substantial and ongoing evolution in the human lineage that is driven by sequence alterations and could impact regulatory evolution at specific genomic sites. 
    more » « less
  2. Abstract

    The spatiotemporal organization of DNA replication produces a highly robust and reproducible replication timing profile. Sequencing-based methods for assaying replication timing genome-wide have become commonplace, but regions of high repeat content in the human genome have remained refractory to analysis. Here, we report the first nearly-gapless telomere-to-telomere replication timing profiles in human, using the T2T-CHM13 genome assembly and sequencing data for five cell lines. We find that replication timing can be successfully assayed in centromeres and large blocks of heterochromatin. Centromeric regions replicate in mid-to-late S-phase and contain replication-timing peaks at a similar density to other genomic regions, while distinct families of heterochromatic satellite DNA differ in their bias for replicating in late S-phase. The high degree of consistency in centromeric replication timing across chromosomes within each cell line prompts further investigation into the mechanisms dictating that some cell lines replicate their centromeres earlier than others, and what the consequences of this variation are.

     
    more » « less
  3. Robinson, Peter (Ed.)
    Abstract Motivation Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way. Results We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. Availability and implementation ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  4. We describe a novel computational approach, CERENKOV (Computational Elucidation of the REgulatory NonKOd- ing Variome), for discriminating regulatory single nucleotide polymorphisms (rSNPs) from non-regulatory SNPs within noncoding genetic loci. CERENKOV is specifically designed for recognizing rSNPs in the context of a post-analysis of a genome-wide association study (GWAS); it includes a novel accuracy scoring metric (which we call average rank, or AV- GRANK) and a novel cross-validation strategy (locus-based sampling) that both correctly account for the “sparse positive bag” nature of the GWAS post-analysis rSNP recognition problem. We trained and validated CERENKOV using a reference set of 15,331 SNPs (the OSU17 SNP set) whose composition is based on selection criteria (linkage disequi- librium and minor allele frequency) that we designed to ensure relevance to GWAS post-analysis. CERENKOV is based on a machine-learning algorithm (gradient boosted decision trees) incorporating 246 SNP annotation features that we extracted from genomic, epigenomic, phylogenetic, and chromatin datasets. CERENKOV includes novel features based on replication timing and DNA shape. We found that tuning a classifier for AUPVR performance does not guaran- tee optimality for AVGRANK. We compared the validation performance of CERENKOV to nine other methods for rSNP recognition (including GWAVA, RSVP, DeltaSVM, DeepSEA, Eigen, and DANQ), and found that CERENKOV’s validation performance is the strongest out of all of the classifiers that we tested, by both traditional global rank-based measures (⟨AUPVR⟩ = 0.506; ⟨AUROC⟩ = 0.855) and AVGRANK (⟨AVGRANK⟩ = 3.877). The source code for CERENKOV is available on GitHub and the SNP feature data files are available for download via the CERENKOV website. 
    more » « less
  5. RNA secondary structures play diverse roles in positive-sense (+) RNA virus infections, but those located with the replication protein coding sequence can be difficult to investigate. Structures that regulate the translation of replication proteins pose particular challenges, as their potential involvement in post-translational steps cannot be easily discerned independent of their roles in regulating translation. In the current study, we attempted to overcome these difficulties by providing viral replication proteins in trans. Specifically, we modified the plant-infecting turnip crinkle virus (TCV) into variants that are unable to translate one (p88) or both (p28 and p88) replication proteins, and complemented their replication with the corresponding replication protein(s) produced from separate, non-replicating constructs. This approach permitted us to re-examine the p28/p88 coding region for potential RNA elements needed for TCV replication. We found that, while more than a third of the p88 coding sequence could be deleted without substantially affecting viral RNA levels, two relatively small regions, known as RSE and IRE, were essential for robust accumulation of TCV genomic RNA, but not subgenomic RNAs. In particular, the RSE element, found previously to be required for regulating the translational read-through of p28 stop codon to produce p88, contained sub-elements needed for efficient replication of the TCV genome. Application of this new approach in other viruses could reveal novel RNA secondary structures vital for viral multiplication. 
    more » « less