Title: Inverted duplicate DNA sequences increase translocation rates through sequencing nanopores resulting in reduced base calling accuracy
Abstract: Inverted duplicated DNA sequences are a common feature of structural variants (SVs) and copy number variants (CNVs). Analysis of CNVs containing inverted duplicated DNA sequences using nanopore sequencing identified recurrent aberrant behavior characterized by low-confidence, incorrect, and missed base calls. Inverted duplicate DNA sequences in both yeast and human samples were observed to have systematic elevation in the electrical current detected at the nanopore, increased translocation rates, and decreased sampling rates. The coincidence of inverted duplicated DNA sequences with dramatically reduced sequencing accuracy and an increased translocation rate suggests that secondary DNA structures may interfere with the dynamics of transit of the DNA through the nanopore.
Award ID(s):
1818234
PAR ID:
10164112
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Nucleic Acids Research
Volume:
48
Issue:
9
ISSN:
0305-1048
Page Range / eLocation ID:
4940 to 4945
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Copy number variants (CNVs) are regions of the genome that vary in integer copy number. CNVs, which comprise both amplifications and deletions of DNA sequence, have been identified across all domains of life, from bacteria and archaea to plants and animals. CNVs are an important source of genetic diversity, and can drive rapid adaptive evolution and progression of heritable and somatic human diseases, such as cancer. However, despite their evolutionary importance and clinical relevance, CNVs remain understudied compared to single-nucleotide variants (SNVs). This is a consequence of the inherent difficulties in detecting CNVs at low-to-intermediate frequencies in heterogeneous populations of cells. Here, we discuss molecular methods used to detect CNVs, the limitations associated with using these techniques, and the application of new and emerging technologies that present solutions to these challenges. The goal of this short review and perspective is to highlight aspects of CNV biology that are understudied and define avenues for further research that address specific gaps in our knowledge of these complex alleles. We describe our recently developed method for CNV detection in which a fluorescent gene functions as a single-cell CNV reporter and present key findings from our evolution experiments in Saccharomyces cerevisiae. Using a CNV reporter, we found that CNVs are generated at a high rate and undergo selection with predictable dynamics across independently evolving replicate populations. Many CNVs appear to be generated through DNA replication-based processes that are mediated by the presence of short, interrupted, inverted-repeat sequences. Our results have important implications for the role of CNVs in evolutionary processes and the molecular mechanisms that underlie CNV formation. We discuss the possible extension of our method to other applications, including tracking the dynamics of CNVs in models of human tumors. 
  2. Duplicated genes provide the opportunity for evolutionary novelty and adaptive divergence. In many cases, having more gene copies increases gene expression, which might facilitate adaptation to stressful or novel environments. Conversely, overexpression or misexpression of duplicated genes can be detrimental and subject to negative selection. In this scenario, newly duplicated genes may evade purifying selection if they are epigenetically silenced, at least temporarily, leading them to persist in populations as copy number variations (CNVs). In animals and plants, younger gene duplicates tend to have higher levels of DNA methylation and lower levels of gene expression, suggesting epigenetic regulation could promote the retention of gene duplications via expression repression or silencing. Here, we test the hypothesis that DNA methylation variation coincides with young duplicate genes that are segregating as CNVs in six populations of the three‐spined stickleback that span a salinity gradient from 4 to 30 PSU. Using reduced‐representation bisulfite sequencing, we found DNA methylation and CNV differentiation outliers rarely overlapped. Whereas lineage‐specific genes and young duplicates were found to be highly methylated, just two gene CNVs showed a significant association between promoter methylation level and copy number, suggesting that DNA methylation might not interact with CNVs in our dataset. If most new duplications are regulated for dosage by epigenetic mechanisms, our results do not support a strong contribution from DNA methylation soon after duplication. Instead, our results are consistent with a preference to duplicate genes that are already highly methylated.
  3. Chemists have now synthesized new kinds of DNA that add nucleotides to the four standard nucleotides (guanine, adenine, cytosine, and thymine) found in standard Terran DNA. Such “artificially expanded genetic information systems” are today used in molecular diagnostics; to support directed evolution to create medically useful receptors, ligands, and catalysts; and to explore issues related to the early evolution of life. Further applications are limited by the inability to directly sequence DNA containing nonstandard nucleotides. Nanopore sequencing is well-suited for this purpose, as it does not require enzymatic synthesis, amplification, or nucleotide modification. Here, we take the first steps to realize nanopore sequencing of an 8-letter “hachimoji” expanded DNA alphabet by assessing its nanopore signal range using the MspA (Mycobacterium smegmatis porin A) nanopore. We find that hachimoji DNA exhibits a broader signal range in nanopore sequencing than standard DNA alone and that hachimoji single-base substitutions are distinguishable with high confidence. Because nanopore sequencing relies on a molecular motor to control the motion of DNA, we then assessed the compatibility of the Hel308 motor enzyme with nonstandard nucleotides by tracking the translocation of single Hel308 molecules along hachimoji DNA, monitoring the enzyme kinetics and premature enzyme dissociation from the DNA. We find that Hel308 is compatible with hachimoji DNA but dissociates more frequently when walking over C-glycoside nucleosides, compared to N-glycosides. C-glycoside nucleosides passing a particular site within Hel308 induce a higher likelihood of dissociation. This highlights the need to optimize nanopore sequencing motors to handle different glycosidic bonds. It may also inform designs of future alternative DNA systems that can be sequenced with existing motors and pores.
  4. Modeling and simulation has become an invaluable partner in development of nanopore sensing systems. The key advantage of the nanopore sensing method – the ability to rapidly detect individual biomolecules as a transient reduction of the ionic current flowing through the nanopore – is also its key deficiency, as the current signal itself rarely provides direct information about the chemical structure of the biomolecule. Complementing experimental calibration of the nanopore sensor readout, coarse-grained and all-atom molecular dynamics simulations have been used extensively to characterize the nanopore translocation process and to connect the microscopic events taking place inside the nanopore to the experimentally measured ionic current blockades. Traditional coarse-grained simulations, however, lack the precision needed to predict ionic current blockades with atomic resolution whereas traditional all-atom simulations are limited by the length and time scales amenable to the method. Here, we describe a multi-resolution framework for modeling electric field-driven passage of DNA molecules and nanostructures through to-scale models of synthetic nanopore systems. We illustrate the method by simulating translocation of double-stranded DNA through a solid-state nanopore and a micron-scale slit, capture and translocation of single-stranded DNA in a double nanopore system, and modeling ionic current readout from a DNA origami nanostructure passage through a nanocapillary. We expect our multi-resolution simulation framework to aid development of the nanopore field by providing accurate, to-scale modeling capability to research laboratories that do not have access to leadership supercomputer facilities. 
  5. Nanopore sequencing is an emerging technology for sequencing DNA, which can read long fragments of DNA (∼50,000 bases), unlike most current sequencers, which can only read hundreds of bases. While nanopore sequencers can acquire long reads, the high error rates (≈ 30%) pose a technical challenge. In a nanopore sequencer, a DNA molecule is migrated through a nanopore and current variations are measured. The DNA sequence is inferred from this observed current pattern using an algorithm called a base-caller. In this paper, we propose a mathematical model for the “channel” from the input DNA sequence to the observed current, and calculate bounds on the information extraction capacity of the nanopore sequencer. This model incorporates impairments like inter-symbol interference, deletions, as well as random response. The practical application of such information bounds is two-fold: (1) benchmarking present base-calling algorithms, and (2) offering an optimization objective for designing better nanopore sequencers.