

Title: Inverted duplicate DNA sequences increase translocation rates through sequencing nanopores resulting in reduced base calling accuracy
Abstract: Inverted duplicated DNA sequences are a common feature of structural variants (SVs) and copy number variants (CNVs). Analysis of CNVs containing inverted duplicated DNA sequences using nanopore sequencing identified recurrent aberrant behavior characterized by low confidence, incorrect and missed base calls. Inverted duplicate DNA sequences in both yeast and human samples were observed to have systematic elevation in the electrical current detected at the nanopore, increased translocation rates and decreased sampling rates. The coincidence of inverted duplicated DNA sequences with dramatically reduced sequencing accuracy and an increased translocation rate suggests that secondary DNA structures may interfere with the dynamics of transit of the DNA through the nanopore.
Award ID(s):
1818234
PAR ID:
10164112
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Nucleic Acids Research
Volume:
48
Issue:
9
ISSN:
0305-1048
Page Range / eLocation ID:
4940 to 4945
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Copy number variants (CNVs) are regions of the genome that vary in integer copy number. CNVs, which comprise both amplifications and deletions of DNA sequence, have been identified across all domains of life, from bacteria and archaea to plants and animals. CNVs are an important source of genetic diversity, and can drive rapid adaptive evolution and progression of heritable and somatic human diseases, such as cancer. However, despite their evolutionary importance and clinical relevance, CNVs remain understudied compared to single-nucleotide variants (SNVs). This is a consequence of the inherent difficulties in detecting CNVs at low-to-intermediate frequencies in heterogeneous populations of cells. Here, we discuss molecular methods used to detect CNVs, the limitations associated with using these techniques, and the application of new and emerging technologies that present solutions to these challenges. The goal of this short review and perspective is to highlight aspects of CNV biology that are understudied and define avenues for further research that address specific gaps in our knowledge of these complex alleles. We describe our recently developed method for CNV detection in which a fluorescent gene functions as a single-cell CNV reporter and present key findings from our evolution experiments in Saccharomyces cerevisiae. Using a CNV reporter, we found that CNVs are generated at a high rate and undergo selection with predictable dynamics across independently evolving replicate populations. Many CNVs appear to be generated through DNA replication-based processes that are mediated by the presence of short, interrupted, inverted-repeat sequences. Our results have important implications for the role of CNVs in evolutionary processes and the molecular mechanisms that underlie CNV formation. We discuss the possible extension of our method to other applications, including tracking the dynamics of CNVs in models of human tumors. 
  2. Summary

    Copy number variants (CNVs) are alterations of the DNA of a genome that result in the cell having fewer or more than two copies of segments of the DNA. CNVs correspond to relatively large regions of the genome, ranging from about one kilobase to several megabases, that are deleted or duplicated. Motivated by CNV analysis based on next generation sequencing data, we consider the problem of detecting and identifying sparse short segments hidden in a long linear sequence of data with an unspecified noise distribution. We propose a computationally efficient method that provides a robust and near optimal solution for segment identification over a wide range of noise distributions. We theoretically quantify the conditions for detecting the segment signals and show that the method near optimally estimates the signal segments whenever it is possible to detect their existence. Simulation studies are carried out to demonstrate the efficiency of the method under various noise distributions. We present results from a CNV analysis of a HapMap Yoruban sample to illustrate the theory and the methods further.

     
  3. Chemists have now synthesized new kinds of DNA that add nucleotides to the four standard nucleotides (guanine, adenine, cytosine, and thymine) found in standard Terran DNA. Such “artificially expanded genetic information systems” are today used in molecular diagnostics; to support directed evolution to create medically useful receptors, ligands, and catalysts; and to explore issues related to the early evolution of life. Further applications are limited by the inability to directly sequence DNA containing nonstandard nucleotides. Nanopore sequencing is well-suited for this purpose, as it does not require enzymatic synthesis, amplification, or nucleotide modification. Here, we take the first steps to realize nanopore sequencing of an 8-letter “hachimoji” expanded DNA alphabet by assessing its nanopore signal range using the MspA (Mycobacterium smegmatis porin A) nanopore. We find that hachimoji DNA exhibits a broader signal range in nanopore sequencing than standard DNA alone and that hachimoji single-base substitutions are distinguishable with high confidence. Because nanopore sequencing relies on a molecular motor to control the motion of DNA, we then assessed the compatibility of the Hel308 motor enzyme with nonstandard nucleotides by tracking the translocation of single Hel308 molecules along hachimoji DNA, monitoring the enzyme kinetics and premature enzyme dissociation from the DNA. We find that Hel308 is compatible with hachimoji DNA but dissociates more frequently when walking over C-glycoside nucleosides, compared to N-glycosides. C-glycoside nucleosides passing a particular site within Hel308 induce a higher likelihood of dissociation. This highlights the need to optimize nanopore sequencing motors to handle different glycosidic bonds. It may also inform designs of future alternative DNA systems that can be sequenced with existing motors and pores.
  4. An inexpensive, reliable method for protein sequencing is essential to unraveling the biological mechanisms governing cellular behavior and disease. Current protein sequencing methods suffer from limitations associated with the size of proteins that can be sequenced, the time, and the cost of the sequencing procedures. This study reports the results of all‐atom molecular dynamics simulations that investigated the feasibility of using graphene nanopores for protein sequencing. The study is focused on the biologically significant phenylalanine‐glycine repeat peptides (FG‐nups), which are parts of the nuclear pore transport machinery. Surprisingly, FG‐nups are found to behave similarly to single-stranded DNA: the peptides adhere to graphene and exhibit stepwise translocation when subjected to a transmembrane bias or a hydrostatic pressure gradient. Reducing the peptide's charge density or increasing the peptide's hydrophobicity is found to decrease the translocation speed. Yet, unidirectional and stepwise translocation driven by a transmembrane bias is observed even when the ratio of charged to hydrophobic amino acids is as low as 1:8. The nanopore transport of the peptides is found to produce stepwise modulations of the nanopore ionic current correlated with the type of amino acids present in the nanopore, suggesting that protein sequencing by measuring ionic current blockades may be possible.

     
  5. Modeling and simulation has become an invaluable partner in development of nanopore sensing systems. The key advantage of the nanopore sensing method – the ability to rapidly detect individual biomolecules as a transient reduction of the ionic current flowing through the nanopore – is also its key deficiency, as the current signal itself rarely provides direct information about the chemical structure of the biomolecule. Complementing experimental calibration of the nanopore sensor readout, coarse-grained and all-atom molecular dynamics simulations have been used extensively to characterize the nanopore translocation process and to connect the microscopic events taking place inside the nanopore to the experimentally measured ionic current blockades. Traditional coarse-grained simulations, however, lack the precision needed to predict ionic current blockades with atomic resolution whereas traditional all-atom simulations are limited by the length and time scales amenable to the method. Here, we describe a multi-resolution framework for modeling electric field-driven passage of DNA molecules and nanostructures through to-scale models of synthetic nanopore systems. We illustrate the method by simulating translocation of double-stranded DNA through a solid-state nanopore and a micron-scale slit, capture and translocation of single-stranded DNA in a double nanopore system, and modeling ionic current readout from a DNA origami nanostructure passage through a nanocapillary. We expect our multi-resolution simulation framework to aid development of the nanopore field by providing accurate, to-scale modeling capability to research laboratories that do not have access to leadership supercomputer facilities. 