Chemists have now synthesized new kinds of DNA that add nucleotides to the four standard nucleotides (guanine, adenine, cytosine, and thymine) found in standard Terran DNA. Such “artificially expanded genetic information systems” are today used in molecular diagnostics; to support directed evolution to create medically useful receptors, ligands, and catalysts; and to explore issues related to the early evolution of life. Further applications are limited by the inability to directly sequence DNA containing nonstandard nucleotides. Nanopore sequencing is well-suited for this purpose, as it does not require enzymatic synthesis, amplification, or nucleotide modification. Here, we take the first steps to realize nanopore sequencing of an 8-letter “hachimoji” expanded DNA alphabet by assessing its nanopore signal range using the MspA (Mycobacterium smegmatis porin A) nanopore. We find that hachimoji DNA exhibits a broader signal range in nanopore sequencing than standard DNA alone and that hachimoji single-base substitutions are distinguishable with high confidence. Because nanopore sequencing relies on a molecular motor to control the motion of DNA, we then assessed the compatibility of the Hel308 motor enzyme with nonstandard nucleotides by tracking the translocation of single Hel308 molecules along hachimoji DNA, monitoring the enzyme kinetics and premature enzyme dissociation from the DNA. We find that Hel308 is compatible with hachimoji DNA but dissociates more frequently when walking over C-glycoside nucleosides, compared to N-glycosides. C-glycoside nucleosides passing a particular site within Hel308 induce a higher likelihood of dissociation. This highlights the need to optimize nanopore sequencing motors to handle different glycosidic bonds. It may also inform designs of future alternative DNA systems that can be sequenced with existing motors and pores.
Identification of discriminant features from stationary pattern of nucleotide bases and their application to essential gene classification
Introduction: Essential genes are indispensable for the survival of a species. They are linked to critical cellular activities: they encode proteins that regulate central metabolism, gene translation, and deoxyribonucleic acid replication, maintain fundamental cellular structure, and facilitate intracellular and extracellular transport. Essential genes preserve crucial genomic information that may hold the key to a detailed knowledge of life and evolution, and their study has therefore long been regarded as a vital topic in computational biology. An essential gene is a sequence composed of adenine, guanine, cytosine, and thymine in various combinations. Methods: This paper presents a novel method for extracting information on the stationary patterns of the nucleotides adenine, guanine, cytosine, and thymine in each gene. For this purpose, co-occurrence matrices are derived that provide the statistical distribution of stationary nucleotide patterns in the genes, which helps establish the relationships between the nucleotides. Energy, entropy, homogeneity, contrast, and dissimilarity features are computed from each co-occurrence matrix and then concatenated to form a feature vector representing each essential gene. Finally, supervised machine learning algorithms are applied for essential gene classification based on the extracted fixed-dimensional feature vectors. Results: For comparison, existing state-of-the-art feature representation techniques such as Shannon entropy (SE), Hurst exponent (HE), fractal dimension (FD), and their combinations were used. Discussion: Extensive experiments on classifying the essential genes of five species show the robustness and effectiveness of the proposed methodology.
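The abstract does not give the exact construction of the co-occurrence matrices, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a 4 × 4 matrix of nucleotide pairs separated by a fixed offset, the common texture-feature formulas used for gray-level co-occurrence matrices, and an arbitrary base ordering A, C, G, T (contrast, homogeneity, and dissimilarity depend on that ordering). The function names and the choice of offsets are hypothetical.

```python
import math

# Assumed (arbitrary) base ordering; it affects the distance-weighted features.
BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}


def cooccurrence_matrix(seq, offset=1):
    """Normalized 4x4 matrix of base pairs (seq[k], seq[k + offset])."""
    counts = [[0.0] * 4 for _ in range(4)]
    seq = seq.upper()
    total = 0
    for a, b in zip(seq, seq[offset:]):
        # Skip ambiguous symbols such as N.
        if a in INDEX and b in INDEX:
            counts[INDEX[a]][INDEX[b]] += 1
            total += 1
    if total == 0:
        return counts
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Energy, entropy, homogeneity, contrast, dissimilarity of matrix p."""
    energy = entropy = homogeneity = contrast = dissimilarity = 0.0
    for i in range(4):
        for j in range(4):
            v = p[i][j]
            d = abs(i - j)
            energy += v * v
            if v > 0:
                entropy -= v * math.log2(v)
            homogeneity += v / (1 + d * d)
            contrast += v * d * d
            dissimilarity += v * d
    return [energy, entropy, homogeneity, contrast, dissimilarity]


def gene_feature_vector(seq, offsets=(1, 2, 3)):
    """Concatenate the five features over several offsets (assumed choice)."""
    features = []
    for d in offsets:
        features.extend(texture_features(cooccurrence_matrix(seq, d)))
    return features
```

With three offsets, every gene maps to a 15-dimensional vector regardless of its length, which is what allows a standard supervised classifier to be trained on genes of different sizes.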
- Award ID(s): 1761839
- PAR ID: 10464623
- Date Published:
- Journal Name: Frontiers in Genetics
- Volume: 14
- ISSN: 1664-8021
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
ABSTRACT: We report the generation and spectroscopic study of hydrogen-rich DNA tetranucleotide cation radicals (GATC+2H)⁺• and (AGTC+2H)⁺•. The radicals were generated in the gas phase by one-electron reduction of the respective dications (GATC+2H)²⁺ and (AGTC+2H)²⁺ and characterized by collision-induced dissociation and photodissociation tandem mass spectrometry and UV−vis photodissociation action spectroscopy. Among several absorption bands observed for (GATC+2H)⁺•, the bands at 340 and 450 nm were assigned to radical chromophores. Time-dependent density functional theory calculations including vibronic transitions in the visible region of the spectrum were used to provide theoretical absorption spectra of several low-energy tetranucleotide tautomers having cytosine-, adenine-, and thymine-based radical chromophores that did not match the experimental spectrum. Instead, the calculations indicated the formation of a new isomer with the 7,8-H-dihydroguanine cation radical moiety. The isomerization involved hydrogen migration from the cytosine N-3−H radical to the C-8 position in N-7-protonated guanine that was calculated to be 87 kJ mol⁻¹ exothermic and had a low-energy transition state. Although the hydrogen migration was facilitated by the spatial proximity of the guanine and cytosine bases in the low-energy (GATC+2H)⁺• intermediate formed by electron transfer, the reaction was calculated to have a large negative activation entropy. Rice−Ramsperger−Kassel−Marcus (RRKM) and transition state theory kinetic analysis indicated that the isomerization occurred rapidly in hot cation radicals produced by electron transfer with the population-weighted rate constant of k = 8.9 × 10³ s⁻¹. The isomerization was calculated to be too slow to proceed on the experimental time scale in thermal cation radicals at 310 K.
-
A metamaterial composed of diamond-shaped (70 µm × 35 µm) copper patches was designed and used to detect nanoparticles with 0.75-1.1 terahertz transmission spectroscopy. Deoxyribonucleic acid (DNA) bases adenine, thymine, cytosine, and guanine were detected and identified. Cytosine showed 1.7 dB higher absorption around 0.975 THz than the other bases. SARS-CoV-2-infected saliva showed a different spectrum and -10 dB higher absorption than uninfected saliva over 0.75-1.1 THz. Other nanoparticles consisting of 100-500 nm antimony, carbon black, zeolite (aluminosilicate molecular sieves), Terfenol-D (Tb0.3Dy0.7Fe2), Cu2S, Ag2S, dust collected from bench tops, 10-100 µm size diamond particles, red polystyrene beads, iron particles, and graphene sheets were also tested. Sensor sensitivity for uninfected saliva was 0.3 dB/ng and for infected saliva was 0.8 dB/ng. The metamaterial surface studied here enables detection of airborne particles larger than 10 µm in diameter.
-
Nucleotide base composition plays an influential role in the molecular mechanisms involved in gene function, phenotype, and amino acid composition. GC content (proportion of guanine and cytosine in DNA sequences) shows a high level of variation within and among species. Many studies measure GC content in a small number of genes, which may not be representative of genome-wide GC variation. One challenge when assembling extensive genomic data sets for these studies is the significant amount of resources (monetary and computational) associated with data processing, and many bioinformatic tools have not been optimized for resource efficiency. Using a high-performance computing (HPC) cluster, we manipulated resources provided to the targeted gene assembly program, automated target restricted assembly method (aTRAM), to determine an optimum way to run the program to maximize resource use. Using our optimum assembly approach, we assembled and measured GC content of all of the protein-coding genes of a diverse group of parasitic feather lice. Of the 499 426 genes assembled across 57 species, feather lice were GC-poor (mean GC = 42.96%) with a significant amount of variation within and between species (GC range = 19.57%-73.33%). We found a significant correlation between GC content and standard deviation per taxon for overall GC and GC3, which could indicate selection for G and C nucleotides in some species. Phylogenetic signal of GC content was detected in both GC and GC3. This research provides a large-scale investigation of GC content in parasitic lice laying the foundation for understanding the basis of variation in base composition across species.
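As an illustration of the two quantities named above, the sketch below computes overall GC content and GC3 (GC content at third codon positions) for a single coding sequence. It is not the pipeline used in the study; the function names are hypothetical, and it assumes the input is an in-frame coding sequence starting at the first codon position, with ambiguous symbols such as N excluded from the count.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases, or None if there are none."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return None
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def gc3_content(cds):
    """GC percentage at third codon positions of an in-frame coding sequence."""
    # Positions 2, 5, 8, ... (0-based) are the third base of each codon.
    return gc_content(cds.upper()[2::3])
```

For example, the sequence ATGGCCTAA has codons ATG, GCC, and TAA, so its third positions are G, C, and A and its GC3 is about 66.7%, while its overall GC content is about 44.4%.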
-
The hydrolytic deamination of cytosine and 5-methylcytosine drives many of the transition mutations observed in human cancer. The deamination-induced mutagenic intermediates include either uracil or thymine adducts mispaired with guanine. While a substantial array of methods exist to measure other types of DNA adducts, the cytosine deamination adducts pose unusual analytical problems, and adequate methods to measure them have not yet been developed. We describe here a novel hybrid thymine DNA glycosylase (TDG) that is comprised of a 29-amino acid sequence from human TDG linked to the catalytic domain of a thymine glycosylase found in an archaeal thermophilic bacterium. Using defined-sequence oligonucleotides, we show that hybrid TDG has robust mispair-selective activity against deaminated U:G and T:G mispairs. We have further developed a method for separating glycosylase-released free bases from oligonucleotides and DNA followed by GC–MS/MS quantification. Using this approach, we have measured for the first time the levels of total uracil, U:G, and T:G pairs in calf thymus DNA. The method presented here will allow the measurement of the formation, persistence, and repair of a biologically important class of deaminated cytosine adducts.

