Title: Identification of discriminant features from stationary pattern of nucleotide bases and their application to essential gene classification
Introduction: Essential genes are indispensable for the survival of a species. They are linked to critical cellular activities: they code for proteins that regulate central metabolism, gene translation, and deoxyribonucleic acid replication, maintain fundamental cellular structure, and facilitate intracellular and extracellular transport. Essential genes preserve crucial genomic information that may hold the key to a detailed knowledge of life and evolution. Because of this relevance, the study of essential genes has long been regarded as a vital topic in computational biology. Like any gene, an essential gene is a sequence of the nucleotides adenine, guanine, cytosine, and thymine in various combinations. Methods: This paper presents a novel method of extracting information on the stationary patterns of the nucleotides adenine, guanine, cytosine, and thymine in each gene. For this purpose, co-occurrence matrices are derived that provide the statistical distribution of stationary patterns of nucleotides in the genes, which helps establish the relationships between the nucleotides. From each co-occurrence matrix, five discriminant features are computed: energy, entropy, homogeneity, contrast, and dissimilarity. The features from all co-occurrence matrices are then concatenated to form a fixed-dimensional feature vector representing each gene. Finally, supervised machine learning algorithms are applied to classify essential genes based on these feature vectors. Results: For comparison, existing state-of-the-art feature representation techniques such as Shannon entropy (SE), Hurst exponent (HE), fractal dimension (FD), and their combinations were also evaluated. Discussion: Extensive experiments on classifying the essential genes of five species show the robustness and effectiveness of the proposed methodology.
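The feature pipeline described in the Methods can be illustrated with a minimal sketch. The abstract does not give the exact definition of a stationary pattern, the pair offsets, or the feature formulas, so everything below is an assumption: pairs are taken at fixed distances along the sequence, the five features use the common texture-analysis definitions (with energy taken as the sum of squared probabilities), and the function names, the base ordering A, C, G, T, and the `distances` parameter are hypothetical. Contrast and dissimilarity depend on that arbitrary base ordering.

```python
import numpy as np

# Assumed base ordering; the abstract does not specify one.
BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}


def cooccurrence_matrix(seq, distance=1):
    """Normalized 4x4 matrix of base pairs (seq[i], seq[i + distance])."""
    counts = np.zeros((4, 4), dtype=float)
    seq = seq.upper()
    for a, b in zip(seq, seq[distance:]):
        if a in INDEX and b in INDEX:  # skip ambiguous symbols such as N
            counts[INDEX[a], INDEX[b]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def texture_features(p):
    """Energy, entropy, homogeneity, contrast, dissimilarity of matrix p."""
    i, j = np.indices(p.shape)
    diff = np.abs(i - j)
    nonzero = p[p > 0]  # avoid log2(0) in the entropy term
    return np.array([
        np.sum(p ** 2),                        # energy (sum of squares)
        -np.sum(nonzero * np.log2(nonzero)),   # entropy in bits
        np.sum(p / (1.0 + diff ** 2)),         # homogeneity
        np.sum(p * diff ** 2),                 # contrast
        np.sum(p * diff),                      # dissimilarity
    ])


def gene_feature_vector(seq, distances=(1, 2, 3)):
    """Concatenate the five features over one matrix per assumed distance."""
    return np.concatenate(
        [texture_features(cooccurrence_matrix(seq, d)) for d in distances]
    )
```

With three distances, every gene maps to a 15-dimensional vector regardless of its length, which is what allows a standard supervised classifier to be trained on genes of different sizes.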
Award ID(s):
1761839
NSF-PAR ID:
10464623
Date Published:
Journal Name:
Frontiers in Genetics
Volume:
14
ISSN:
1664-8021
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Chemists have now synthesized new kinds of DNA that add nucleotides to the four standard nucleotides (guanine, adenine, cytosine, and thymine) found in standard Terran DNA. Such “artificially expanded genetic information systems” are today used in molecular diagnostics; to support directed evolution to create medically useful receptors, ligands, and catalysts; and to explore issues related to the early evolution of life. Further applications are limited by the inability to directly sequence DNA containing nonstandard nucleotides. Nanopore sequencing is well-suited for this purpose, as it does not require enzymatic synthesis, amplification, or nucleotide modification. Here, we take the first steps to realize nanopore sequencing of an 8-letter “hachimoji” expanded DNA alphabet by assessing its nanopore signal range using the MspA (Mycobacterium smegmatis porin A) nanopore. We find that hachimoji DNA exhibits a broader signal range in nanopore sequencing than standard DNA alone and that hachimoji single-base substitutions are distinguishable with high confidence. Because nanopore sequencing relies on a molecular motor to control the motion of DNA, we then assessed the compatibility of the Hel308 motor enzyme with nonstandard nucleotides by tracking the translocation of single Hel308 molecules along hachimoji DNA, monitoring the enzyme kinetics and premature enzyme dissociation from the DNA. We find that Hel308 is compatible with hachimoji DNA but dissociates more frequently when walking over C-glycoside nucleosides, compared to N-glycosides. C-glycoside nucleosides passing a particular site within Hel308 induce a higher likelihood of dissociation. This highlights the need to optimize nanopore sequencing motors to handle different glycosidic bonds. It may also inform designs of future alternative DNA systems that can be sequenced with existing motors and pores.
  2. ABSTRACT: We report the generation and spectroscopic study of hydrogen-rich DNA tetranucleotide cation radicals (GATC+2H)⁺• and (AGTC+2H)⁺•. The radicals were generated in the gas phase by one-electron reduction of the respective dications (GATC+2H)²⁺ and (AGTC+2H)²⁺ and characterized by collision-induced dissociation and photodissociation tandem mass spectrometry and UV-vis photodissociation action spectroscopy. Among several absorption bands observed for (GATC+2H)⁺•, the bands at 340 and 450 nm were assigned to radical chromophores. Time-dependent density functional theory calculations including vibronic transitions in the visible region of the spectrum were used to provide theoretical absorption spectra of several low-energy tetranucleotide tautomers having cytosine-, adenine-, and thymine-based radical chromophores that did not match the experimental spectrum. Instead, the calculations indicated the formation of a new isomer with the 7,8-H-dihydroguanine cation radical moiety. The isomerization involved hydrogen migration from the cytosine N-3-H radical to the C-8 position in N-7-protonated guanine that was calculated to be 87 kJ mol⁻¹ exothermic and had a low-energy transition state. Although the hydrogen migration was facilitated by the spatial proximity of the guanine and cytosine bases in the low-energy (GATC+2H)⁺• intermediate formed by electron transfer, the reaction was calculated to have a large negative activation entropy. Rice-Ramsperger-Kassel-Marcus (RRKM) and transition state theory kinetic analysis indicated that the isomerization occurred rapidly in hot cation radicals produced by electron transfer with the population-weighted rate constant of k = 8.9 × 10³ s⁻¹. The isomerization was calculated to be too slow to proceed on the experimental time scale in thermal cation radicals at 310 K.
  3. Abstract. Background: The red sea urchin Mesocentrotus franciscanus is an ecologically important kelp forest herbivore and an economically valuable wild fishery species. To examine how M. franciscanus responds to its environment on a molecular level, differences in gene expression patterns were observed in embryos raised under combinations of two temperatures (13 °C or 17 °C) and two pCO2 levels (475 μatm or 1050 μatm). These combinations mimic various present-day conditions measured during and between upwelling events in the highly dynamic California Current System, with the exception of the 17 °C and 1050 μatm combination, which does not currently occur. However, as ocean warming and acidification continue, warmer temperatures and higher pCO2 conditions are expected to increase in frequency and to occur simultaneously. The transcriptomic responses of the embryos were assessed at two developmental stages (gastrula and prism) in light of previously described plasticity in body size and thermotolerance under these temperature and pCO2 treatments. Results: Although transcriptomic patterns primarily varied by developmental stage, there were pronounced differences in gene expression as a result of the treatment conditions. Temperature and pCO2 treatments led to the differential expression of genes related to the cellular stress response, transmembrane transport, metabolic processes, and the regulation of gene expression. At each developmental stage, temperature contributed significantly to the observed variance in gene expression, which was also correlated with the phenotypic attributes of the embryos. On the other hand, the transcriptomic response to pCO2 was relatively muted, particularly at the prism stage. Conclusions: M. franciscanus exhibited transcriptomic plasticity under different temperatures, indicating a capacity for a molecular-level response that may help red sea urchins facing ocean warming as climate change continues. In contrast, the lack of a robust transcriptomic response, in combination with observations of decreased body size, under elevated pCO2 levels suggests that this species may be negatively affected by ocean acidification. High present-day pCO2 conditions that occur due to coastal upwelling may already be influencing populations of M. franciscanus.
  4. A metamaterial composed of diamond-shaped (70 µm × 35 µm) copper patches was designed and used to detect nanoparticles with 0.75-1.1 terahertz transmission spectroscopy. Deoxyribonucleic acid (DNA) bases adenine, thymine, cytosine, and guanine were detected and identified. Cytosine showed 1.7 dB higher absorption around 0.975 THz than the other bases. SARS-CoV-2 infected saliva showed a different spectrum and -10 dB higher absorption than uninfected saliva over 0.75-1.1 THz. Other nanoparticles consisting of 100-500 nm antimony, carbon black, zeolite (aluminosilicate molecular sieves), Terfenol-D (Tb0.3Dy0.7Fe2), Cu2S, Ag2S, dust collected from bench tops, 10-100 µm size diamond particles, red polystyrene beads, iron particles, and graphene sheets were also tested. Sensor sensitivity for uninfected saliva was 0.3 dB/ng and for infected saliva was 0.8 dB/ng. The metamaterial surface studied here enables detection of airborne particles larger than 10 µm in diameter.
  5. In this work, hydrogen bonding in a diverse set of 36 unnatural and the three natural Watson-Crick base pairs adenine (A)–thymine (T), adenine (A)–uracil (U), and guanine (G)–cytosine (C) was assessed utilizing local vibrational force constants derived from the local mode analysis, originally introduced by Konkoli and Cremer as a unique bond strength measure based on vibrational spectroscopy. The local mode analysis was complemented by the topological analysis of the electronic density and the natural bond orbital analysis. The most interesting findings of our study are that (i) hydrogen bonding in Watson-Crick base pairs is not exceptionally strong and (ii) the N–H⋯N is the most favorable hydrogen bond in both unnatural and natural base pairs, while O–H⋯N/O bonds are less favorable in unnatural base pairs and not found at all in natural base pairs. In addition, the important role of non-classical C–H⋯N/O bonds for the stabilization of base pairs was revealed, especially the role of C–H⋯O bonds in Watson-Crick base pairs. Hydrogen bonding in Watson-Crick base pairs modeled in the DNA via a QM/MM approach showed that the DNA environment increases the strength of the central N–H⋯N bond and the C–H⋯O bonds, and at the same time decreases the strength of the N–H⋯O bond. However, the general trends observed in the gas phase calculations remain unchanged. The new methodology presented and tested in this work provides the bioengineering community with an efficient design tool to assess and predict the type and strength of hydrogen bonding in artificial base pairs.