

Title: Integration of text mining and biological network analysis: Identification of essential genes in sulfate-reducing bacteria

The growth and survival of an organism in a particular environment depend heavily on certain indispensable genes, termed essential genes. Sulfate-reducing bacteria (SRB) are obligate anaerobes that thrive on sulfate reduction for their energy requirements. The present study used Oleidesulfovibrio alaskensis G20 (OA G20) as a model SRB to categorize the essential genes based on their key metabolic pathways. Herein, we report a feedback-loop framework for gene-of-interest discovery, from bio-problem to gene set of interest, that combines expert annotation with computational prediction. The defined bio-problem was used to retrieve SRB genes from literature databases (PubMed and PubMed Central), and the genes were annotated to the genome of OA G20. The retrieved gene list was then used for protein–protein interaction enrichment and corroborated with pangenome analysis to categorize the enriched gene sets and their respective pathways as essential or non-essential. Interestingly, the sat gene (dde_2265) from sulfur metabolism was the bridging gene between all the enriched pathways. Gene clusters involved in essential pathways were linked with genes from seleno-compound metabolism, amino acid metabolism, secondary metabolite synthesis, and cofactor biosynthesis. Furthermore, pangenome analysis showed the gene distribution: 69.83% of the 116 enriched genes were mapped under “persistent,” suggesting that these genes are essential. Likewise, 21.55% of the enriched genes, notably the formate dehydrogenases and metallic hydrogenases, appeared under “shell.” Our methodology suggests that semi-automated text mining and network analysis may play a crucial role in deciphering previously unexplored genes and key mechanisms, which can help establish a baseline before any experimental studies are performed.
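As a quick sanity check on the pangenome figures quoted above, the reported percentages can be converted back into gene counts. The sketch below uses only the numbers given in the abstract (116 enriched genes, 69.83% "persistent", 21.55% "shell"). The helper name `percent_to_count` and the rounding to whole genes are illustrative assumptions, not part of the authors' workflow, and the abstract does not say how the remaining genes were categorized.

```python
# Back-of-the-envelope check of the pangenome percentages quoted in the abstract.
# Only the three numbers below come from the abstract; everything else is illustrative.
TOTAL_ENRICHED = 116  # enriched genes reported in the abstract

reported_percent = {"persistent": 69.83, "shell": 21.55}


def percent_to_count(percent: float, total: int) -> int:
    """Convert a reported percentage to the nearest whole number of genes (assumed rounding)."""
    return round(percent / 100 * total)


counts = {name: percent_to_count(p, TOTAL_ENRICHED) for name, p in reported_percent.items()}

# Genes not covered by the two categories named in the abstract.
unassigned = TOTAL_ENRICHED - sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n} genes ({n / TOTAL_ENRICHED:.2%})")
print(f"not categorized in the abstract: {unassigned} genes ({unassigned / TOTAL_ENRICHED:.2%})")
```

Under these assumptions the two percentages correspond to 81 and 25 genes, leaving 10 of the 116 enriched genes outside the "persistent" and "shell" categories.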

 
Award ID(s):
1920954
PAR ID:
10537352
Publisher / Repository:
Frontiers in Microbiology
Journal Name:
Frontiers in Microbiology
Volume:
14
ISSN:
1664-302X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Noncoding RNAs (ncRNAs) play key roles in the regulation of important pathways, including cellular growth, stress management, signaling, and biofilm formation. Sulfate-reducing bacteria (SRB) cause huge economic losses through microbially induced corrosion, which they drive by forming biofilms on metal surfaces. To effectively combat the challenges posed by SRB, it is essential to understand their molecular mechanisms of biofilm formation. This study aimed to identify ncRNAs in the genome of a model SRB, Oleidesulfovibrio alaskensis G20 (OA G20). Three in silico approaches revealed a genome-wide distribution of 37 ncRNAs, excluding tRNAs, in OA G20. These ncRNAs belonged to 18 different Rfam families. This study identified riboswitches, sRNAs, RNP, and SRP. The analysis revealed that these ncRNAs could play key roles in regulating several biosynthesis and transport pathways involved in biofilm formation by OA G20. Three sRNAs found in OA G20 (Pseudomonas P10, Hammerhead type II, and sX4) are rare, and their roles have not been determined in SRB. These results suggest that applying several computational methods could enrich the results and lead to the discovery of additional novel ncRNAs, which in turn could help in understanding the “rules of life of OA G20” during biofilm formation.

     
  2. A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and proteins a laborious task. Despite the fast growth of biology-related corrosion studies, only a limited number of gene collections relate to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting the essential information from unstructured text. We present a text mining workflow that extracts biocorrosion-associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automatic workflow is built with the Named Entity Recognition (NER) method and a Convolutional Neural Network (CNN) model. With PubMed and PMCID as inputs, the workflow identified 227 genes belonging to several Desulfovibrio species. To validate their functions, Gene Ontology (GO) enrichment and biological network analyses were performed using UniProtKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests the involvement of the extracted gene set in biocorrosion. Finally, the dataset was validated through manual curation, which yielded a set of genes similar to that from our workflow; among these, hysB and hydA were identified as metal ion binding genes, and sat and dsrB as sulfur metabolism genes. The identified genes were mapped to the pangenome of 63 SRB genomes, which gave the distribution of these genes across the 63 SRB based on amino acid sequence similarity, and were further categorized as core and accessory gene families.
SRB’s role in biocorrosion involves the transfer of electrons from the metal surface via a hydrogen medium to the sulfate reduction pathway. Therefore, genes encoding hydrogenases and cytochromes might participate in removing hydrogen from the metals through electron transfer. Moreover, the production of corrosive sulfide from sulfur metabolism indirectly contributes to the localized pitting of the metals. After corroborating the text mining results with SRB biocorrosion mechanisms, we suggest that the text mining framework could be used for gene/protein extraction and could significantly reduce manual curation time.
  3. Sulfate-reducing bacteria (SRB) have a unique ability to respire under anaerobic conditions using sulfate as a terminal electron acceptor, reducing it to hydrogen sulfide. SRB thrive in many natural environments (freshwater sediments and salt marshes), deep subsurface environments (oil wells and hydrothermal vents), and industrial processing facilities. Owing to their ability to alter the physicochemical properties of underlying metals, SRB can induce fouling, corrosion, and pipeline clogging. Indigenous SRB cause oil souring and associated product loss and, subsequently, the abandonment of impacted oil wells. The sessile cells in biofilms are 1,000 times more resistant to biocides and induce 100-fold greater corrosion than their planktonic counterparts. To effectively combat the challenges posed by SRB, it is essential to understand their molecular mechanisms of biofilm formation and corrosion. Here, we examine the critical genes involved in biofilm formation and microbiologically influenced corrosion and group them into functional categories. The current effort also discusses chemical and biological methods for controlling SRB biofilms. Finally, we highlight the importance of surface engineering approaches for controlling biofilm formation on underlying metal surfaces.
  4. Cultivated peanut (Arachis hypogaea) is one of the most widely grown food legumes in the world, valued for its high protein and unsaturated oil contents. Drought stress is one of the major constraints that limit peanut production. This study’s objective was to identify the drought-responsive genes preferentially expressed under drought stress in different peanut genotypes. To accomplish this, four genotypes (drought tolerant: C76-16 and 587; drought susceptible: Tifrunner and 506) subjected to drought stress in a rainout shelter experiment were examined. Transcriptome sequencing analysis showed that all four genotypes shared a total of 2,457 differentially expressed genes (DEGs). A total of 139 gene ontology terms, consisting of 86 biological processes and 53 molecular functions and including defense response, reproductive process, and signaling pathways, were significantly enriched in the common DEGs. In addition, 3,576 DEGs were identified only in drought-tolerant lines, in which a total of 74 gene ontology terms were identified, including 55 biological processes and 19 molecular functions, mainly related to protein modification process, pollination, and metabolic process. These terms were also found in the genes shared by all four genotypes, indicating that tolerant lines adjusted more related genes to respond to drought. Forty-three significantly enriched Kyoto Encyclopedia of Genes and Genomes pathways were also identified, and the most enriched pathways were those involved in metabolic pathways, biosynthesis of secondary metabolites, plant circadian rhythm, phenylpropanoid biosynthesis, and starch and sucrose metabolism. This research expands our current understanding of the mechanisms that facilitate peanut drought tolerance and sheds light on breeding advanced peanut lines to combat drought stress.
  5. Copper (Cu) is an essential micronutrient required as a co-factor in the catalytic center of many enzymes. However, excess Cu can generate pleiotropic effects in the microbial cell. In addition, leaching of Cu from pipelines results in elevated Cu concentrations in the environment, which is a public health concern. Sulfate-reducing bacteria (SRB) have been demonstrated to grow in toxic levels of Cu. However, reports on Cu toxicity towards SRB have primarily focused on the degree of toxicity and subsequent elimination. Here, Cu(II) stress-related effects on a model SRB, Desulfovibrio alaskensis G20, are reported. Cu(II) stress effects were assessed as alterations in the transcriptome through RNA-Seq at two Cu(II) concentrations (5 µM and 15 µM). In the pairwise comparison of control vs. 5 µM Cu(II), 61.43% of genes were downregulated, and 38.57% were upregulated. In control vs. 15 µM Cu(II), 49.51% of genes were downregulated, and 50.5% were upregulated. The results indicated that the expression of inorganic ion transporters and translation machinery was massively modulated. Moreover, changes in the expression of critical biological processes such as DNA transcription and signal transduction were observed at high Cu(II) concentrations. These results will help us better understand the Cu(II) stress-response mechanism and provide avenues for future research.