Title: Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow
A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and proteins a laborious task. Despite the rapid growth of biology-related corrosion studies, only a limited number of gene collections relate to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting essential information from unstructured text. We present a text mining workflow that extracts biocorrosion-associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automated workflow is built on a Named Entity Recognition (NER) method and a Convolutional Neural Network (CNN) model. With PubMed IDs and PMCIDs as inputs, the workflow identified 227 genes belonging to several Desulfovibrio species. To validate their functions, Gene Ontology (GO) enrichment and biological network analyses were performed using UniProtKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests that the extracted gene set is involved in biocorrosion. Finally, the dataset was validated through manual curation, which yielded a gene set similar to that of our workflow; among these genes, hysB and hydA were identified as metal ion binding genes, and sat and dsrB as sulfur metabolism genes. The identified genes were mapped onto the pangenome of 63 SRB genomes, which yielded the distribution of these genes across the 63 SRB based on amino acid sequence similarity; the genes were further categorized into core and accessory gene families.
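The GO enrichment step asks whether a term annotates more of the 227 extracted genes than chance would predict from a genomic background. A standard way to quantify this is a one-sided hypergeometric test, sketched below with only the Python standard library. The abstract does not state which statistic or background its UniProtKB-based analysis used, so the function name and the background counts are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k), for GO-term over-representation.

    N: genes in the background (e.g., all annotated genes of the reference genomes)
    K: background genes annotated with the GO term
    n: genes in the text-mined set
    k: text-mined genes annotated with the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probabilities of drawing k, k+1, ..., min(n, K) annotated genes.
    # math.comb returns 0 when the lower index exceeds the upper, so impossible draws add 0.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

For example, `hypergeom_enrichment_p(k=30, n=227, K=150, N=3500)` would ask how surprising 30 term-annotated genes are among the 227 extracted genes if the term covers 150 genes of a hypothetical 3,500-gene background. When many GO terms are tested, the resulting p-values should also be corrected for multiple testing (e.g., Benjamini–Hochberg).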
SRB’s role in biocorrosion involves the transfer of electrons from the metal surface, via a hydrogen medium, to the sulfate reduction pathway. Therefore, genes encoding hydrogenases and cytochromes may participate in removing hydrogen from the metal through electron transfer. Moreover, corrosive sulfide produced by sulfur metabolism indirectly contributes to localized pitting of the metal. Having corroborated the text mining results with known SRB biocorrosion mechanisms, we suggest that the text mining framework could be used to extract genes/proteins and significantly reduce manual curation time.
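The hydrogen-mediated route described above is commonly summarized by the classical cathodic-depolarization scheme for iron, shown below. This is a textbook simplification supplied for context, not a result of this study, and it is not the only proposed SRB corrosion mechanism.

```latex
\begin{aligned}
\text{Anodic dissolution:}\quad & 4\,\mathrm{Fe} \rightarrow 4\,\mathrm{Fe^{2+}} + 8\,e^{-} \\
\text{Cathodic hydrogen evolution:}\quad & 8\,\mathrm{H^{+}} + 8\,e^{-} \rightarrow 4\,\mathrm{H_{2}} \\
\text{Hydrogenase-driven sulfate reduction:}\quad & 4\,\mathrm{H_{2}} + \mathrm{SO_{4}^{2-}} + 2\,\mathrm{H^{+}} \rightarrow \mathrm{H_{2}S} + 4\,\mathrm{H_{2}O} \\
\text{Net:}\quad & 4\,\mathrm{Fe} + \mathrm{SO_{4}^{2-}} + 10\,\mathrm{H^{+}} \rightarrow 4\,\mathrm{Fe^{2+}} + \mathrm{H_{2}S} + 4\,\mathrm{H_{2}O} \\
\text{Sulfide precipitation:}\quad & \mathrm{Fe^{2+}} + \mathrm{H_{2}S} \rightarrow \mathrm{FeS} + 2\,\mathrm{H^{+}}
\end{aligned}
```

Each line is balanced in atoms and charge; summing the first three cancels the eight electrons and the four H₂ molecules, which is why hydrogen consumption by hydrogenases is read as pulling electrons away from the metal.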
Sponsoring Org:
National Science Foundation
More Like this
  1. Sulfate-reducing bacteria (SRB) have a unique ability to respire under anaerobic conditions using sulfate as a terminal electron acceptor, reducing it to hydrogen sulfide. SRB thrive in many natural environments (freshwater sediments and salt marshes), deep subsurface environments (oil wells and hydrothermal vents), and industrial processing facilities. Owing to their ability to alter the physicochemical properties of underlying metals, SRB can cause fouling, corrosion, and pipeline clogging. Indigenous SRB cause oil souring and associated product loss and, subsequently, the abandonment of affected oil wells. Sessile cells in biofilms are 1,000 times more resistant to biocides and induce 100-fold greater corrosion than their planktonic counterparts. To effectively combat the challenges posed by SRB, it is essential to understand the molecular mechanisms of their biofilm formation and corrosion. Here, we examine the critical genes involved in biofilm formation and microbiologically influenced corrosion and sort them into functional categories. This work also discusses chemical and biological methods for controlling SRB biofilms. Finally, we highlight the importance of surface engineering approaches for controlling biofilm formation on underlying metal surfaces.
  2. Sulfate-reducing bacteria (SRB) are anaerobic bacteria that form biofilms and induce corrosion on various material surfaces. Biofilm formation is primarily governed by quorum sensing (QS) systems that employ acyl homoserine lactone (AHL)-type QS molecules. Studies on SRB have reported the presence of AHL, but no AHL synthase has been annotated in SRB so far. In this computational study, we used a combination of data mining, multiple sequence alignment (MSA), homology modeling, and docking to identify a putative AHL synthase in the model SRB Desulfovibrio vulgaris Hildenborough (DvH). Through data mining, we shortlisted 111 AHL synthase genes. Conserved domain analysis of these 111 genes generated a consensus sequence. Subsequent MSA of the consensus sequence against the DvH genome indicated that DVU_2486 (a previously uncharacterized protein from the acetyltransferase family) encodes the AHL synthase. Homology modeling revealed seven α-helices and six β-sheets in the DvH AHL synthase. A combined analysis of hydrophobicity, binding energy, and tunnels and cavities revealed that Leu99, Trp104, Arg139, Trp97, and Tyr36 are the crucial amino acids governing the catalytic center of this putative synthase. Identifying an AHL synthase in DvH would provide more comprehensive knowledge of the QS mechanism and help design strategies to control biofilm formation.
  3. Electro‐responsive functional materials can play a critical role in selective metal recovery and recycling, given the need for molecular differentiation between transition metals in complex mixtures. Redox‐active metallopolymers are a promising platform for electrochemical separations, offering versatile structural tuning and fast electron transfer. First, through a judicious choice of polymer structure between a main‐chain metallopolymer (polyferrocenylsilane) and a pendant‐group metallopolymer (polyvinylferrocene), charge‐transfer interactions and binding strength toward competing metal ions are tuned, which in turn dictate selectivity. For example, the separation factor between chromate and meta‐vanadate can differ by almost an order of magnitude depending on polymer structure. Second, these metallopolymer electrodes exhibit potential‐dependent selectivity that can even reverse ion preference by electrical means alone, which provides a control parameter orthogonal to structural modifications. Finally, this work presents a framework for evaluating electrochemical separations in multicomponent ion mixtures and, through a combination of spectroscopy and electronic structure calculations, elucidates the charge‐transfer mechanisms that underlie molecular selectivity. The findings demonstrate the applicability of redox‐active metallopolymers to tailored electrochemical separations for environmental remediation, value‐added metal recovery, waste recycling, and even mineral processing.

  4. Parkinson’s disease (PD) is a movement disorder caused by a dopamine deficit in the brain. Current therapies primarily focus on dopamine modulators or replacements, such as levodopa. Although dopamine replacement can help alleviate PD symptoms, therapies targeting the underlying neurodegenerative process are limited. The objective of this study was to use artificial intelligence to rank the most promising repurposed drug candidates for PD. Natural language processing (NLP) techniques were used to extract text relationships from more than 33 million biomedical journal articles in PubMed and to map relationships between genes, proteins, drugs, diseases, and other concepts into a knowledge graph. Cross-domain text mining, hub network analysis, and unsupervised learning rank aggregation were performed in SemNet 2.0 to predict the drug candidates most relevant to levodopa and PD using relevance-based HeteSim scores. The top predicted adjuvant PD therapies included ebastine, an antihistamine for perennial allergic rhinitis; levocetirizine, another antihistamine; vancomycin, a potent antibiotic; captopril, an angiotensin-converting enzyme (ACE) inhibitor; and neramexane, an N-methyl-D-aspartate (NMDA) receptor antagonist. Cross-domain text mining predicted that antihistamines can synergistically alleviate Parkinsonian symptoms when used with dopamine modulators such as levodopa or levodopa–carbidopa. The relationship patterns among the identified adjuvant candidates suggest that the likely therapeutic mechanism(s) of action of antihistamines against the multifactorial PD pathology include counteracting oxidative stress, restoring the balance of neurotransmitters, and decreasing the proliferation of inflammatory mediators. Finally, cross-domain text mining predicted a strong relationship between PD and liver disease.

  5. Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease worldwide. This study’s goal was to identify the signaling drivers and pathways that modulate glomerular endothelial dysfunction in DKD via artificial intelligence-enabled literature-based discovery. Cross-domain text mining of more than 33 million PubMed articles was performed with SemNet 2.0 to identify and rank multi-scalar and multifactorial pathophysiological concepts related to DKD. A set of identified genes and proteins that regulate different pathological events associated with DKD was analyzed and ranked using normalized mean HeteSim scores. High-ranking genes and proteins intersected three domains: DKD, the immune response, and glomerular endothelial cells. The top 10% of ranked concepts mapped to the following biological functions: angiogenesis, apoptotic processes, cell adhesion, chemotaxis, growth factor signaling, vascular permeability, the nitric oxide response, oxidative stress, the cytokine response, macrophage signaling, NF-κB activity, the TLR pathway, glucose metabolism, the inflammatory response, the ERK/MAPK signaling response, the JAK/STAT pathway, the T-cell-mediated response, the WNT/β-catenin pathway, the renin–angiotensin system, and NADPH oxidase activity. High-ranking genes and proteins were used to generate a protein–protein interaction network. The study results prioritized interactions and molecules involved in dysregulated signaling in DKD, which can be further assessed through biochemical network models or experiments.

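Item 2 derives a consensus sequence from conserved-domain analysis of 111 AHL synthase genes. A common way to collapse an alignment into one sequence is a column-wise majority rule, sketched below in Python. It assumes the sequences are already aligned to equal length; the function name, the 50% threshold, the `X` placeholder for ambiguous columns, and the choice to drop mostly-gap columns are illustrative assumptions, not necessarily what the authors' conserved-domain tool does.

```python
from collections import Counter


def majority_consensus(alignment: list[str], min_fraction: float = 0.5, gap: str = "-") -> str:
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    A column contributes its most common residue only if that residue occurs in
    more than `min_fraction` of the sequences; otherwise 'X' is emitted.
    Columns whose most common character is the gap symbol are dropped.
    """
    if not alignment:
        raise ValueError("empty alignment")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*alignment):  # iterate over alignment columns
        residue, count = Counter(column).most_common(1)[0]
        if residue == gap:
            continue  # mostly-gap column: no residue to report
        consensus.append(residue if count / len(column) > min_fraction else "X")
    return "".join(consensus)
```

For instance, `majority_consensus(["MKV-LA", "MKVQLA", "MRV-LS"])` keeps the majority residue at each position and drops the gap-dominated fourth column. Raising `min_fraction` turns more variable positions into `X`.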
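Item 3 compares separation factors between chromate and meta‐vanadate. A separation factor is conventionally the ratio of the two ions captured by the electrode divided by their ratio in solution. The sketch below uses that textbook definition; the function and parameter names are hypothetical, and whether initial or equilibrium solution concentrations are used varies between studies and is an assumption here.

```python
def separation_factor(uptake_a: float, uptake_b: float, conc_a: float, conc_b: float) -> float:
    """Separation factor alpha_{A/B} = (q_A / q_B) / (c_A / c_B).

    uptake_*: amount of each ion captured by the electrode (same units, e.g. mol/g)
    conc_*:   concentration of each ion in solution (same units, e.g. mM)
    alpha > 1 means the electrode prefers ion A over ion B.
    """
    if min(uptake_b, conc_a, conc_b) <= 0:
        raise ValueError("uptake_b and both concentrations must be positive")
    return (uptake_a / uptake_b) / (conc_a / conc_b)
```

Because α is a ratio of ratios, an "almost order-of-magnitude" difference between the two polymers means one electrode's α for the same ion pair is roughly ten times the other's under comparable conditions.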
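Items 4 and 5 rank concepts with HeteSim scores computed in SemNet 2.0. HeteSim measures relatedness between two nodes, possibly of different types, along a meta-path; for a length-2 path such as drug–gene–disease, its normalized form reduces to the cosine similarity of the two nodes' reachable-probability vectors over the middle node type. The sketch below implements only that length-2 case plus a simple averaging-and-ranking step. The function names, the min-max normalization of mean scores, and the `top_fraction` cutoff are assumptions for illustration, not SemNet 2.0's documented procedure.

```python
from math import sqrt


def _row_normalize(row: list[float]) -> list[float]:
    # Turn raw link weights into transition probabilities (all-zero rows stay zero).
    total = sum(row)
    return [x / total for x in row] if total else list(row)


def hetesim_len2(left_row: list[float], right_row: list[float]) -> float:
    """Normalized HeteSim for a length-2 meta-path S-M-T (e.g. drug-gene-disease).

    left_row:  links from source node s to every middle-type node (e.g. genes)
    right_row: links from target node t to the same middle-type nodes, same order
    Returns the cosine of the two reachable-probability vectors.
    """
    a, b = _row_normalize(left_row), _row_normalize(right_row)
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def rank_by_normalized_mean(
    scores: dict[str, list[float]], top_fraction: float = 0.10
) -> list[tuple[str, float]]:
    """Average each concept's scores over meta-paths, min-max normalize, keep the top fraction."""
    means = {c: sum(v) / len(v) for c, v in scores.items() if v}
    if not means:
        raise ValueError("no scores to rank")
    lo, hi = min(means.values()), max(means.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all means are equal
    ranked = sorted(
        ((c, (m - lo) / span) for c, m in means.items()), key=lambda cm: cm[1], reverse=True
    )
    keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:keep]
```

For longer paths, the reachable-probability vectors come from products of row-normalized adjacency matrices along each half of the path (odd-length paths additionally require splitting the middle relation); that general case is omitted here.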