This content will become publicly available on December 1, 2026

Title: ProTaxoVis—protein taxonomic visualisation of presence
Abstract
Background: Protein presence information is an essential component of biological pathway identification. The presence of certain enzymes in an organism points towards the metabolic pathways that occur within it, whereas the absence of these enzymes indicates either the existence of alternative pathways or a lack of these pathways altogether. The same inference applies to regulatory pathways such as gene regulation and signal transduction. Protein presence information therefore forms the basis for biological pathway studies, and patterns in presence-absence across multiple organisms allow for comparative pathway analyses.
Results: Here we present ProTaxoVis, a novel bioinformatic tool that extracts protein presence information from database queries and maps it to a taxonomic tree or heatmap. ProTaxoVis generates a large-scale overview of presence patterns in taxonomic clades of interest. This overview reveals protein distribution patterns, which can be used to deduce pathway evolution or to probe other biological questions. ProTaxoVis combines and filters sequence query results to extract information on the distribution of proteins and translates this information into two types of visual output: taxonomic trees and heatmaps. The trees supplement their topology with scaled pie charts at each node that represent the presence of target proteins and of combinations of these proteins, so that patterns in taxonomic groups can easily be identified. The heatmap visualisation shows the presence and conservation of these proteins for a user-determined set of species, allowing a more detailed view over a larger group of proteins than the trees provide. ProTaxoVis also allows for visual quality checks of hits based on a coverage plot and a length histogram, which can be used to determine e-value and minimum protein length cutoffs. Tabular outputs from the query, combining, and heatmap-building steps are saved and easily accessible for further analyses.
Conclusions: We evaluate our tool with the phosphoribosyltransferases, a transferase enzyme family with notable distribution patterns amongst organisms of varying complexities and across Eukaryota, Bacteria, and Archaea. ProTaxoVis is open-source and available at: https://github.com/MolecularBioinformatics/ProTaxoVis.
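As a rough illustration of the kind of processing the abstract describes (filtering hits by an e-value and a minimum-length cutoff, then summarising which protein combinations are present in each taxonomic group), here is a minimal Python sketch. It is not ProTaxoVis code: the record layout, the function names, the cutoff values, and the toy hits are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical hit records (made-up data, not real query results):
# (protein, species, clade, e_value, hit_length)
HITS = [
    ("prot_A", "species_1", "Eukaryota", 1e-80, 480),
    ("prot_B", "species_1", "Eukaryota", 1e-60, 530),
    ("prot_A", "species_2", "Eukaryota", 1e-75, 470),
    ("prot_A", "species_3", "Bacteria", 1e-2, 90),   # fails both cutoffs
    ("prot_B", "species_3", "Bacteria", 1e-50, 400),
]


def presence_by_species(hits, max_evalue=1e-10, min_length=100):
    """Return ({species: frozenset of proteins passing the cutoffs}, {species: clade}).

    The default cutoffs are arbitrary placeholders, not recommended values.
    """
    present = defaultdict(set)
    clade_of = {}
    for protein, species, clade, e_value, length in hits:
        clade_of[species] = clade
        if e_value <= max_evalue and length >= min_length:
            present[species].add(protein)
    # Species whose hits were all filtered out keep an empty set (absence).
    return {sp: frozenset(present[sp]) for sp in clade_of}, clade_of


def combination_counts(presence, clade_of):
    """Count, per clade, how many species carry each combination of proteins."""
    counts = defaultdict(Counter)
    for species, proteins in presence.items():
        counts[clade_of[species]][proteins] += 1
    return counts


if __name__ == "__main__":
    presence, clade_of = presence_by_species(HITS)
    for clade, combos in sorted(combination_counts(presence, clade_of).items()):
        for proteins, n_species in combos.items():
            label = "+".join(sorted(proteins)) or "none"
            print(f"{clade}: {label} in {n_species} species")
```

Counts like these, one per protein combination within a clade, are the sort of quantity that could be drawn as pie-chart slices on a tree node; how ProTaxoVis actually computes and scales them is not specified in the abstract.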
Award ID(s):
2312378
PAR ID:
10609969
Author(s) / Creator(s):
; ;
Publisher / Repository:
GitHub
Date Published:
Journal Name:
BMC Bioinformatics
Volume:
26
Issue:
1
ISSN:
1471-2105
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence‐based predicted contact or distance information is used. Contact‐assisted or distance‐assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query–template alignment. We present a new distance‐ and orientation‐based covariational threading method called DisCovER by effectively integrating information from inter‐residue distance and orientation along with the topological network neighborhood of a query–template alignment. Our method first selects a subset of templates using standard profile‐based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance‐ and orientation‐based query–template alignment using an iterative double dynamic programming framework. Multiple large‐scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state‐of‐the‐art threading approaches, and that the integration of the neighborhood effect with the inter‐residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
  2. Abstract Background: Protein S-nitrosylation (SNO) plays a key role in transferring nitric oxide-mediated signals in both animals and plants and has emerged as an important mechanism for regulating protein functions and cell signaling of all main classes of protein. It is involved in several biological processes including immune response, protein stability, transcription regulation, post translational regulation, DNA damage repair, redox regulation, and is an emerging paradigm of redox signaling for protection against oxidative stress. The development of robust computational tools to predict protein SNO sites would contribute to further interpretation of the pathological and physiological mechanisms of SNO. Results: Using an intermediate fusion-based stacked generalization approach, we integrated embeddings from a supervised embedding layer and a contextualized protein language model (ProtT5) and developed a tool called pLMSNOSite (protein language model-based SNO site predictor). On an independent test set of experimentally identified SNO sites, pLMSNOSite achieved values of 0.340, 0.735 and 0.773 for MCC, sensitivity and specificity respectively. These results show that pLMSNOSite performs better than the compared approaches for the prediction of S-nitrosylation sites. Conclusion: Together, the experimental results suggest that pLMSNOSite achieves significant improvement in the prediction performance of S-nitrosylation sites and represents a robust computational approach for predicting protein S-nitrosylation sites. pLMSNOSite could be a useful resource for further elucidation of SNO and is publicly available at https://github.com/KCLabMTU/pLMSNOSite.
  3. Abstract Background: Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity. Results: We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA. Conclusions: EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.
  4. Abstract The Membranome database provides comprehensive structural information on single‐pass (i.e., bitopic) membrane proteins from six evolutionarily distant organisms, including protein–protein interactions, complexes, mutations, experimental structures, and models of transmembrane α‐helical dimers. We present a new version of this database, Membranome 3.0, which was significantly updated by revising the set of 5,758 bitopic proteins and incorporating models generated by AlphaFold 2 in the database. The AlphaFold models were parsed into structural domains located at the different membrane sides, modified to exclude low‐confidence unstructured terminal regions and signal sequences, validated through comparison with available experimental structures, and positioned with respect to membrane boundaries. Membranome 3.0 was re‐developed to facilitate visualization and comparative analysis of multiple 3D structures of proteins that belong to a specified family, complex, biological pathway, or membrane type. New tools for advanced search and analysis of proteins, their interactions, complexes, and mutations were included. The database is freely accessible at https://membranome.org.
  5. Abstract Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships among protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: (1) Infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; (2) Does taxonomically informed sequence quality control and redundancy reduction; (3) Constructs a multiple sequence alignment; (4) Generates a maximum‐likelihood gene tree; (5) Reconciles the gene tree to the species tree; (6) Reconstructs ancestral amino acid sequences; and (7) Determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML‐NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non‐expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary.