Title: A network-based comparative framework to study conservation and divergence of proteomes in plant phylogenies
Abstract: Comparative functional genomics offers a powerful approach to study species evolution. To date, the majority of these studies have focused on the transcriptome in mammalian and yeast phylogenies. Here, we present a novel multi-species proteomic dataset and a computational pipeline to systematically compare protein levels across multiple plant species. Globally, we find that protein levels diverge according to phylogenetic distance but are more constrained than mRNA levels. Module-level comparative analysis of groups of proteins shows that proteins that are more highly expressed tend to be more conserved. To interpret the evolutionary patterns of conservation and divergence, we develop a novel network-based integrative analysis pipeline that combines publicly available transcriptomic datasets to define co-expression modules. Our analysis pipeline can be used to relate the changes in protein levels to different species-specific phenotypic traits. We present a case study with the rhizobia-legume symbiosis process that supports the role of autophagy in this symbiotic association.
Award ID(s):
2010789 1546742
PAR ID:
10228277
Journal Name:
Nucleic Acids Research
Volume:
49
Issue:
1
ISSN:
0305-1048
Page Range / eLocation ID:
e3 to e3
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Goldman, Gustavo H. (Ed.)
    ABSTRACT Gene expression divergence through evolutionary processes is thought to be important for achieving programmed development in multicellular organisms. To test this premise in filamentous fungi, we investigated transcriptional profiles of 3,942 single-copy orthologous genes (SCOGs) in five related sordariomycete species that have morphologically diverged in the formation of their flask-shaped perithecia. We compared expression of the SCOGs to inferred gene expression levels of the most recent common ancestor of the five species, ranking genes from their largest increases to smallest increases in expression during perithecial development in each of the five species. We found that a large proportion of the genes that exhibited evolved increases in gene expression were important for normal perithecial development in Fusarium graminearum. Many of these genes were previously uncharacterized, encoding hypothetical proteins without any known functional protein domains. Interestingly, the developmental stages during which aberrant knockout phenotypes appeared largely coincided with the elevated expression of the deleted genes. In addition, we identified novel genes that affected normal perithecial development in Magnaporthe oryzae and Neurospora crassa, which were functionally and transcriptionally diverged from the orthologous counterparts in F. graminearum. Furthermore, comparative analysis of developmental transcriptomes and phylostratigraphic analysis suggested that genes encoding hypothetical proteins are generally young and transcriptionally divergent between related species. This study provides tangible evidence of shifts in gene expression that led to acquisition of novel function of orthologous genes in each lineage and demonstrates that several genes with hypothetical function are crucial for shaping multicellular fruiting bodies. IMPORTANCE The fungal class Sordariomycetes includes numerous important plant and animal pathogens.
It also provides model systems for studying fungal fruiting body development, as its members develop fruiting bodies with a few well-characterized tissue types on common growth media and have rich genomic resources that enable comparative and functional analyses. To understand transcriptional divergence of key developmental genes between five related sordariomycete fungi, we performed targeted knockouts of genes inferred to have evolved significant upward shifts in expression. We found that many previously uncharacterized genes that have undergone transcriptional activation in specific lineages play indispensable roles at different stages of fruiting body development. These novel genes are predicted to be phylogenetically young and tend to be involved in lineage- or species-specific function. Transcriptional activation of genes with unknown function seems to be more frequent than previously thought, which may be crucial for rapid adaptation to changing environments for successful sexual reproduction.
  2.
    Abstract Background A few recent large efforts significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of entities including reference complete/draft genomes and metagenome assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and its relationship with human health and diseases. One application of these genomes is to provide a universal reference for database search in metaproteomic studies, when matched metagenomic/metatranscriptomic data are unavailable. However, a greater collection of reference genomes may not necessarily result in better peptide/protein identification because the increase in search space often leads to fewer spectrum-peptide matches, not to mention the drastic increase in computation time. Methods Here, we present a new approach that uses two steps to optimize the use of the reference genomes and MAGs as the universal reference for human gut metaproteomic MS/MS data analysis. The first step is to use only the high-abundance proteins (HAPs) (i.e., ribosomal proteins and elongation factors) for metaproteomic MS/MS database search and, based on the identification results, to derive the taxonomic composition of the underlying microbial community. The second step is to expand the search database by including all proteins from identified abundant species. We call our approach HAPiID (HAPs guided metaproteomics IDentification). Results We tested our approach using human gut metaproteomic datasets from a previous study and compared it to the state-of-the-art reference database search method MetaPro-IQ for metaproteomic identification in studying human gut microbiota. Our results show that our two-step method not only performed significantly faster but also was able to identify more peptides.
We further demonstrated the application of HAPiID to revealing protein profiles of individual human-associated bacterial species, one or a few species at a time, using metaproteomic data. Conclusions The HAP guided profiling approach presents a novel, effective way of constructing target databases for metaproteomic data analysis. The HAPiID pipeline built upon this approach provides a universal tool for analyzing human gut-associated metaproteomic data.
  3. Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities using chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline using machine learning models for predicting the most important protein features responsible for the toxicity of compounds taken from the Tox21 dataset that is implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest with the combination of Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method (SMOTE+ENN), which is a resampling method to balance the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were two of the top-performing twelve toxicity endpoints with AUCROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement. 
Our work elucidates significant relationships between protein and compound interactions computed using CANDO and the associated biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also to elucidate therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints.
  4. In the aftermath of COVID-19, screening for pathogens has never been a more relevant problem. However, computational screening for pathogens is challenging due to a variety of factors, including (i) the complexity and role of the host, (ii) virulence factor divergence and dynamics, and (iii) population and community-level dynamics. Considering a potential pathogen's molecular interactions, specifically individual proteins and protein interactions, can help pinpoint which proteins of a given microbe may cause disease. However, existing tools for pathogen screening rely on existing annotations (KEGG, GO, etc.), making the assessment of novel and unannotated proteins more challenging. Here, we present an LLM-inspired approach that considers protein sequence and structure to predict protein virulence. We present a two-stage model incorporating evolutionary features captured from the DistilProtBert language model and protein structure in a graph convolutional network. Our model performs better than sequence alone for virulence function when high-quality structures are present, thus representing a path forward for virulence prediction of novel and unannotated proteins.
  5. Abstract Environmental stress from ultraviolet radiation, elevated temperatures or metal toxicity can lead to reactive oxygen species in cells, leading to oxidative DNA damage, premature aging, neurodegenerative diseases, and cancer. The transcription factor nuclear factor (erythroid-derived 2)-like 2 (Nrf2) activates many cytoprotective proteins within the nucleus to maintain homeostasis during oxidative stress. In vertebrates, Nrf2 levels are regulated by the Kelch-family protein Keap1 (Kelch-like ECH-associated protein 1) in the absence of stress according to a canonical redox control pathway. Little, however, is known about the redox control pathway used in early diverging metazoans. Our study examines the presence of known oxidative stress regulatory elements within non-bilaterian metazoans including free living and parasitic cnidarians, ctenophores, placozoans, and sponges. Cnidarians, with their pivotal position as the sister phylum to bilaterians, play an important role in understanding the evolutionary history of response to oxidative stress. Through comparative genomic and transcriptomic analysis our results show that Nrf homologs evolved early in metazoans, whereas Keap1 appeared later in the last common ancestor of cnidarians and bilaterians. However, key Nrf–Keap1 interacting domains are not conserved within the cnidarian lineage, suggesting this important pathway evolved with the radiation of bilaterians. Several known downstream Nrf targets are present in cnidarians suggesting that cnidarian Nrf plays an important role in oxidative stress response even in the absence of Keap1. Comparative analyses of key oxidative stress sensing and response proteins in early diverging metazoans thus provide important insights into the molecular basis of how these lineages interact with their environment and suggest a shared evolutionary history of regulatory pathways. 
Exploration of these pathways may prove important for the study of cancer therapeutics and broader research in oxidative stress, senescence, and the functional responses of early diverging metazoans to environmental change. 