NSF PAR Search | NSF Public Access Repository

InterPro in 2022

https://doi.org/10.1093/nar/gkac993

Paysan-Lafosse, Typhaine; Blum, Matthias; Chuguransky, Sara; Grego, Tiago; Pinto, Beatriz Lázaro; Salazar, Gustavo A; Bileschi, Maxwell L; Bork, Peer; Bridge, Alan; Colwell, Lucy; et al (November 2022, Nucleic Acids Research)

Abstract The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide a more user friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.
Full Text Available
UniProt: the Universal Protein Knowledgebase in 2023

https://doi.org/10.1093/nar/gkac1052

Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Ahmad, Shadab; Alpi, Emanuele; Bowler-Barnett, Emily H; Britto, Ramona; Bye-A-Jee, Hema; Cukura, Austra; et al (November 2022, Nucleic Acids Research)

Abstract The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users’ experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
Full Text Available
The Quest for Orthologs orthology benchmark service in 2022

https://doi.org/10.1093/nar/gkac330

Nevers, Yannis; Jones, Tamsin E. M.; Jyothi, Dushyanth; Yates, Bethan; Ferret, Meritxell; Portell-Silva, Laura; Codo, Laia; Cosentino, Salvatore; Marcet-Houben, Marina; Vlasova, Anna; et al (May 2022, Nucleic Acids Research)

Abstract The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analysis) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to maintaining the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertion from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.
PANTHER: Making genome‐scale phylogenetics accessible to all

https://doi.org/10.1002/pro.4218

Thomas, Paul D.; Ebert, Dustin; Muruganujan, Anushya; Mushayahama, Tremayne; Albou, Laurent‐Philippe; Mi, Huaiyu (January 2022, Protein Science)

Full Text Available
Ten Years of Collaborative Progress in the Quest for Orthologs

https://doi.org/10.1093/molbev/msab098

Linard, Benjamin; Ebersberger, Ingo; McGlynn, Shawn E; Glover, Natasha; Mochizuki, Tomohiro; Patricio, Mateus; Lecompte, Odile; Nevers, Yannis; Thomas, Paul D; Gabaldón, Toni; et al (April 2021, Molecular Biology and Evolution)

Abstract Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology—evolutionary relatedness—is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit—from domains to networks. This meeting brought into light several challenges to come: leveraging orthology computations to get the most of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
Full Text Available
PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API

https://doi.org/10.1093/nar/gkaa1106

Mi, Huaiyu; Ebert, Dustin; Muruganujan, Anushya; Mills, Caitlin; Albou, Laurent-Philippe; Mushayamaha, Tremayne; Thomas, Paul D (December 2020, Nucleic Acids Research)
Abstract PANTHER (Protein Analysis Through Evolutionary Relationships, http://www.pantherdb.org) is a resource for the evolutionary and functional classification of protein-coding genes from all domains of life. The evolutionary classification is based on a library of over 15,000 phylogenetic trees, and the functional classifications include Gene Ontology terms and pathways. Here, we analyze the current coverage of genes from genomes in different taxonomic groups, so that users can better understand what to expect when analyzing a gene list using PANTHER tools. We also describe extensive improvements to PANTHER made in the past two years. The PANTHER Protein Class ontology has been completely refactored, and 6101 PANTHER families have been manually assigned to a Protein Class, providing a high level classification of protein families and their genes. Users can access the TreeGrafter tool to add their own protein sequences to the reference phylogenetic trees in PANTHER, to infer evolutionary context as well as fine-grained annotations. We have added human enhancer-gene links that associate non-coding regions with the annotated human genes in PANTHER. We have also expanded the available services for programmatic access to PANTHER tools and data via application programming interfaces (APIs). Other improvements include additional plant genomes and an updated PANTHER GO-slim.
Full Text Available
UniProt: the universal protein knowledgebase in 2021

https://doi.org/10.1093/nar/gkaa1100

Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Agivetova, Rahat; Ahmad, Shadab; Alpi, Emanuele; Bowler-Barnett, Emily H; Britto, Ramona; Bursteinas, Borisas; et al (November 2020, Nucleic Acids Research)

Abstract The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
Full Text Available
The Quest for Orthologs benchmark service and consensus calls in 2020

https://doi.org/10.1093/nar/gkaa308

Altenhoff, Adrian M; Garrayo-Ventas, Javier; Cosentino, Salvatore; Emms, David; Glover, Natasha M; Hernández-Plaza, Ana; Nevers, Yannis; Sundesha, Vicky; Szklarczyk, Damian; Fernández, José M; et al (May 2020, Nucleic Acids Research)

Abstract The identification of orthologs—genes in different species which descended from the same gene in their last common ancestor—is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.
Full Text Available
