

Title: UniProt: the Universal Protein Knowledgebase in 2023
Abstract

The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication, we describe enhancements made to our data processing pipeline and to our website to adapt to ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million, and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues to contribute publications and annotations to UniProt entries of interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users’ experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
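Beyond the website, the same reviewed and unreviewed entries can be retrieved programmatically. The following is a minimal sketch, assuming the public UniProtKB REST search endpoint `https://rest.uniprot.org/uniprotkb/search` and its `query`, `format`, `fields` and `size` parameters; `build_search_url` is a hypothetical helper, and the field names should be checked against the current UniProt API documentation before use. It only builds the URL, so any HTTP client can then fetch the results.

```python
from urllib.parse import urlencode

# Assumed public UniProtKB search endpoint (verify against current UniProt docs).
SEARCH_ENDPOINT = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(organism_id: int, reviewed: bool = True,
                     fields=("accession", "protein_name", "length"),
                     fmt: str = "tsv", size: int = 25) -> str:
    """Hypothetical helper: search URL for reviewed or unreviewed entries of one taxon."""
    # reviewed:true selects reviewed (Swiss-Prot) entries; reviewed:false selects
    # unreviewed (TrEMBL) entries.
    query = f"reviewed:{str(reviewed).lower()} AND organism_id:{organism_id}"
    params = {
        "query": query,
        "format": fmt,
        "fields": ",".join(fields),
        "size": size,
    }
    # urlencode percent-encodes ':' and ',' and turns spaces into '+'.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Reviewed human entries (NCBI taxonomy ID 9606).
    print(build_search_url(9606))
```

Keeping URL construction separate from the network call makes the query easy to inspect or log before any request is sent.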

 
Award ID(s):
1917302
NSF-PAR ID:
10475442
Author(s) / Creator(s):
Corporate Creator(s):
Publisher / Repository:
Oxford Academic
Date Published:
Journal Name:
Nucleic Acids Research
Volume:
51
Issue:
D1
ISSN:
0305-1048
Page Range / eLocation ID:
D523 to D531
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made to the resource over the last two years. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
  2. The Evidence & Conclusion Ontology (ECO) is a community standard for summarizing evidence in scientific research in a controlled, structured way. Annotations at the world's most frequented biological databases (e.g. model organisms, UniProt, Gene Ontology) are supported using ECO terms. ECO describes evidence derived from experimental and computational methods, author statements curated from the literature, inferences drawn by curators, and other types of evidence. Here, we describe recent ECO developments and collaborations, most notably: (i) a new ECO website containing user documentation, up-to-date news, and visualization tools; (ii) improvements to the ontology structure; (iii) implementing logic via an ongoing collaboration with the Ontology for Biomedical Investigations (OBI); (iv) addition of numerous experimental evidence types; and (v) addition of new evidence classes describing computationally derived evidence. Due to its utility, popularity, and simplicity, ECO is now expanding into realms beyond the protein annotation community, for example the biodiversity and phenotype communities. As ECO continues to grow as a resource, we are seeking new users and new use cases, with the hope that ECO will continue to be a broadly used and easy-to-implement community standard for representing evidence in diverse biological applications. Feel free to visit two ECO-sponsored workshops at ICBO 2016 to learn more: 1. “An introduction to the Evidence and Conclusion Ontology and representing evidence in scientific research” and 2. “OBI-ECO Interactions & Evidence”. 
  3. Abstract

    More than 61,000 proteins have up-to-date correspondence between their amino acid sequence (UniProtKB) and their 3D structures (PDB), enabled by the Structure Integration with Function, Taxonomy and Sequences (SIFTS) resource. SIFTS incorporates residue-level annotations from many other biological resources. SIFTS data are available in XML, CSV and TSV formats and through the PDBe REST API, but they have always been maintained separately from the structure data (PDBx/mmCIF files) in the PDB archive. Here, we extended the wwPDB PDBx/mmCIF data dictionary with additional categories to accommodate SIFTS data and added the UniProtKB, Pfam, SCOP2 and CATH residue-level annotations directly into the PDBx/mmCIF files in the PDB archive. With the integrated UniProtKB annotations, these files now provide consistent residue numbering across different PDB entries, allowing easy comparison of structure models. The extended dictionary yields a more consistent, standardised metadata description without altering the core PDB information. This development provides up-to-date cross-reference information at the residue level, resulting in better data interoperability and supporting improved data analysis and visualisation.

     
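The ECO abstract (item 2) describes an ontology whose evidence classes are arranged hierarchically, so an annotation tagged with a specific evidence term can be grouped under a broader class such as experimental or computational evidence. The sketch below illustrates that idea with a toy `is_a` graph. Every term ID and annotation in it is a hypothetical placeholder, not a real ECO identifier; a real application would load the ontology from its published release files instead.

```python
from collections import deque

# Hypothetical toy fragment of an evidence ontology: term -> list of is_a parents.
# The IDs are placeholders, NOT real ECO identifiers.
IS_A = {
    "EV:experimental": ["EV:evidence"],
    "EV:computational": ["EV:evidence"],
    "EV:author_statement": ["EV:evidence"],
    "EV:curator_inference": ["EV:evidence"],
    "EV:assay": ["EV:experimental"],
    "EV:binding_assay": ["EV:assay"],
    "EV:sequence_similarity": ["EV:computational"],
}


def ancestors(term: str) -> set:
    """Return every term reachable from `term` via is_a edges (excluding `term`)."""
    seen, queue = set(), deque(IS_A.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen


def is_subclass_of(term: str, category: str) -> bool:
    """True if `term` is `category` itself or one of its descendants."""
    return term == category or category in ancestors(term)


# Hypothetical annotations: (accession, statement, evidence term).
annotations = [
    ("P1", "binds DNA", "EV:binding_assay"),
    ("P2", "kinase activity", "EV:sequence_similarity"),
]
# Keep only annotations backed by some kind of experimental evidence.
experimental = [a for a in annotations if is_subclass_of(a[2], "EV:experimental")]
```

Because the query walks the hierarchy, adding a new, more specific evidence class under an existing parent does not require changing the filtering code.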
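The SIFTS abstract notes that UniProtKB mappings give consistent residue numbering across PDB entries. The idea can be sketched with segment-level mappings: within each contiguous aligned segment, a PDB residue number maps to a UniProtKB position by a constant offset. The `Segment` fields and the two example entries below are hypothetical; they mimic the concept rather than the actual PDBx/mmCIF category items, and the sketch ignores complications such as insertion codes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A contiguous, gap-free alignment between a PDB chain and a UniProtKB sequence.

    Hypothetical structure: it mirrors the idea of SIFTS segment mappings,
    not the exact item names used in the PDBx/mmCIF files.
    """
    pdb_start: int
    pdb_end: int
    unp_start: int
    unp_end: int

    def __post_init__(self) -> None:
        # A segment maps residues one-to-one, so both ranges must have equal length.
        if self.pdb_end - self.pdb_start != self.unp_end - self.unp_start:
            raise ValueError("PDB and UniProtKB ranges differ in length")


def pdb_to_uniprot(resnum: int, segments: List[Segment]) -> Optional[int]:
    """Map a PDB residue number to a UniProtKB position; None if the residue is unmapped."""
    for seg in segments:
        if seg.pdb_start <= resnum <= seg.pdb_end:
            # Inside a segment the offset between the two numberings is constant.
            return seg.unp_start + (resnum - seg.pdb_start)
    return None


# Two hypothetical PDB entries of the same protein with different residue numbering.
ENTRY_A = [Segment(1, 100, 21, 120)]
ENTRY_B = [Segment(21, 80, 21, 80), Segment(90, 120, 85, 115)]

if __name__ == "__main__":
    # Residue 30 in entry A and residue 50 in entry B are the same UniProtKB position.
    print(pdb_to_uniprot(30, ENTRY_A), pdb_to_uniprot(50, ENTRY_B))  # prints: 50 50
```

Translating both entries into UniProtKB coordinates is what makes residue-by-residue comparison of the two models straightforward.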