
Title: Expansion of the Gene Ontology knowledgebase and resources
The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit
Journal Name:
Nucleic Acids Research
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The Gene Ontology (GO) knowledgebase is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO—a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations—evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs)—mechanistic models of molecular “pathways” (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.

  2. Abstract

    Echinobase is a third-generation web resource supporting genomic research on echinoderms. The new version was built by cloning the mature Xenopus model organism knowledgebase, Xenbase, refactoring data ingestion pipelines and modifying the user interface to adapt to multispecies echinoderm content. This approach leveraged over 15 years of previous database and web application development to generate a new fully featured informatics resource in a single year. In addition to the software stack, Echinobase uses the private cloud and physical hosts that support Xenbase. Echinobase currently supports six echinoderm species, focused on those used for genomics, developmental biology and gene regulatory network analyses. Over 38 000 gene pages, 18 000 publications, new improved genome assemblies, JBrowse genome browser and BLAST+ services are available and supported by the development of a new echinoderm anatomical ontology, uniformly applied formal gene nomenclature, and consistent orthology predictions. A novel feature of Echinobase is integrating support for multiple, disparate species. New genomes from the diverse echinoderm phylum will be added and supported as data becomes available. The common code development design of the integrated knowledgebases ensures parallel improvements as each resource evolves. This approach is widely applicable for developing new model organism informatics resources.

  3. Ouellette, Francis (Ed.)
    Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills. 
  4. Abstract

    The Planteome project provides a suite of reference and crop-specific ontologies and an integrated knowledgebase of plant genomics data. The plant genomics data in the Planteome has been obtained through manual and automated curation and sourced from more than 40 partner databases and resources. Here, we report on updates to the Planteome reference ontologies, namely, the Plant Ontology (PO), Trait Ontology (TO), the Plant Experimental Conditions Ontology (PECO), and integration of species/crop-specific vocabularies from our partners, the Crop Ontology (CO) into the TO ontology graph. Currently, 11 CO vocabularies are integrated into the Planteome with the addition of yam, sorghum, and potato since 2018. In addition, the size of the annotation database has increased by 34%, and the number of bioentities (genes, proteins, etc.) from 125 plant taxa has increased by 72%. We developed new tools to facilitate user requests and improvements to the CO vocabularies, and to allow fast searching and browsing of PO terms and definitions. These enhancements and future changes to automate the TO-CO mappings and knowledge discovery tools ensure that the Planteome will continue to be a valuable resource for plant biology.

  5. Wood, V (Ed.)
    The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein–protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.