Title: Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO)
Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.
Award ID(s):
1754097
PAR ID:
10326289
Editor(s):
Ouellette, Francis
Journal Name:
PLOS Computational Biology
Volume:
17
Issue:
10
ISSN:
1553-7358
Page Range / eLocation ID:
e1009463
Sponsoring Org:
National Science Foundation
More Like this
  1. The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2,838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20,000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
  2. The Evidence & Conclusion Ontology (ECO) is a community standard for summarizing evidence in scientific research in a controlled, structured way. Annotations at the world's most frequented biological databases (e.g. model organisms, UniProt, Gene Ontology) are supported using ECO terms. ECO describes evidence derived from experimental and computational methods, author statements curated from the literature, inferences drawn by curators, and other types of evidence. Here, we describe recent ECO developments and collaborations, most notably: (i) a new ECO website containing user documentation, up-to-date news, and visualization tools; (ii) improvements to the ontology structure; (iii) implementing logic via an ongoing collaboration with the Ontology for Biomedical Investigations (OBI); (iv) addition of numerous experimental evidence types; and (v) addition of new evidence classes describing computationally derived evidence. Due to its utility, popularity, and simplicity, ECO is now expanding into realms beyond the protein annotation community, for example the biodiversity and phenotype communities. As ECO continues to grow as a resource, we are seeking new users and new use cases, with the hope that ECO will continue to be a broadly used and easy-to-implement community standard for representing evidence in diverse biological applications. Feel free to visit two ECO-sponsored workshops at ICBO 2016 to learn more: 1. “An introduction to the Evidence and Conclusion Ontology and representing evidence in scientific research” and 2. “OBI-ECO Interactions & Evidence”. 
  3. Premise: The functional annotation of genes is a crucial component of genomic analyses. A common way to summarize functional annotations is with hierarchical gene ontologies, such as the Gene Ontology (GO) Resource. GO includes information about the cellular location, molecular function(s), and products/processes that genes produce or are involved in. For a set of genes, summarizing GO annotations using predefined, higher-order terms (GO slims) is often desirable to characterize the overall function of the data set, and doing this manually is impractical. Methods and Results: The GOgetter pipeline consists of bash and Python scripts. From an input FASTA file of nucleotide gene sequences, it outputs text and image files that list (1) the best hit for each input gene in a set of reference gene models, (2) all GO terms and annotations associated with those hits, and (3) a summary and visualization of GO slim categories for the data set. These output files can be queried further and analyzed statistically, depending on downstream needs. Conclusions: GO annotations are a widely used "universal language" for describing gene functions and products. GOgetter is a fast and easy-to-implement pipeline for obtaining, summarizing, and visualizing GO slim categories associated with a set of genes.
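The GOgetter abstract summarizes a gene set by collapsing each gene's GO terms into higher-order GO slim categories. GOgetter's own scripts are not reproduced here. The following is a minimal Python sketch of that summarization idea only, assuming the best hits (step 1) and their GO annotations (step 2) have already been computed. The toy ontology, gene and reference-model names, and the helpers `ancestors` and `slim_summary` are hypothetical. The parent links are simplified rather than taken from the real GO graph. Each annotated term is walked up its parent links, and only the ancestors that belong to the slim set are kept.

```python
from collections import Counter

# Hypothetical toy data, NOT real GO content: parent links are simplified.
PARENTS = {
    "DNA binding": {"nucleic acid binding"},
    "RNA binding": {"nucleic acid binding"},
    "nucleic acid binding": {"binding"},
    "ATP binding": {"binding"},
    "nucleus": {"organelle"},
    "mitochondrion": {"organelle"},
}
SLIM_TERMS = {"binding", "nucleic acid binding", "organelle"}

# Step (2) input: GO terms attached to each reference gene model (hypothetical IDs).
REF_ANNOTATIONS = {
    "refA": {"DNA binding", "nucleus"},
    "refB": {"ATP binding"},
    "refC": {"RNA binding", "mitochondrion"},
}
# Step (1) input: best reference hit per query gene (None means no hit).
BEST_HITS = {"gene1": "refA", "gene2": "refB", "gene3": "refC", "gene4": None}


def ancestors(term, parents):
    """Return the term itself plus every ancestor reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def slim_summary(best_hits, ref_annotations, parents, slim_terms):
    """Count how many query genes fall into each slim category."""
    counts = Counter()
    for hit in best_hits.values():
        if hit is None:
            continue  # genes without a reference hit contribute no GO terms
        categories = set()
        for term in ref_annotations.get(hit, ()):
            # Keep only the ancestors (including the term itself) that are slim terms.
            categories |= ancestors(term, parents) & slim_terms
        counts.update(categories)  # each gene counted at most once per category
    return counts


if __name__ == "__main__":
    summary = slim_summary(BEST_HITS, REF_ANNOTATIONS, PARENTS, SLIM_TERMS)
    for category, n_genes in sorted(summary.items()):
        print(f"{category}\t{n_genes}")  # tab-separated category and gene count
```

With the toy data this prints `binding 3`, `nucleic acid binding 2`, and `organelle 2` (tab-separated), since `gene4` has no hit. Counting each gene once per category, rather than once per annotation, keeps heavily annotated genes from inflating a category. The abstract does not state which convention GOgetter uses.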