Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences

Deng, Cecilia H; Naithani, Sushma; Kumari, Sunita; Cobo-Simón, Irene; Quezada-Rodríguez, Elsa H; Skrabisova, Maria; Gladman, Nick; Correll, Melanie J; Sikiru, Akeem Babatunde; Afuwape, Olusola O; Marrano, Annarita; Rebollo, Ines; Zhang, Wentao; Jung, Sook

doi:10.1093/database/baad088

Abstract: Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data, and to understand the needs and challenges of the plant genomic research community. During 2021–22, we identified different types of datasets and examined metadata annotations related to experimental design, methods and sample collection, among others. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we identify a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.

Award ID(s): 2126334

PAR ID: 10486308

Author(s) / Creator(s): Deng, Cecilia H; Naithani, Sushma; Kumari, Sunita; Cobo-Simón, Irene; Quezada-Rodríguez, Elsa H; Skrabisova, Maria; Gladman, Nick; Correll, Melanie J; Sikiru, Akeem Babatunde; Afuwape, Olusola O; Marrano, Annarita; Rebollo, Ines; Zhang, Wentao; Jung, Sook

Publisher / Repository: Oxford: International Society for Biocuration

Date Published: 2023-01-01

Journal Name: Database

Volume: 2023

ISSN: 1758-0463

Subject(s) / Keyword(s): phenotype data; genotype data; FAIR data; ontologies; agriculture; databases

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1093/database/baad088
