NSF PAR Search | NSF Public Access Repository

The need for robust, FAIR phenomic databases supporting agricultural efficiency and resiliency

https://doi.org/10.1093/scipol/scaf039

Callwood, Jodi; Celebioglu, Burcu; Gladman, Nicholas; Jung, Jinha; Lachowiec, Jennifer; Quezada-Rodríguez, Elsa H; McNamara, John P; Clarke, Jennifer (August 2025, Science and Public Policy)

Abstract The US agriculture and food systems research and education system remains the envy of the world, and the US Department of Agriculture and the Land-Grant University system lead the public and private partnerships that have improved agricultural productivity and human health phenomenally for over 160 years. The continuation of these improvements relies on equitable access to trustworthy data, particularly in genetics and phenomics, and the ability to leverage such data to address future scientific challenges. In this article, we discuss the growing need in agriculture for phenomic databases that follow findable, accessible, interoperable, and reusable (FAIR) data guidelines, as well as the need for public policy supporting a sustainable funding model for these databases.
Free, publicly-accessible full text available August 25, 2026
A teaching and training framework to promote findable, accessible, interoperable, and reusable data generation in agriculture

https://doi.org/10.1093/database/baaf034

Marrano, Annarita; Cabugos, Leyla; Hafner, Alenka; Kapoor, Beant; McNamara, John; O’Donnell, Megan; Reiser, Leonore; Tello-Ruiz, Marcela Karey; Zhang, Huiting; Staton, Margaret (April 2025, Database)

Abstract Advances in agricultural genetic, genomic, and breeding (GGB) technologies generate increasingly large and complex datasets that need to be adequately managed and shared. While several agricultural biological databases maintain and curate GGB data, not all scientists are aware of them and how they can be used to access and share data. In addition, there is the need to increase scientists’ awareness that appropriate data archiving and curation increases data longevity and value and bolsters scientific discoveries’ reproducibility and transparency. The AgBioData Education working group aims to address these unmet needs and developed a modular curriculum for educators teaching the basics of biological databases and the findable, accessible, interoperable, and reusable (FAIR) principles to undergraduate and graduate students (https://www.agbiodata.org/). The present paper provides an overview of the topics covered within the curriculum, called ‘AgBioData Curriculum for Ag FAIR Data,’ its audience and modalities, and how it will positively impact all the different stakeholders of the agricultural database ecosystem. We hope the modular curriculum presented here can help scientists and students understand and support database use in all aspects of improving our global food system. Database URL: https://zenodo.org/records/14278084
Data reuse in agricultural genomics research: challenges and recommendations

https://doi.org/10.1093/gigascience/giae106

Hafner, Alenka; DeLeo, Victoria; Deng, Cecilia H; Elsik, Christine G; Fleming, Damarius S; Harrison, Peter W; Kalbfleisch, Theodore S; Petry, Bruna; Pucker, Boas; Quezada-Rodríguez, Elsa H; et al (January 2025, GigaScience)

Abstract The scientific community has long benefited from the opportunities provided by data reuse. Recognizing the need to identify the challenges and bottlenecks to reuse in the agricultural research community and propose solutions for them, the data reuse working group was started within the AgBioData consortium framework. Here, we identify the limitations of data standards, metadata deficiencies, data interoperability, data ownership, data availability, user skill level, resource availability, and equity issues, with a specific focus on agricultural genomics research. We propose possible solutions stakeholders could implement to mitigate and overcome these challenges and provide an optimistic perspective on the future of genomics and transcriptomics data reuse.
Guidelines for gene and genome assembly nomenclature

https://doi.org/10.1093/genetics/iyaf006

Cannon, Ethalinda KS; Molik, David C; Wright, Adam J; Zhang, Huiting; Honaas, Loren; Chougule, Kapeel; Dyer, Sarah (January 2025, GENETICS)
Harris, T (Ed.)
Abstract The rapid increase in the number of reference-quality genome assemblies presents significant new opportunities for genomic research. However, the absence of standardized naming conventions for genome assemblies and annotations across datasets creates substantial challenges. Inconsistent naming hinders the identification of correct assemblies, complicates the integration of bioinformatics pipelines, and makes it difficult to link assemblies across multiple resources. To address this, we developed a specification for standardizing the naming of reference genome assemblies, to improve consistency across datasets and facilitate interoperability. This specification was created with FAIR (Findable, Accessible, Interoperable, and Reusable) practices in mind, ensuring that reference assemblies are easier to locate, access, and reuse across research communities. Additionally, it has been designed to comply with primary genomic data repositories, including members of the International Nucleotide Sequence Database Collaboration consortium, ensuring compatibility with widely used databases. While initially tailored to the agricultural genomics community, the specification is adaptable for use across different taxa. Widespread adoption of this standardized nomenclature would streamline assembly management, better enable cross-species analyses, and improve the reproducibility of research. It would also enhance natural language processing applications that depend on consistent reference assembly names in genomic literature, promoting greater integration and automated analysis of genomic data. This is a good time to consider more consistent genomic data nomenclature as many research communities and data resources are now finding themselves juggling multiple datasets from multiple data providers.
Full Text Available
Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences

https://doi.org/10.1093/database/baad088

Deng, Cecilia H; Naithani, Sushma; Kumari, Sunita; Cobo-Simón, Irene; Quezada-Rodríguez, Elsa H; Skrabisova, Maria; Gladman, Nick; Correll, Melanie J; Sikiru, Akeem Babatunde; Afuwape, Olusola O; et al (January 2023, Database)

Abstract Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021–22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we recommend a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.
Full Text Available
Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the Agbiodata Consortium

https://doi.org/10.1093/database/baad076

Clarke, Jennifer L; Cooper, Laurel D; Poelchau, Monica F; Berardini, Tanya Z; Elser, Justin; Farmer, Andrew D; Ficklin, Stephen; Kumari, Sunita; Laporte, Marie-Angélique; Nelson, Rex T; et al (January 2023, Database)

Abstract Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as ‘databases’ throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases, and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases
Full Text Available
Data Reuse in Agricultural Genomics Research: Present Challenges and Future Solutions

https://doi.org/10.20944/preprints202401.0780.v1

Hafner, Alenka; DeLeo, Victoria; Deng, Cecilia; Elsik, Christine G; Fleming, Damarius; Harrison, Peter W; Kalbfleisch, Theodore S; Petry, Bruna; Pucker, Boas; Quezada-Rodríguez, Elsa H; et al (January 2024, preprints.org)

The scientific community has long benefited from the opportunities provided by data reuse. Recognizing the need to identify the challenges and bottlenecks to reuse in the agricultural research community and propose solutions for them, the data reuse working group was started within the AgBioData consortium (https://www.agbiodata.org/) framework. Here, we identify the limitations of data standards, metadata deficiencies, data interoperability, data ownership, data availability, user skill level, resource availability, and equity issues, with a specific focus on agricultural genomics research. We propose possible solutions stakeholders could implement to mitigate and overcome these challenges and provide an optimistic perspective on the future of genomics and transcriptomics data reuse.
Full Text Available
