Title: Data Reuse in Agricultural Genomics Research: Present Challenges and Future Solutions
The scientific community has long benefited from the opportunities provided by data reuse. Recognizing the need to identify the challenges and bottlenecks to reuse in the agricultural research community and propose solutions for them, the data reuse working group was started within the AgBioData consortium (https://www.agbiodata.org/) framework. Here, we identify the limitations of data standards, metadata deficiencies, data interoperability, data ownership, data availability, user skill level, resource availability, and equity issues, with a specific focus on agricultural genomics research. We propose possible solutions stakeholders could implement to mitigate and overcome these challenges and provide an optimistic perspective on the future of genomics and transcriptomics data reuse.
Award ID(s):
2126334
PAR ID:
10533146
Publisher / Repository:
preprints.org
Date Published:
Format(s):
Medium: X
Institution:
Preprints.org
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The scientific community has long benefited from the opportunities provided by data reuse. Recognizing the need to identify the challenges and bottlenecks to reuse in the agricultural research community and propose solutions for them, the data reuse working group was started within the AgBioData consortium framework. Here, we identify the limitations of data standards, metadata deficiencies, data interoperability, data ownership, data availability, user skill level, resource availability, and equity issues, with a specific focus on agricultural genomics research. We propose possible solutions stakeholders could implement to mitigate and overcome these challenges and provide an optimistic perspective on the future of genomics and transcriptomics data reuse. 
  2. Harris, T (Ed.)
    Abstract The rapid increase in the number of reference-quality genome assemblies presents significant new opportunities for genomic research. However, the absence of standardized naming conventions for genome assemblies and annotations across datasets creates substantial challenges. Inconsistent naming hinders the identification of correct assemblies, complicates the integration of bioinformatics pipelines, and makes it difficult to link assemblies across multiple resources. To address this, we developed a specification for standardizing the naming of reference genome assemblies, to improve consistency across datasets and facilitate interoperability. This specification was created with FAIR (Findable, Accessible, Interoperable, and Reusable) practices in mind, ensuring that reference assemblies are easier to locate, access, and reuse across research communities. Additionally, it has been designed to comply with primary genomic data repositories, including members of the International Nucleotide Sequence Database Collaboration consortium, ensuring compatibility with widely used databases. While initially tailored to the agricultural genomics community, the specification is adaptable for use across different taxa. Widespread adoption of this standardized nomenclature would streamline assembly management, better enable cross-species analyses, and improve the reproducibility of research. It would also enhance natural language processing applications that depend on consistent reference assembly names in genomic literature, promoting greater integration and automated analysis of genomic data. This is a good time to consider more consistent genomic data nomenclature as many research communities and data resources are now finding themselves juggling multiple datasets from multiple data providers. 
  3. The opaque relationship between biology and behavior is an intractable problem for psychiatry, and it increasingly challenges longstanding diagnostic categorizations. While various big data sciences have been repeatedly deployed as potential solutions, they have so far complicated more than they have managed to disentangle. Attending to categorical misalignment, this article proposes one reason why this is the case: Datasets have to instantiate clinical categories in order to make biological sense of them, and they do so in different ways. Here, I use mixed methods to examine the role of the reuse of big data in recent genomic research on autism spectrum disorder (ASD). I show how divergent regimes of psychiatric categorization are innately encoded within commonly used datasets from MSSNG and 23andMe, contributing to a rippling disjuncture in the accounts of autism that this body of research has produced. Beyond the specific complications this dynamic introduces for the category of autism, this paper argues for the necessity of critical attention to the role of dataset reuse and recombination across human genomics and beyond. 
  4. Oceanography is inherently an interdisciplinary science capable of producing highly complex, heterogeneous data that pose unique challenges for data management and reuse. Evolving instrumentation and new research methodologies are increasingly taxing current strategies and technologies for management and reuse of data. Data-related publisher and funder requirements are relatively new demands that researchers must learn to navigate. These are just some of the stressors that repositories experience in their role of curating and publishing FAIR marine-related data. In response, oceanographic repositories are adapting by leveraging community data standards, engaging in the development of new technologies and the usage of novel tools to improve data discovery and interoperability. Additionally, they are collaborating with data-related stakeholders to help shape data-related policy, and fill an education role to promote good data hygiene and bring awareness of concepts like FAIR to the oceanographic research community. This presentation will highlight some of the activities of the BCO-DMO repository that are aimed at advancing the availability and reuse of Open oceanographic data. 
  5. Aridification in the U.S. Southwest has led to tension about conservation and land management strategy. Strain on multi-generational agricultural livelihoods and nearly 150-year-old Colorado River water adjudication necessitates solutions from transdisciplinary partnerships. In this study, farmers and ranchers in a small San Juan River headwater community of southwestern Colorado engaged in a participatory, convergent research study prioritizing local objectives and policy. Acknowledging the historic and sometimes perceived role of academic institutions as representing urban interests, our goal was to highlight how research can support rural governance. This process involved creating community partnerships, analyzing data, and supporting results distribution to the surveyed population through social media. The survey was designed to support a local waterway management plan. Survey results showed lack of water availability and climate changes were selected by producers as most negatively affecting their operations, and many were extremely interested in agroforestry methods and drought-resistant crop species. Statistical analysis identified that satisfaction with community resources was positively correlated with scale of production, satisfaction with irrigation equipment, and familiarity with water rights. We hope to contribute our framework of a convergent, place-based research design for wider applications in other regions to uncover solutions to resource challenges. 