

Title: Arabidopsis bioinformatics resources: The current state, challenges, and priorities for the future
Abstract

Effective research, education, and outreach efforts by the Arabidopsis thaliana community, as well as other scientific communities that depend on Arabidopsis resources, depend vitally on easily available and publicly‐shared resources. These resources include reference genome sequence data and an ever‐increasing number of diverse data sets and data types. TAIR (The Arabidopsis Information Resource) and Araport (originally named the Arabidopsis Information Portal) are community informatics resources that provide tools, data, and applications to the more than 30,000 researchers worldwide that use in their work either Arabidopsis as a primary system of study or data derived from Arabidopsis. Four years after Araport's establishment, the IAIC held another workshop to evaluate the current status of Arabidopsis Informatics and chart a course for future research and development. The workshop focused on several challenges, including the need for reliable and current annotation, community‐defined common standards for data and metadata, and accessible and user‐friendly repositories/tools/methods for data integration and visualization. Solutions envisioned included (a) a centralized annotation authority to coalesce annotation from new groups, establish a consistent naming scheme, distribute this format regularly and frequently, and encourage and enforce its adoption. (b) Standards for data and metadata formats, which are essential, but challenging when comparing across diverse genotypes and in areas with less‐established standards (e.g., phenomics, metabolomics). Community‐established guidelines need to be developed. (c) A searchable, central repository for analysis and visualization tools. Improved versioning and user access would make tools more accessible. Workshop participants proposed a “one‐stop shop” website, an Arabidopsis “Super‐Portal” to link tools, data resources, programmatic standards, and best practice descriptions for each data type. 
This must have community buy‐in and participation in its establishment and development to encourage adoption.

 
NSF-PAR ID:
10082774
Author(s) / Creator(s):
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Plant Direct
Volume:
3
Issue:
1
ISSN:
2475-4455
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Granting agencies invest millions of dollars in the generation and analysis of data, making these products extremely valuable. However, without sufficient annotation of the methods used to collect and analyze the data, the ability to reproduce and reuse those products suffers. This lack of assurance of the quality and credibility of the data at the different stages in the research process essentially wastes much of the investment of time and funding and fails to drive research forward to the level that would be possible if everything were effectively annotated and disseminated to the wider research community. In order to address this issue for the Hawai‘i Established Program to Stimulate Competitive Research (EPSCoR) project, a water science gateway was developed at the University of Hawai‘i (UH), called the ‘Ike Wai Gateway. In Hawaiian, ‘Ike means knowledge and Wai means water. The gateway supports research in hydrology and water management by providing tools to address questions of water sustainability in Hawai‘i. The gateway provides a framework for data acquisition, analysis, model integration, and display of data products. The gateway is intended to complement and integrate with the capabilities of the Consortium of Universities for the Advancement of Hydrologic Science's (CUAHSI) Hydroshare by providing sound data and metadata management capabilities for multi-domain field observations, analytical lab actions, and modeling outputs. Functionality provided by the gateway is supported by a subset of the CUAHSI’s Observations Data Model (ODM) delivered as centralized web-based user interfaces and APIs supporting multi-domain data management, computation, analysis, and visualization tools to support reproducible science, modeling, data discovery, and decision support for the Hawai‘i EPSCoR ‘Ike Wai research team and wider Hawai‘i hydrology community. 
By leveraging the Tapis platform, UH has constructed a gateway that ties data and advanced computing resources together to support diverse research domains including microbiology, geochemistry, geophysics, economics, and humanities, coupled with computational and modeling workflows delivered in a user-friendly web interface with workflows for effectively annotating the project data and products. Disseminating results for the ‘Ike Wai project through the ‘Ike Wai data gateway and Hydroshare makes the research products accessible and reusable. 
  2. Abstract

    A key remit of the NSF‐funded “Arabidopsis Research and Training for the 21st Century” (ART‐21) Research Coordination Network has been to convene a series of workshops with community members to explore issues concerning research and training in plant biology, including the role that research using Arabidopsis thaliana can play in addressing those issues. A first workshop focused on training needs for bioinformatic and computational approaches in plant biology was held in 2016, and recommendations from that workshop have been published (Friesner et al., Plant Physiology, 175, 2017, 1499). In this white paper, we provide a summary of the discussions and insights arising from the second ART‐21 workshop. The second workshop focused on experimental aspects of omics data acquisition and analysis and involved a broad spectrum of participants from academics and industry, ranging from graduate students through post‐doctorates, early career and established investigators. Our hope is that this article will inspire beginning and established scientists, corporations, and funding agencies to pursue directions in research and training identified by this workshop, capitalizing on the reference species Arabidopsis thaliana and other valuable plant systems.

     
  3. Abstract

    Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021–22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we recommend a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals.

    Database URL: https://www.agbiodata.org.

     
  4. A series of international workshops held in 2014, 2017, 2019, and 2022 focused on improving tephra studies from field collection through publication and encouraging FAIR (findable, accessible, interoperable, reusable) data practices for tephra data and metadata. Two consensus needs for tephra studies emerged from the 2014 and 2017 workshops: (a) standardization of tephra field data collection, geochemical analysis, correlation, and data reporting, and (b) development of next-generation computer tools and databases to facilitate information access across multidisciplinary communities. To achieve (a), we developed a series of recommendations for best practices in tephra studies, from sample collection through analysis and data reporting (https://zenodo.org/record/3866266). A 4-part virtual workshop series (https://tephrochronology.org/cot/Tephra2022/) was held in February and March 2022 to update the tephra community on these developments, to get community feedback, to learn of unmet needs, and to plan a future roadmap for open and FAIR tephra data. More than 230 people from 25 nations registered for the workshop series. The community strongly emphasized the need for better computer systems, including physical infrastructure (repositories and servers), digital infrastructure (software and tools) and human infrastructure (people, training, and professional assistance), to store, manage and serve global tephra datasets. Some desired attributes of improved computer systems include: 1) user friendliness; 2) ability to easily ingest multiparameter tephra data (using best practice recommended data fields); 3) interoperability with existing data repositories; 4) development of tool add-ons (plotting and statistics); 5) improved searchability; 6) development of a tephra portal with access to distributed data systems; and 7) commitments to long-term support from funding agencies, publishers and the cyberinfrastructure community. 
  5. Abstract

    Marine protected area (MPA) networks, with varying degrees of protection and use, can be useful tools to achieve both conservation and fisheries management benefits. Assessing whether MPA networks meet their objectives requires data from Before the establishment of the network to better discern natural spatiotemporal variation and preexisting differences from the response to protection. Here, we use a Progressive‐Change BACIPS approach to assess the ecological effects of a network of five fully and three moderately protected MPAs on fish communities in two coral reef habitats (lagoon and fore reef) based on a time series of data collected five times (over three years) Before and 12 times (over nine years) After the network's establishment on the island of Moorea, French Polynesia. At the network scale, on the fore reef, density and biomass of harvested fishes increased by 19.3% and 24.8%, respectively, in protected areas relative to control fished areas. Fully protected areas provided greater ecological benefits than moderately protected areas. In the lagoon, density and biomass of harvested fishes increased, but only the 31% increase in biomass in fully protected MPAs was significant. Non‐harvested fishes did not respond to protection in any of the habitats. We propose that these responses to protection were small, relative to other MPA assessments, due to limited compliance and weak surveillance, although other factors such as the occurrence of a crown‐of‐thorns starfish outbreak and a cyclone after the network was established may also have impeded the ability of the network to provide benefits. Our results highlight the importance of using fully protected MPAs over moderately protected MPAs to achieve conservation objectives, even in complex social–ecological settings, but also stress the need to monitor effects and adapt management based on ongoing assessments.

     