Genomic data are being produced and archived at a prodigious rate, and current studies could become historical baselines for future global genetic diversity analyses and monitoring programs. However, when we evaluated the potential utility of genomic data from wild and domesticated eukaryote species in the world’s largest genomic data repository, we found that most archived genomic datasets (87%) lacked the spatiotemporal metadata necessary for genetic biodiversity surveillance. Labor-intensive scouring of a subset of published papers yielded geospatial coordinates and collection years for only 39% (51% if place names were considered) of these genomic datasets. Streamlined data input processes, updated metadata deposition policies, and enhanced scientific community awareness are urgently needed to preserve these irreplaceable records of today’s genetic biodiversity and to plug the growing metadata gap.
Importance of timely metadata curation to the global surveillance of genetic diversity
Abstract Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies produce large amounts of genome‐scale genetic diversity data for wild populations, but most (87%) do not include the associated spatial and temporal metadata necessary for them to be reused in monitoring programs or for acknowledging the sovereignty of nations or Indigenous peoples. We undertook a distributed datathon to quantify the availability of these missing metadata and to test the hypothesis that their availability decays with time. We also worked to remediate missing metadata by extracting them from associated published papers, online repositories, and direct communication with authors. Starting with 848 candidate genomic data sets (reduced representation and whole genome) from the International Nucleotide Sequence Database Collaboration, we determined that 561 contained mostly samples from wild populations. We successfully restored spatiotemporal metadata for 78% of these 561 data sets (n = 440 data sets with data on 45,105 individuals from 762 species in 17 phyla). Examining papers and online repositories was much more fruitful than contacting 351 authors, who replied to our email requests 45% of the time. Overall, 23% of our email queries to authors unearthed useful metadata. The probability of retrieving spatiotemporal metadata declined significantly as age of the data set increased. There was a 13.5% yearly decrease in metadata associated with published papers or online repositories and up to a 22% yearly decrease in metadata that were only available from authors. 
This rapid decay in metadata availability, mirrored in studies of other types of biological data, should motivate swift updates to data‐sharing policies and researcher practices to ensure that the valuable context provided by metadata is not lost to conservation science forever.
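To make the reported decay rates concrete, the minimal sketch below compounds a fixed yearly percentage decrease over several years. It assumes, purely for illustration, that the 13.5% and 22% figures act as a constant multiplicative decline on a relative availability measure that starts at 1.0. The abstract does not state the underlying statistical model (for example, whether the decrease applies to odds or to probabilities), so the printed values are not results from the study. The function name `remaining_fraction` is hypothetical.

```python
def remaining_fraction(yearly_decrease: float, years: int) -> float:
    """Relative availability left after compounding a constant yearly decrease.

    Assumption (not from the source): the decrease is multiplicative and
    constant, so the remaining fraction is (1 - yearly_decrease) ** years.
    """
    if not 0.0 <= yearly_decrease <= 1.0:
        raise ValueError("yearly_decrease must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 - yearly_decrease) ** years


if __name__ == "__main__":
    # Rates quoted in the abstract; the labels are shorthand for the two sources.
    rates = (("papers/repositories", 0.135), ("authors only", 0.22))
    for label, rate in rates:
        for years in (1, 5, 10):
            fraction = remaining_fraction(rate, years)
            print(f"{label:>20} after {years:>2} y: {fraction:.3f}")
```

Under this simplified reading, the faster author-only rate leaves a smaller remaining fraction at every horizon, which matches the abstract's point that metadata held only by authors are the most vulnerable to loss.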
- Award ID(s): 1764316
- PAR ID: 10401184
- Publisher / Repository: Wiley-Blackwell
- Journal Name: Conservation Biology
- Volume: 37
- Issue: 4
- ISSN: 0888-8892
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Understanding how genetic diversity is distributed across spatiotemporal scales in species of conservation or management concern is critical for identifying large‐scale mechanisms affecting local conservation status and implementing large‐scale biodiversity monitoring programmes. However, cross‐scale surveys of genetic diversity are often impractical within single studies, and combining datasets to increase spatiotemporal coverage is frequently impeded by using different sets of molecular markers. Recently developed molecular tools make surveys based on standardized single‐nucleotide polymorphism (SNP) panels more feasible than ever, but require existing genomic information. Here, we conduct the first survey of genome‐wide SNPs across the native range of brook trout (Salvelinus fontinalis), a cold‐adapted species that has been the focus of considerable conservation and management effort across eastern North America. Our dataset can be leveraged to easily design SNP panels that allow datasets to be combined for large‐scale analyses. We performed restriction site‐associated DNA sequencing for wild brook trout from 82 locations spanning much of the native range and domestic brook trout from 24 hatchery strains used in stocking efforts. We identified over 24,000 SNPs distributed throughout the brook trout genome. We explored the ability of these SNPs to resolve relationships across spatial scales, including population structure and hatchery admixture. Our dataset captures a wide spectrum of genetic diversity in native brook trout, offering a valuable resource for developing SNP panels. We highlight potential applications of this resource with the goal of increasing the integration of genomic information into decision‐making for brook trout and other species of conservation or management concern.
-
Abstract Persistent identifiers for research objects, researchers, organizations, and funders are the key to creating unambiguous and persistent connections across the global research infrastructure (GRI). Many repositories are implementing mechanisms to collect and integrate these identifiers into their submission and record curation processes. This bodes well for a well-connected future, but metadata for existing resources submitted in the past are missing these identifiers, thus missing the connections required for inclusion in the connected infrastructure. Re-curation of these metadata is required to make these connections. This paper introduces the global research infrastructure and demonstrates how repositories, and their user communities, can contribute to and benefit from connections to the global research infrastructure. The Dryad Data Repository has existed since 2008 and has successfully re-curated the repository metadata several times, adding identifiers for research organizations, funders, and researchers. Understanding and quantifying these successes depends on measuring repository and identifier connectivity. Metrics are described and applied to the entire repository here. Identifiers (Digital Object Identifiers, DOIs) for papers connected to datasets in Dryad have long been a critical part of the Dryad metadata creation and curation processes. Since 2019, the portion of datasets with connected papers has decreased from 100% to less than 40%. This decrease has significant ramifications for the re-curation efforts described above as connected papers have been an important source of metadata. In addition, missing connections to papers make understanding and re-using datasets more difficult. Connections between datasets and papers can be difficult to make because of time lags between submission and publication, lack of clear mechanisms for citing datasets and other research objects from papers, changing focus of researchers, and other obstacles. 
The members of the Dryad community (users, research institutions, publishers, and funders) have vested interests in identifying these connections and critical roles in the curation and re-curation efforts. Their engagement will be critical in building on the successes Dryad has already achieved and ensuring sustainable connectivity in the future.
-
Over the past five decades, a large number of wild animals have been individually identified by various observation systems and/or temporary tracking methods, providing unparalleled insights into their lives over both time and space. However, so far there is no comprehensive record of uniquely individually identified animals nor where their data and metadata are stored, for example photos, physiological and genetic samples, disease screens, information on social relationships. Databases currently do not offer unique identifiers for living, individual wild animals, similar to the permanent ID labelling for deceased museum specimens. To address this problem, we introduce two new concepts: (1) a globally unique animal ID (UAID) available to define uniquely and individually identified animals archived in any database, including metadata archived at the time of publication; and (2) the digital ‘home’ for UAIDs, the Movebank Life History Museum (MoMu), storing and linking metadata, media, communications and other files associated with animals individually identified in the wild. MoMu will ensure that metadata are available for future generations, allowing permanent linkages to information in other databases. MoMu allows researchers to collect and store photos, behavioural records, genome data and/or resightings of UAIDed animals, encompassing information not easily included in structured datasets supported by existing databases. Metadata is uploaded through the Animal Tracker app, the MoMu website, by email from registered users or through an Application Programming Interface (API) from any database. Initially, records can be stored in a temporary folder similar to a field drawer, as naturalists routinely do. Later, researchers and specialists can curate these materials for individual animals, manage the secure sharing of sensitive information and, where appropriate, publish individual life histories with DOIs. 
The storage of such synthesized lifetime stories of wild animals under a UAID (unique identifier or ‘animal passport’) will support basic science, conservation efforts and public participation.
-
Abstract Since allozymes were first used to assess genetic diversity in the 1960s and 1970s, biologists have attempted to characterize gene pools and conserve the diversity observed in domestic crops, livestock, zoos and (more recently) natural populations. Recently, some authors have claimed that the importance of genetic diversity in conservation biology has been greatly overstated. Here, we argue that a voluminous literature indicates otherwise. We address four main points made by detractors of genetic diversity's role in conservation by using published literature to firmly establish that genetic diversity is intimately tied to evolutionary fitness, and that the associated demographic consequences are of paramount importance to many conservation efforts. We think that responsible management in the Anthropocene should, whenever possible, include the conservation of ecosystems, communities, populations and individuals, and their underlying genetic diversity.
