Biodiversity datasets, or descriptions of biodiversity datasets, are increasingly available through open digital data infrastructures such as the Global Biodiversity Information Facility (GBIF, https://gbif.org), Integrated Digitized Biocollections (iDigBio, https://www.idigbio.org) and the Biological Collection Access Service (BioCASE, http://www.biocase.org).
However, little is known about how these networks, and the data accessed through them, change over time. This dataset provides snapshots of the biodiversity dataset graphs as tracked by Preston (https://github.com/bio-guoda/preston, https://doi.org/10.5281/zenodo.1410543).
The rdf/nquad and tsv snapshots were generated using the respective commands:
preston ls | bzip2 > preston-ls.nq.bz2
and
preston ls --log tsv | bzip2 > preston-ls.tsv.bz2
For convenience, the first 100 uncompressed entries of both files are included, as well as the sha256 hashes of the content of the files.
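A downloaded snapshot can be checked against a published sha256 hash and previewed without decompressing the whole file. The following is a minimal Python sketch, not part of the dataset itself. The helper names `sha256_of_file` and `head_of_bz2` are hypothetical, and it assumes the published hash was computed over the bytes of the file as stored on disk, which the description does not state explicitly.

```python
import bz2
import hashlib
from itertools import islice


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex sha256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def head_of_bz2(path, n=100):
    """Return up to the first n text lines of a bzip2-compressed file."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        # islice stops after n lines, so the rest of the file is never decompressed.
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, `head_of_bz2("preston-ls.tsv.bz2")` would return the first 100 rows of the tsv snapshot, and `sha256_of_file("preston-ls.tsv.bz2")` could be compared with the corresponding published hash.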
The Walking Dead: Status Report, Data Workflow and Best Practices of the oVert Thematic Collections Network
In 2017 NSF funded “oVert (openVertebrate): Open Exploration of Vertebrate Diversity in 3D,” which is the first Thematic Collections Network devoted entirely to vertebrate morphological specimens. The primary goal of oVert is to generate and serve high-resolution digital three-dimensional data for internal anatomy across vertebrate diversity. oVert will CT-scan >20,000 fluid-preserved specimens representing >80% of the living genera of vertebrates, providing broad coverage for exploration and research on all major groups of vertebrates. Contrast-enhanced scans will be generated to reveal soft tissues and organs for a majority of the living vertebrate families. This collection of digital imagery and three-dimensional volumes will be open for exploration, download, and use. These new media will provide unprecedented global access to valuable morphological data of specimens in US collections. oVert is developing best practices and guidelines for high-throughput CT-scanning, including efficient workflows, preferred resolutions, and archival formats that optimize the variety of downstream applications. Using the Integrated Digitized Biocollections (iDigBio) API, we have developed a workflow where people uploading media files to MorphoSource can search for and import metadata for specimens directly from iDigBio. Via a Rich Site Summary (RSS) feed from MorphoSource, Audubon Core data describing media files for a given scientific collection can be retrieved and integrated into institutional IPT installations and databases. Such migration of data for large files requires attention to detail and the development of data workflows that ensure correct specimen mapping at all steps. The RSS feed from MorphoSource will also consolidate usage information for media files from specimens in each scientific collection for reporting. An additional goal of the project is to provide information vital to the creation of collection best practices for imaging permissions and copyright.
A status report and update on best practices will be presented.
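The specimen lookup step described above can be illustrated with a short Python sketch. It is not the oVert or MorphoSource implementation. It assumes the public iDigBio search endpoint at `https://search.idigbio.org/v2/search/records`, a JSON `rq` query parameter keyed by lowercased index terms such as `institutioncode` and `catalognumber`, and a response whose `items` carry a `uuid` and an `indexTerms` object. The helper names are hypothetical, and the sketch only builds the request URL and parses an already-decoded response, so no network access is involved.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the iDigBio record search API.
IDIGBIO_SEARCH_URL = "https://search.idigbio.org/v2/search/records"


def build_record_query(institution_code, catalog_number, limit=5):
    """Build a search URL for one specimen (hypothetical helper).

    Values are lowercased on the assumption that indexed terms are
    stored in lowercase.
    """
    rq = {
        "institutioncode": institution_code.lower(),
        "catalognumber": catalog_number.lower(),
    }
    params = {"rq": json.dumps(rq, sort_keys=True), "limit": limit}
    return IDIGBIO_SEARCH_URL + "?" + urlencode(params)


def extract_specimen_metadata(payload):
    """Pull a few identifying fields out of a decoded search response."""
    results = []
    for item in payload.get("items", []):
        terms = item.get("indexTerms", {})
        results.append(
            {
                "uuid": item.get("uuid"),
                "scientificname": terms.get("scientificname"),
                "institutioncode": terms.get("institutioncode"),
                "catalognumber": terms.get("catalognumber"),
            }
        )
    return results
```

Keeping URL construction and response parsing as separate pure functions makes the specimen mapping easy to test offline before any media files are attached to a record.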
- Award ID(s):
- 1701714
- PAR ID:
- 10066525
- Date Published:
- Journal Name:
- Biodiversity Information Science and Standards
- Volume:
- 2
- ISSN:
- 2535-0897
- Page Range / eLocation ID:
- e26078
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Collections digitization relies increasingly upon computational and data management resources that occasionally exceed the capacity of natural history collections and their managers and curators. Digitization of many tens of thousands of micropaleontological specimen slides, as evidenced by the effort presented here by the Indiana University Paleontology Collection, has been a concerted effort in adherence to the recommended practices of multifaceted aspects of collections management for both physical and digital collections resources. This presentation highlights the contributions of distributed cyberinfrastructure from the National Science Foundation-supported Extreme Science and Engineering Discovery Environment (XSEDE) for web-hosting of collections management system resources and distributed processing of millions of digital images and metadata records of specimens from our collections. The Indiana University Center for Biological Research Collections is currently hosting its instance of the Specify collections management system (CMS) on a virtual server hosted on Jetstream, the cloud service for on-demand computational resources as provisioned by XSEDE. This web-service allows the CMS to be flexibly hosted on the cloud with additional services that can be provisioned on an as-needed basis for generating and integrating digitized collections objects in both web-friendly and digital preservation contexts. On-demand computing resources can be used for the manipulation of digital images for automated file I/O, scripted renaming of files for adherence to file naming conventions, derivative generation, and backup to our local tape archive for digital disaster preparedness and long-term storage. 
Here, we will present our strategies for facilitating reproducible workflows for general collections digitization of the IUPC nomenclatorial types and figured specimens in addition to the gigapixel resolution photographs of our large collection of microfossils using our GIGAmacro system (e.g., this slide of conodonts). We aim to demonstrate the flexibility and nimbleness of cloud computing resources for replicating this, and other, workflows to enhance the findability, accessibility, interoperability, and reproducibility of the data and metadata contained within our collections.
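The scripted renaming step mentioned above could look roughly like the following Python sketch. The naming convention used here (uppercase collection prefix, zero-padded catalog number, optional lowercase view suffix, joined by underscores) is purely hypothetical and is not the convention used by the collection. The function names are likewise illustrative. The script only plans renames and leaves the files untouched, so the plan can be reviewed first.

```python
import re
from pathlib import Path

# Hypothetical input pattern: letters, a number, and an optional suffix,
# separated by spaces, underscores, or hyphens (e.g. "abc 12 dorsal").
STEM_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z]+)[ _-]*(?P<number>\d+)(?:[ _-]+(?P<suffix>[A-Za-z0-9]+))?$"
)


def conventional_name(filename, pad=6):
    """Return the conventional file name, or None if the stem does not parse."""
    path = Path(filename)
    match = STEM_PATTERN.match(path.stem)
    if match is None:
        return None
    parts = [match["prefix"].upper(), match["number"].zfill(pad)]
    if match["suffix"]:
        parts.append(match["suffix"].lower())
    return "_".join(parts) + path.suffix.lower()


def plan_renames(directory, pad=6):
    """List (old_path, new_path) pairs for files that need renaming (dry run)."""
    plan = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        new_name = conventional_name(path.name, pad)
        # Skip files that do not parse or already follow the convention.
        if new_name is not None and new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan
```

Once the plan has been inspected, each pair can be applied with `old_path.rename(new_path)`; keeping planning and renaming separate makes the workflow easier to rerun and audit.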
-
Abstract Computed tomography (CT) scanning and other high‐throughput three‐dimensional (3D) visualization tools are transforming the ways we study morphology, ecology and evolutionary biology research beyond generating vast digital repositories of anatomical data. Contrast‐enhanced chemical staining methods, which render soft tissues radio‐opaque when coupled with CT scanning, encompass several approaches that are growing in popularity and versatility. Of these, the various diceCT techniques that use an iodine‐based solution like Lugol's have provided access to an array of morphological data sets spanning extant vertebrate lineages. This contribution outlines straightforward means for applying diceCT techniques to preserved museum specimens of cartilaginous and bony fishes, collectively representing half of vertebrate species diversity. This study contrasts the benefits of using either aqueous or ethylic Lugol's solutions and reports few differences between these methods with respect to the time required to achieve optimal tissue contrast. It also explores differences in minimum stain duration required for different body sizes and shapes and provides recommendations for staining specimens individually or in small batches. As reported by earlier studies, the authors note a decrease in pH during staining with either aqueous or ethylic Lugol's. Nonetheless, they could not replicate the drastic declines in pH reported elsewhere. They provide recommendations for researchers and collections staff on how to incorporate diceCT into existing curatorial practices, while offsetting risk to specimens. Finally, they outline how diceCT with Lugol's can aid ichthyologists of all kinds in visualizing anatomical structures of interest: from brains and gizzards to gas bladders and pharyngeal jaw muscles.
-
Goldfarb, Keith (Ed.) Natural history collections are important depositories of biodiversity data. Digital photography of natural history collection specimens and subsequent dissemination of the resulting images on the web allow for the virtual discovery of these specimens, enhancing their accessibility to the target audience and the public in general. This presentation discusses digital photography of marine mollusks in collections, including some of the latest techniques for imaging of very small specimens, photography of specimens preserved in liquid, haptobionts, problems of color retention, transparency, 3-D photography, equipment, and other current areas of interest. Despite the focus on mollusks, the discussions can be extrapolated as generalities applicable to invertebrates from other phyla. The presentation also includes a discussion on equipment and the ideal digital parameters for imaging of natural history collection specimens, including image policies on acceptable file-format requirements for data hosts and aggregators such as iDigBio and others. (The presentation includes work funded in part by the NSF Thematic Collections Network grant award 2001528 “Mobilizing Millions of Mollusks from the Eastern Seaboard”).
-
Abstract Most species exhibit morphological stasis following speciation, and this is a key feature of the concept of punctuated equilibria. Stasis results in species often having long durations on geological timescales. Durational data are fundamental to many types of paleobiological analyses and are ideally based on occurrence data represented by specimens in museum collections. Often, however, durational data are presented without supporting information about voucher specimens that document stratigraphic ranges, including first and last appearances. We use the iconic Devonian trilobite Eldredgeops rana to demonstrate that durational data can be challenging to determine at multiple taxonomic levels. Further, we show that different datasets, including Sepkoski’s published databases, the Paleobiology Database, and iDigBio, give discordant results concerning first and last occurrences. We argue that paleontologists should adopt two general best practices to help address these problems. First, systematists should clearly identify voucher specimens that represent stratigraphic occurrences of species. Second, we recommend that high-quality photographs of occurrence vouchers be placed in open access websites and be assigned public domain licensing before being paywalled by journals. Such voucher images also have a role to play in training artificial intelligence (AI) systems that will be applied to future paleobiological questions.