Title: Specimens, Databases, and Accession Books: Using TaxonWorks to Integrate Multiple Sources of Modern and Historical Data in the INHS Insect Collection
Grant-supported digitization projects over the past 20 years at the Illinois Natural History Survey (INHS) have yielded over 1,000,000 occurrence records (representing over 2.7 million specimens), making this one of the most successful digitization efforts within the United States. However, receiving multiple grants at the cutting edge has left numerous projects at various stages of completion, along with several relational databases, orphaned data, and specimens at various stages of curation. TaxonWorks (taxonworks.org), an integrated web-based workbench developed by the Species File Group and supported by the INHS and the National Science Foundation, has provided the digital infrastructure to unify multiple workflows, projects, databases, and even historical accession books into one easy-to-access, open-source platform. We demonstrate the practical utility of this platform and summarize past, present, and future efforts at the INHS towards integrating all our data within TaxonWorks.
Award ID(s):
1639601
PAR ID:
10079958
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Biodiversity Information Science and Standards
Volume:
2
ISSN:
2535-0897
Page Range / eLocation ID:
e25896
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. TaxonWorks is an open-source workbench for biodiversity researchers. After several years of development, we highlight its present status and discuss if and when it makes sense to release a version 1.0, i.e. software completed to a specific stage. TaxonWorks' scope is broad; it seeks to touch nearly all areas that might be of interest to taxonomists, i.e. those who integrate everything that is known about a taxon into a single resource. Its role as a software platform is placed in a broader context, where many instances of TaxonWorks can each support multiple research projects. Instances may be supported by individuals or organizations. A suite of technical tools, including containerization and unit tests, facilitates collaboration at many different levels. TaxonWorks is a research tool; mechanisms for analyzing the results of data curation, including its application programming interface, are described. The long-term development of TaxonWorks is supported by an endowment to the Species File Group. Its source is available on GitHub.
  2. TaxonWorks (http://taxonworks.org) is an integrated, open-source, cybertaxonomic web application serving taxonomists and biodiversity scientists. It is designed to facilitate efficient data capture, storage, manipulation, and retrieval. It integrates a wide variety of data types used by biodiversity scientists, including, but not limited to, taxonomy (with validation based on codes of zoological, botanical, bacterial, and viral nomenclature), specimen data, bibliographies, media (images, PDFs, sounds, videos), morphology (character/trait matrices), distribution, and biological associations. Available TaxonWorks web interfaces currently provide various data entry forms for simple and advanced querying of the database. TaxonWorks has integrated batch uploader functionality, but for larger datasets specialized migration scripts were used. Several projects, historically built in 3i (http://dmitriev.speciesfile.org), MX (http://mx.phenomix.org), SpeciesFiles (http://software.speciesfile.org), and other databases, have been or are being migrated into TaxonWorks. Of the projects moving into TaxonWorks, several are worth mentioning: 3i World Auchenorrhyncha Database, LepIndex, Universal Chalcidoidea Database, Orthoptera SpeciesFile, Plecoptera SpeciesFile, Illinois Natural History Survey Insect Collection database, and several others. Our experience with the data migration will be shared during the presentation.
  3. Over 300 million arthropod specimens are housed in North American natural history collections. These collections represent a “vast hidden treasure trove” of biodiversity: 95% of the specimen label data have yet to be transcribed for research, and less than 2% of the specimens have been imaged. Specimen labels contain crucial information to determine species distributions over time and are essential for understanding patterns of ecology and evolution, which will help assess the growing biodiversity crisis driven by global change impacts. Specimen images offer indispensable insight and data for analyses of traits, and ecological and phylogenetic patterns of biodiversity. Here, we review North American arthropod collections using two key metrics, specimen holdings and digitization efforts, to assess the potential for collections to provide needed biodiversity data. We include data from 223 arthropod collections in North America, with an emphasis on the United States. Our specific findings are as follows: (1) The majority of North American natural history collections (88%) and specimens (89%) are located in the United States. Canada has comparable holdings to the United States relative to its estimated biodiversity. Mexico has made the furthest progress in terms of digitization, but its specimen holdings should be increased to reflect the estimated higher Mexican arthropod diversity. The proportion of North American collections that has been digitized, and the number of digital records available per species, are both much lower for arthropods when compared to chordates and plants. (2) The National Science Foundation’s decade-long ADBC program (Advancing Digitization of Biological Collections) has been transformational in promoting arthropod digitization. However, even if this program became permanent, at current rates, by the year 2050 only 38% of the existing arthropod specimens would be digitized, and less than 1% would have associated digital images. (3) The number of specimens in collections has increased by approximately 1% per year over the past 30 years. We propose that this rate of increase is insufficient to provide enough data to address biodiversity research needs, and that arthropod collections should aim to triple their rate of new specimen acquisition. (4) The collections we surveyed in the United States vary broadly in a number of indicators. Collectively, there is depth and breadth, with smaller collections providing regional depth and larger collections providing greater global coverage. (5) Increased coordination across museums is needed for digitization efforts to target taxa for research and conservation goals and address long-term data needs. Two key recommendations emerge: collections should significantly increase both their specimen holdings and their digitization efforts to empower continental and global biodiversity data pipelines, and stimulate downstream research.
  4. Natural history collections (NHCs) are the foundation of historical baselines for assessing anthropogenic impacts on biodiversity. Along these lines, the online mobilization of specimens via digitization (the conversion of specimen data into accessible digital content) has greatly expanded the use of NHC collections across a diversity of disciplines. We broaden the current vision of digitization (Digitization 1.0), whereby specimens are digitized within NHCs, to include new approaches that rely on digitized products rather than the physical specimen (Digitization 2.0). Digitization 2.0 builds on the data, workflows, and infrastructure produced by Digitization 1.0 to create digital-only workflows that facilitate digitization, curation, and data links, thus returning value to physical specimens by creating new layers of annotation, empowering a global community, and developing automated approaches to advance biodiversity discovery and conservation. These efforts will transform large-scale biodiversity assessments to address fundamental questions including those pertaining to critical issues of global change.
  5. Compilation and retrieval of reliable data on biological interactions is one of the critical bottlenecks affecting efficiency and statistical power in testing ecological theories. TaxonWorks, a web-based workbench, can facilitate such research by enabling the digitization of complex biological interactions involving multiple species, individuals, and trophic levels. These data can be further organized into spatial and temporal axes, and annotated at the level of individual or grouped interactions (e.g. singularly citing the combined elements of a tritrophic interaction). The simple, customizable nature of the tools ultimately reduces the time-consuming steps of data gathering, cleaning, and formatting of datasets for subsequent exploration and analysis while also improving the asserted semantics. An example use case is provided with a dataset of associations among plants, pathogens, and insect vectors. The curated data are accessed through the JSON-serving TaxonWorks API (Application Programming Interface) by an R package. Analysis and visualization of the network graphs persisted in TaxonWorks are demonstrated using core R functionality and the igraph package (Csardi and Nepusz 2006). TaxonWorks is open-source, collaboratively built software available at http://taxonworks.org.