Abstract Museum collections house millions of objects and associated data records that document biological and cultural diversity. In recent decades, digitization efforts have greatly increased accessibility to these data, thereby revolutionizing interdisciplinary studies in evolutionary biology, biogeography, epidemiology, cultural change, and human-mediated environmental impacts. Curators and collection managers can make museum data as accessible as possible to scientists and learners by using a collection management system. However, selecting a system can be a challenging task. Here, we describe Arctos, a community solution for managing and accessing collections data for research and education. Specific goals are to: (1) Describe the core elements of Arctos for a broad audience with respect to the biodiversity informatics principles that enable high quality research; (2) Highlight the unique aspects of Arctos; (3) Illustrate Arctos as a model for supporting and enhancing the Digital Extended Specimen; and (4) Emphasize the role of the Arctos community for improving data discovery and enabling cross-disciplinary, integrative studies within a sustainable governance model. In addition to detailing Arctos as both a community of museum professionals and a collection database platform, we discuss how Arctos achieves its richly annotated data by creating a web of knowledge with deep connections between catalog records and derived or associated data. We also highlight the value of Arctos as an educational resource. Finally, we present a financial model of fiscal sponsorship by a non-profit organization, implemented in 2022, to ensure the long-term success and sustainability of Arctos. 
We attribute Arctos’ longevity of nearly three decades to its core development principles of standardization, flexibility, interdisciplinarity, and connectivity within a nimble development model for addressing novel needs and information types in response to changing technology, workflows, ethical considerations, and regulations.
Arctos: Community-driven innovations for managing natural and cultural history collections
More than tools for managing physical and digital objects, museum collection management systems (CMS) serve as platforms for structuring, integrating, and making accessible the rich data embodied by natural history collections. Here we describe Arctos, a scalable community solution for managing and publishing global biological, geological, and cultural collections data for research and education. Specific goals are to: (1) Describe the core features and implementation of Arctos for a broad audience with respect to the biodiversity informatics principles that enable high quality research; (2) Highlight the unique aspects of Arctos; (3) Illustrate Arctos as a model for supporting and enhancing the Digital Extended Specimen concept; and (4) Emphasize the role of the Arctos community for improving data discovery and enabling cross-disciplinary, integrative studies within a sustainable governance model. In addition to detailing Arctos as both a community of museum professionals and a collection database platform, we discuss how Arctos achieves its richly annotated data by creating a web of knowledge with deep connections between catalog records and derived or associated data. We also highlight the value of Arctos as an educational resource. Finally, we present the financial model of fiscal sponsorship by a nonprofit organization, implemented in 2022, to ensure the long-term success and sustainability of Arctos.
- PAR ID: 10527463
- Editor(s): Meloro, Carlo
- Publisher / Repository: PLOS
- Date Published:
- Journal Name: PLOS ONE
- Volume: 19
- Issue: 5
- ISSN: 1932-6203
- Page Range / eLocation ID: e0296478
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract The impact of preserved museum specimens is being transformed and increased by three-dimensional (3D) imaging, which creates high-fidelity online digital specimens. Through examples from the openVertebrate (oVert) Thematic Collections Network, we describe how we created a digitization community dedicated to the shared vision of making 3D data of specimens available, and the impact of these data on a broad audience of scientists, students, teachers, artists, and more. High-fidelity digital 3D models allow people from multiple communities to simultaneously access and use scientific specimens. Based on our multiyear, multi-institution project, we identify significant technological and social hurdles that remain for fully realizing the potential impact of digital 3D specimens.
Research on plant-pollinator interactions requires a diversity of perspectives and approaches, and documenting changing pollinator-plant interactions due to declining insect diversity and climate change is especially challenging. Natural history collections are increasingly important for such research and can provide ecological information across broad spatial and temporal scales. Here, we describe novel approaches that integrate museum specimens from insect and plant collections with field observations to quantify pollen networks over large spatial and temporal gradients. We present methodological strategies for evaluating insect-pollen network parameters based on pollen collected from museum insect specimens. These methods provide insight into spatial and temporal variation in pollen-insect interactions and complement other approaches to studying pollination, such as pollinator observation networks and flower enclosure experiments. We present example data from butterfly pollen networks over the past century in the Great Basin Desert and Sierra Nevada Mountains, United States. Complementary to these approaches, we describe rapid pollen identification methods that can increase speed and accuracy of taxonomic determinations, using pollen grains collected from herbarium specimens. As an example, we describe a convolutional neural network (CNN) to automate identification of pollen. We extracted images of pollen grains from 21 common species from herbarium specimens at the University of Nevada Reno (RENO). The CNN model achieved exceptional accuracy of identification, with a correct classification rate of 98.8%. These and similar approaches can transform the way we estimate pollination network parameters and greatly change inferences from existing networks, which have exploded over the past few decades. These techniques also allow us to address critical ecological questions related to mutualistic networks, community ecology, and conservation biology. 
Museum collections remain a bountiful source of data for biodiversity science and understanding global change.
Over the last decade, the United States paleontological collections community has invested heavily in the digitization of specimen-based data, including over 10 million USD funded through the National Science Foundation’s Advancing Digitization of Biodiversity Collections program. Fossil specimen data—9.0 million records and counting (Global Biodiversity Information Facility 2024)—are now accessible on open science platforms such as the Global Biodiversity Information Facility (GBIF). However, the full potential of this data is far from realized due to fundamental challenges associated with mobilization, discoverability, and interoperability of paleontological information within the existing cyberinfrastructure landscape and data pipelines. Additionally, it can be difficult for individuals with varying expertise to develop a comprehensive understanding of the existing landscape due to its breadth and complexity. Here, we present preliminary results from a project aiming to explore how we might address these problems. Funding from the US National Science Foundation (NSF) to the University of Colorado Museum of Natural History, Smithsonian National Museum of Natural History, and Arizona State University will result in, among other products, an “ecosystem map” for the paleontological collections community. This map will be an information-rich visualization of entities (e.g. concepts, systems, platforms, mechanisms, drivers, tools, documentation, data, standards, people, organizations) operating in, intersecting with, or existing in parallel to our domain. We are inspired and informed by similar efforts to map the biodiversity informatics landscape (Bingham et al. 2017) and the research infrastructure landscape (Distributed System of Scientific Collections 2024), as well as by many ongoing metadata cataloging projects, e.g. re3data and the Global Registry of Scientific Collections (GRSciColl). 
Our strategy for developing this ecosystem map is to model the existing information and systems landscape by characterizing entities, e.g. potentially in a graph database as nodes with relationships to other nodes. The ecosystem map will enable us to provide guidance for communities working across different sectors of the landscape, promoting a shared understanding of the ecosystem that everyone works in together. We can also use the map to identify points of entry and engagement at various stages of the paleontological data process, and to engage diverse members within the paleontological community. We see three primary user types for this map: people new(er) to the community, people with expertise in a subset of the community, and people working to integrate initiatives and systems across communities. Each of these user types needs tailored access to the ecosystem map and its community knowledge. By promoting shared knowledge with the map, users will be able to identify their own space within the ecosystem and the connections or partnerships that they can utilize to expand their knowledge or resources, relieving the burden on any single individual to hold a comprehensive understanding. For example, the flow of taxonomic information between publications, collections, digital resources, and biodiversity aggregators is not straightforward or easy to understand. A person with expertise in collections care may want to use the ecosystem map to understand why taxonomic identifications associated with their specimen occurrence records are showing up incorrectly when published to GBIF. We envision that our final ecosystem map will visualize the flow of taxonomic information and how it is used to interpret specimen occurrence data, thereby highlighting to this user where problems may be happening and whom to ask for help in addressing them (Fig. 1).
Ultimately, development of this map will allow us to identify mobilization pathways for paleontological data, highlight core cyberinfrastructure resources, define cyberinfrastructure gaps, strategize future partnerships, promote shared knowledge, and engage a broader array of expertise in the process. Contributing domain-based evidence FAIRly*2 requires expertise that bridges the content (e.g. paleontology) and the mechanics (e.g. informatics). By centering the role of humans in open science cyberinfrastructure throughout our process, we hope to develop systems that create and sustain such expertise.
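The idea of modeling entities as nodes with relationships to other nodes, and then tracing how taxonomic information flows between them, can be illustrated with a minimal sketch. This is not the project's implementation: the `EcosystemMap` class, the node names, and the relationship labels below are all hypothetical, and a plain in-memory dictionary stands in for a graph database.

```python
from collections import deque


class EcosystemMap:
    """Toy directed graph: entities are nodes, relationships are labeled edges."""

    def __init__(self):
        # node -> list of (relationship label, neighboring node)
        self._edges = {}

    def relate(self, source, relationship, target):
        """Record that `source` has `relationship` to `target`."""
        self._edges.setdefault(source, []).append((relationship, target))
        self._edges.setdefault(target, [])

    def path(self, start, goal):
        """Breadth-first search for a shortest chain of hops from start to goal.

        Returns a list of (source, relationship, target) tuples,
        or None when no route exists.
        """
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, hops = queue.popleft()
            if node == goal:
                return hops
            for relationship, neighbor in self._edges.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + [(node, relationship, neighbor)]))
        return None


# Hypothetical entities and relationships, for illustration only.
emap = EcosystemMap()
emap.relate("publication", "supplies names to", "collection")
emap.relate("publication", "is indexed by", "digital resource")
emap.relate("collection", "publishes records to", "aggregator")
emap.relate("digital resource", "supplies taxonomy to", "aggregator")

for source, relationship, target in emap.path("publication", "aggregator"):
    print(f"{source} --{relationship}--> {target}")
```

A user asking why identifications look wrong downstream could, in a map like this, list each hop between their collection and the aggregator and see which entity sits at each step.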
A wealth of information about how parasites interact with their hosts already exists in collections, scientific publications, specialized databases, and grey literature. The US National Science Foundation-funded Terrestrial Parasite Tracker Thematic Collection Network (TPT) project began in 2019 to help build a comprehensive picture of arthropod ectoparasites including the evolution of these parasite-host biotic associations, distributions, and the ecological interactions of disease vectors. TPT is a network of biodiversity collections whose data can assist scientists, educators, land managers, and policymakers to better understand the complex relationship between hosts and parasites including emergent properties that may explain the causes and frequency of human and wildlife pathogens. TPT member collections make their association information easier to access via Global Biotic Interactions (GloBI, Poelen et al. 2014), which is periodically archived through Zenodo to track progress in the TPT project. TPT leverages GloBI's ability to index biotic associations from specimen occurrence records that come from existing management systems (e.g., Arctos, Symbiota, EMu, Excel, MS Access) to avoid having to completely rework existing, or build new, cyber-infrastructures before collections can share data. TPT-affiliated collection managers use collection-specific translation tables to connect their verbatim (or original) terms used to describe associations (e.g., "ex", "found on", "host") to their interpreted, machine-readable terms in the OBO Relations Ontology (RO). These interpreted terms enable searches across previously siloed association record sets, while the original verbatim values remain accessible to help retain provenance and allow for interpretation improvements. TPT is an ambitious project, with the goal to database label data from over 1.2 million specimens of arthropod parasites of vertebrates coming from 22 collections across North America.
In the first year of the project, the TPT collections created over 73,700 new records and 41,984 images. In addition, 17 TPT data providers and three other collaborators shared datasets that are now indexed by GloBI, visible on the TPT GloBI project page. These datasets came from collection specimen occurrence records and literature sources. Two TPT data archives that capture and preserve the changes in the data coming from TPT to GloBI were published through Zenodo (Poelen et al. 2020a, Poelen et al. 2020b). The archives document the changes in how data are shared by collections, including the biotic association data format and quantity of data captured. The Poelen et al. 2020b report included all TPT collections and biotic interactions from Arctos collections in VertNet and the Symbiota Collection of Arthropods Network (SCAN). The total number of interactions included in this report was 376,671 records (500,000 interactions is the overall goal for TPT). In addition, close coordination with TPT collection data managers, including many one-on-one conversations, a workshop, and a webinar (Sullivan et al. 2020), was conducted to help guide the data capture of biotic associations. GloBI is an effective tool to help integrate biotic association data coming from occurrence records into an openly accessible, global, linked view of existing species interaction records. The results gleaned from the TPT workshop and Zenodo data archives demonstrate that minimizing changes to existing workflows allows for custom interpretation of collection-specific interaction terms. In addition, including collection data managers in the development of the interaction term vocabularies is an important part of the process that may improve data sharing and the overall downstream data quality.
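The collection-specific translation tables described above can be sketched as a simple lookup that keeps the verbatim term alongside its interpreted term. This is an illustrative sketch only, not GloBI's or TPT's actual code or mapping: the table contents, the choice of "has host" as the interpreted relation label, and the `Association` and `interpret` names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical translation table for one collection:
# normalized verbatim term -> interpreted relation label (assumed, not an official mapping).
TRANSLATION_TABLE = {
    "ex": "has host",
    "found on": "has host",
    "host": "has host",
}


@dataclass(frozen=True)
class Association:
    verbatim_term: str               # original value, kept for provenance
    interpreted_term: Optional[str]  # None when the table has no mapping yet


def interpret(verbatim: str, table: Mapping[str, str]) -> Association:
    """Look up a verbatim association term without discarding the original."""
    key = verbatim.strip().lower()
    return Association(verbatim_term=verbatim, interpreted_term=table.get(key))


records = [interpret(term, TRANSLATION_TABLE) for term in ["ex", " Found On ", "reared from"]]
for record in records:
    print(repr(record.verbatim_term), "->", record.interpreted_term)
```

Because unmapped terms come back with `interpreted_term` set to `None` rather than being dropped, a collection manager can later extend the table and reinterpret the same verbatim values, which matches the provenance goal stated in the abstract.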