Abstract Natural history collections (NHCs) are the foundation of historical baselines for assessing anthropogenic impacts on biodiversity. The online mobilization of specimens via digitization (the conversion of specimen data into accessible digital content) has greatly expanded the use of NHCs across a diversity of disciplines. We broaden the current vision of digitization (Digitization 1.0), whereby specimens are digitized within NHCs, to include new approaches that rely on digitized products rather than the physical specimen (Digitization 2.0). Digitization 2.0 builds on the data, workflows, and infrastructure produced by Digitization 1.0 to create digital-only workflows that facilitate digitization, curation, and data links, thus returning value to physical specimens by creating new layers of annotation, empowering a global community, and developing automated approaches to advance biodiversity discovery and conservation. These efforts will transform large-scale biodiversity assessments to address fundamental questions, including those pertaining to critical issues of global change.
Finding biotic anomalies described in specimen label text is a challenge that artificial intelligence can address
Biodiversity specimen collectors are on the front lines of observing biotic anomalies, some of which herald early stages of significant changes (e.g., the arrival of a new disease; Pearson and Mast 2019). Online data sharing has opened new possibilities for the discovery of anomaly descriptions on collectors’ labels, but it remains a challenge to find these needles in the haystack of many millions of specimen records available at aggregators like iDigBio and the Global Biodiversity Information Facility. In a recent community survey, over 200 collectors identified 170 unique words and phrases (e.g., atypical) that they would use to describe six types of anomaly (Pearson and Mast 2019). Left unanswered was the relative efficiency with which anomaly descriptions can be found using the simple presence of these words. Here, we address that question with a focus on one type of anomaly (phenological; related to the timing of life history events) and ask a second question: can we further improve the efficiency of anomaly description discovery by engaging artificial intelligence (AI)?
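The "simple presence of these words" baseline described above can be illustrated with a minimal Python sketch. It is not the authors' method: the term list (apart from "atypical", which the abstract mentions), the record format, the function names, and the reading of "efficiency" as the share of flagged records that truly describe an anomaly are all assumptions made for illustration.

```python
import re

# Hypothetical term list; only "atypical" comes from the abstract.
ANOMALY_TERMS = ["atypical", "unusually early", "out of season"]


def build_pattern(terms):
    """Compile one case-insensitive pattern matching any term as a whole word or phrase."""
    # Longest terms first so multi-word phrases are preferred over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def flag_records(records, terms=ANOMALY_TERMS):
    """Return (record_id, matched_terms) for each label that contains a term.

    `records` is an iterable of (record_id, label_text) pairs (assumed format).
    """
    pattern = build_pattern(terms)
    flagged = []
    for record_id, label_text in records:
        hits = sorted({m.group(0).lower() for m in pattern.finditer(label_text)})
        if hits:
            flagged.append((record_id, hits))
    return flagged


def precision(flagged_ids, true_anomaly_ids):
    """Fraction of flagged records that were judged to be real anomaly descriptions."""
    flagged = set(flagged_ids)
    if not flagged:
        return 0.0
    return len(flagged & set(true_anomaly_ids)) / len(flagged)


if __name__ == "__main__":
    sample = [
        ("r1", "Flowering unusually early; atypical for this site"),
        ("r2", "Roadside, sandy soil"),
    ]
    print(flag_records(sample))
```

A keyword match of this kind flags every label containing a term, whether or not the term describes an anomaly, which is why a manually reviewed set of true anomaly records would be needed to measure how efficient the search is.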
- Award ID(s):
- 2027654
- PAR ID:
- 10518707
- Publisher / Repository:
- NSF Public Access Repository (NSF-PAR)
- Date Published:
- Journal Name:
- BAUHINIA – Zeitschrift der Basler Botanischen Gesellschaft
- Volume:
- 29
- ISSN:
- 0067-4605
- Subject(s) / Keyword(s):
- Anomaly detection Artificial intelligence Biodiversity specimens Global change Phenology
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
How do children learn to connect expressions (e.g., “that red apple”) to the real-world objects they refer to? The dominant view in developmental psychology is that children rely primarily on descriptive information encoded in content words (red, apple). In contrast, linguistic semantic theories of adult language attribute primacy to the grammar (e.g., words like that, another), which first establishes the status of potential referents within the discourse context (old, new) before descriptive information can factor in. These theories predict that reference can succeed even when the description does not match the referent. We explore this novel prediction in adults and children. Over three experiments, we found that (i) adults relied on the articles to establish the referent, even when the noun description did not fit, consistent with grammar-first accounts; (ii) consistent with description-first accounts, and contrary to adult behavior, 3- to 5-year-old children prioritized the descriptions provided by the nouns, despite being sensitive to grammatical information.
-
In the first decades of the 21st century, there has been a global trend towards digitisation and the mobilisation of data from natural history museums and research institutions. The development of national and international aggregator systems, which focused on data standards, made it possible to access millions of museum specimen records. These records serve as an empirical foundation for research across various fields. In addition, community efforts have expanded the concept of natural history collection specimens to include physical preparations and digital resources, resulting in the Digital Extended Specimen (DES), which also includes derived and related data. Within this context, the paper proposes using the FAIR Digital Object (FDO) framework to accelerate the global vision of the DES, arguing that FDO-enabled infrastructures can reduce barriers to the discovery and access of specimens, help ensure credit back to contributors and increase the amount of research that incorporates biodiversity data.
-
Over the last decade, the United States paleontological collections community has invested heavily in the digitization of specimen-based data, including over 10 million USD funded through the National Science Foundation’s Advancing Digitization of Biodiversity Collections program. Fossil specimen data—9.0 million records and counting (Global Biodiversity Information Facility 2024)—are now accessible on open science platforms such as the Global Biodiversity Information Facility (GBIF). However, the full potential of this data is far from realized due to fundamental challenges associated with mobilization, discoverability, and interoperability of paleontological information within the existing cyberinfrastructure landscape and data pipelines. Additionally, it can be difficult for individuals with varying expertise to develop a comprehensive understanding of the existing landscape due to its breadth and complexity. Here, we present preliminary results from a project aiming to explore how we might address these problems. Funding from the US National Science Foundation (NSF) to the University of Colorado Museum of Natural History, Smithsonian National Museum of Natural History, and Arizona State University will result in, among other products, an “ecosystem map” for the paleontological collections community. This map will be an information-rich visualization of entities (e.g. concepts, systems, platforms, mechanisms, drivers, tools, documentation, data, standards, people, organizations) operating in, intersecting with, or existing in parallel to our domain. We are inspired and informed by similar efforts to map the biodiversity informatics landscape (Bingham et al. 2017) and the research infrastructure landscape (Distributed System of Scientific Collections 2024), as well as by many ongoing metadata cataloging projects, e.g. re3data and the Global Registry of Scientific Collections (GRSciColl). 
Our strategy for developing this ecosystem map is to model the existing information and systems landscape by characterizing entities, e.g. potentially in a graph database as nodes with relationships to other nodes. The ecosystem map will enable us to provide guidance for communities working across different sectors of the landscape, promoting a shared understanding of the ecosystem that everyone works in together. We can also use the map to identify points of entry and engagement at various stages of the paleontological data process, and to engage diverse members within the paleontological community. We see three primary user types for this map: people new(er) to the community, people with expertise in a subset of the community, and people working to integrate initiatives and systems across communities. Each of these user types needs tailored access to the ecosystem map and its community knowledge. By promoting shared knowledge with the map, users will be able to identify their own space within the ecosystem and the connections or partnerships that they can utilize to expand their knowledge or resources, relieving the burden on any single individual to hold a comprehensive understanding. For example, the flow of taxonomic information between publications, collections, digital resources, and biodiversity aggregators is not straightforward or easy to understand. A person with expertise in collections care may want to use the ecosystem map to understand why taxonomic identifications associated with their specimen occurrence records are showing up incorrectly when published to GBIF. We envision that our final ecosystem map will visualize the flow of taxonomic information and how it is used to interpret specimen occurrence data, thereby highlighting to this user where problems may be happening and whom to ask for help in addressing them (Fig. 1).
Ultimately, development of this map will allow us to identify mobilization pathways for paleontological data, highlight core cyberinfrastructure resources, define cyberinfrastructure gaps, strategize future partnerships, promote shared knowledge, and engage a broader array of expertise in the process. Contributing domain-based evidence FAIRly requires expertise that bridges the content (e.g. paleontology) and the mechanics (e.g. informatics). By centering the role of humans in open science cyberinfrastructure throughout our process, we hope to develop systems that create and sustain such expertise.
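The idea in the abstract above of modeling entities "as nodes with relationships to other nodes" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the `EcosystemMap` class, the entity names, and the relationship labels are hypothetical, and a plain in-memory dictionary stands in for a graph database.

```python
from collections import defaultdict, deque


class EcosystemMap:
    """Tiny in-memory directed graph of entities and labeled relationships (hypothetical)."""

    def __init__(self):
        self.nodes = {}                 # entity name -> entity kind
        self.edges = defaultdict(list)  # source name -> [(relationship, target name)]

    def add_entity(self, name, kind):
        self.nodes[name] = kind

    def relate(self, source, relationship, target):
        # Both endpoints must already be registered as entities.
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown entity: {name}")
        self.edges[source].append((relationship, target))

    def downstream(self, start):
        """Breadth-first list of every entity reachable from `start`."""
        seen = {start}
        order = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for _relationship, target in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


if __name__ == "__main__":
    # Illustrative entities only; names and relationship labels are assumptions.
    eco = EcosystemMap()
    eco.add_entity("Taxonomic publication", "documentation")
    eco.add_entity("Collection database", "system")
    eco.add_entity("GBIF", "platform")
    eco.relate("Taxonomic publication", "informs", "Collection database")
    eco.relate("Collection database", "publishes to", "GBIF")
    print(eco.downstream("Taxonomic publication"))
```

Tracing what lies downstream of an entity is one way such a map might help a collections manager see where taxonomic information passes before it reaches an aggregator.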
-
Summary Natural history collections (NHCs) are essential for studying biodiversity. Although spatial, temporal, and taxonomic biases in NHCs affect analyses, the influence of collector practices on biases remains largely unexplored. We utilized one million digitized specimens collected in the northeastern United States by c. 10 000 collectors to investigate how collector practices shape spatial, temporal, and taxonomic biases in NHCs; and similarities and differences between practices of more‐ and less‐prolific collectors. We identified six common collector practices, or collection norms: collectors generally collected different species, from multiple locations, from sites sampled by others, during the principal growing season, species identifiable outside peak collecting months, and species from species‐poor families and genera. Some norms changed over decades, with different taxa favored during different periods. Collection norms have increased taxonomic coverage in NHCs; however, collectors typically avoided large, taxonomically complex groups, causing their underrepresentation in NHCs. Less‐prolific collectors greatly enhanced coverage by collecting during more months and from less‐sampled locations. We assert that overall collection biases are shaped by shared predictable collection norms rather than random practices of individual collectors. Predictable biases offer an opportunity to more effectively address biases in future biodiversity models.

