Title: Finding biotic anomalies described in specimen label text is a challenge that artificial intelligence can address
Biodiversity specimen collectors are on the front lines of observing biotic anomalies, some of which herald early stages of significant changes (e.g., the arrival of a new disease; Pearson and Mast 2019). Online data sharing has opened new possibilities for the discovery of anomaly descriptions on collectors’ labels, but it remains a challenge to find these needles in the haystack of many millions of specimen records available at aggregators like iDigBio and the Global Biodiversity Information Facility. In a recent community survey, over 200 collectors identified 170 unique words and phrases (e.g., atypical) that they would use to describe six types of anomaly (Pearson and Mast 2019). Left unanswered was the relative efficiency with which anomaly descriptions can be found using the simple presence of these words. Here, we address that question with a focus on one type of anomaly (phenological; related to the timing of life history events) and ask a second question: can we further improve the efficiency of anomaly description discovery by engaging artificial intelligence (AI)?
Award ID(s):
2027654
PAR ID:
10518707
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Publisher / Repository:
NSF Public Access Repository (NSF-PAR)
Date Published:
Journal Name:
BAUHINIA – Zeitschrift der Basler Botanischen Gesellschaft
Volume:
29
ISSN:
0067-4605
Subject(s) / Keyword(s):
Anomaly detection Artificial intelligence Biodiversity specimens Global change Phenology
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Natural history collections (NHCs) are the foundation of historical baselines for assessing anthropogenic impacts on biodiversity. Along these lines, the online mobilization of specimens via digitization—the conversion of specimen data into accessible digital content—has greatly expanded the use of NHC collections across a diversity of disciplines. We broaden the current vision of digitization (Digitization 1.0)—whereby specimens are digitized within NHCs—to include new approaches that rely on digitized products rather than the physical specimen (Digitization 2.0). Digitization 2.0 builds on the data, workflows, and infrastructure produced by Digitization 1.0 to create digital-only workflows that facilitate digitization, curation, and data links, thus returning value to physical specimens by creating new layers of annotation, empowering a global community, and developing automated approaches to advance biodiversity discovery and conservation. These efforts will transform large-scale biodiversity assessments to address fundamental questions including those pertaining to critical issues of global change. 
  2. Abstract Natural history collections are repositories of biodiversity specimens that provide critical infrastructure for studies of mammals. Over the past 3 decades, digitization of collections has opened up the temporal and spatial properties of specimens, stimulating new data sharing, use, and training across the biodiversity sciences. These digital records are the cornerstones of an “extended specimen network,” in which the diverse data derived from specimens become digital, linked, and openly accessible for science and policy. However, still missing from most digital occurrences of mammals are their morphological, reproductive, and life-history traits. Unlocking this information will advance mammalogy, establish richer faunal baselines in an era of rapid environmental change, and contextualize other types of specimen-derived information toward new knowledge and discovery. Here, we present the Ranges Digitization Network (Ranges), a community effort to digitize specimen-level traits from all terrestrial mammals of western North America, append them to digital records, publish them openly in community repositories, and make them interoperable with complementary data streams. Ranges is a consortium of 23 institutions with an initial focus on non-marine mammal species (both native and introduced) occurring in western Canada, the western United States, and Mexico. The project will establish trait data standards and informatics workflows that can be extended to other regions, taxa, and traits. Reconnecting mammalogists, museum professionals, and researchers for a new era of collections digitization will catalyze advances in mammalogy and create a community-curated trait resource for training and engagement with global conservation initiatives.
  3. How do children learn to connect expressions (e.g., “that red apple”) to the real-world objects they refer to? The dominant view in developmental psychology is that children rely primarily on descriptive information encoded in content words (red, apple). In contrast, linguistic semantic theories of adult language attribute primacy to the grammar (e.g., words like that, another), which first establishes the status of potential referents within the discourse context (old, new) before descriptive information can factor in. These theories predict that reference can succeed even when the description does not match the referent. We explore this novel prediction in adults and children. Over three experiments, we found that (i) adults relied on the articles to establish the referent, even when the noun description did not fit, consistent with grammar-first accounts; (ii) consistent with description-first accounts, and contrary to adult behavior, 3- to 5-year-old children prioritized the descriptions provided by the nouns, despite being sensitive to grammatical information.
  4. In the first decades of the 21st century, there has been a global trend towards digitisation and the mobilisation of data from natural history museums and research institutions. The development of national and international aggregator systems, which focused on data standards, made it possible to access millions of museum specimen records. These records serve as an empirical foundation for research across various fields. In addition, community efforts have expanded the concept of natural history collection specimens to include physical preparations and digital resources, resulting in the Digital Extended Specimen (DES), which also includes derived and related data. Within this context, the paper proposes using the FAIR Digital Object (FDO) framework to accelerate the global vision of the DES, arguing that FDO-enabled infrastructures can reduce barriers to the discovery and access of specimens, help ensure credit back to contributors and increase the amount of research that incorporates biodiversity data.
  5. Summary Natural history collections (NHCs) are essential for studying biodiversity. Although spatial, temporal, and taxonomic biases in NHCs affect analyses, the influence of collector practices on biases remains largely unexplored. We utilized one million digitized specimens collected in the northeastern United States by c. 10 000 collectors to investigate how collector practices shape spatial, temporal, and taxonomic biases in NHCs; and similarities and differences between practices of more‐ and less‐prolific collectors. We identified six common collector practices, or collection norms: collectors generally collected different species, from multiple locations, from sites sampled by others, during the principal growing season, species identifiable outside peak collecting months, and species from species‐poor families and genera. Some norms changed over decades, with different taxa favored during different periods. Collection norms have increased taxonomic coverage in NHCs; however, collectors typically avoided large, taxonomically complex groups, causing their underrepresentation in NHCs. Less‐prolific collectors greatly enhanced coverage by collecting during more months and from less‐sampled locations. We assert that overall collection biases are shaped by shared predictable collection norms rather than random practices of individual collectors. Predictable biases offer an opportunity to more effectively address biases in future biodiversity models.