International collaboration between collections, aggregators, and researchers within the biodiversity community and beyond is becoming increasingly important in our efforts to support biodiversity, conservation and the life of the planet. The social, technical, logistical and financial aspects of an equitable biodiversity data landscape – from workforce training and mobilization of linked specimen data, to data integration, use and publication – must be considered globally and within the context of a growing biodiversity crisis. In recent years, several initiatives have outlined paths forward that describe how digital versions of natural history specimens can be extended and linked with associated data. In the United States, Webster (2017) presented the “extended specimen”, which was expanded upon by Lendemer et al. (2019) through the work of the Biodiversity Collections Network (BCoN). At the same time, a “digital specimen” concept was developed by DiSSCo in Europe (Hardisty 2020). Both the extended and digital specimen concepts depict a digital proxy of an analog natural history specimen, whose digital nature provides greater capabilities such as being machine-processable, linkages with associated data, globally accessible information-rich biodiversity data, improved tracking, attribution and annotation, additional opportunities for data use and cross-disciplinary collaborations forming the basis for FAIR (Findable, Accessible, Interoperable, Reproducible) and equitable sharing of benefits worldwide, and innumerable other advantages, with slight variation in how an extended or digital specimen model would be executed. Recognizing the need to align the two closely-related concepts, and to provide a place for open discussion around various topics of the Digital Extended Specimen (DES; the current working name for the joined concepts), we initiated a virtual consultation on the discourse platform hosted by the Alliance for Biodiversity Knowledge through GBIF. This platform provided a forum for threaded discussions around topics related and relevant to the DES. The goals of the consultation align with the goals of the Alliance for Biodiversity Knowledge: expand participation in the process, build support for further collaboration, identify use cases, identify significant challenges and obstacles, and develop a comprehensive roadmap towards achieving the vision for a global specification for data integration. In early 2021, Phase 1 launched with five topics: Making FAIR data for specimens accessible; Extending, enriching and integrating data; Annotating specimens and other data; Data attribution; and Analyzing/mining specimen data for novel applications. This round of full discussion was productive and engaged dozens of contributors, with hundreds of posts and thousands of views. During Phase 1, several deeper, more technical, or additional topics of relevance were identified and formed the foundation for Phase 2 which began in May 2021 with the following topics: Robust access points and data infrastructure alignment; Persistent identifier (PID) scheme(s); Meeting legal/regulatory, ethical and sensitive data obligations; Workforce capacity development and inclusivity; Transactional mechanisms and provenance; and Partnerships to collaborate more effectively. In Phase 2 fruitful progress was made towards solutions to some of these complex functional and technical long-term goals. Simultaneously, our commitment to open participation was reinforced, through increased efforts to involve new voices from allied and complementary fields. Among a wealth of ideas expressed, the community highlighted the need for unambiguous persistent identifiers and a dedicated agent to assign them, support for a fully linked system that includes robust publishing mechanisms, strong support for social structures that build trustworthiness of the system, appropriate attribution of legacy and new work, a system that is inclusive, removed from colonial practices, and supportive of creative use of biodiversity data, building a truly global data infrastructure, balancing open access with legal obligations and ethical responsibilities, and the partnerships necessary for success. These two consultation periods, and the myriad activities surrounding the online discussion, produced a wide variety of perspectives, strategies, and approaches to converging the digital and extended specimen concepts, and progressing plans for the DES -- steps necessary to improve access to research-ready data to advance our understanding of the diversity and distribution of life. Discussions continue and we hope to include your contributions to the DES in future implementation plans.
more »
« less
Assessing the FAIR Digital Object Framework for Global Biodiversity Research
In the first decades of the 21stcentury, there has been a global trend towards digitisation and the mobilisation of data from natural history museums and research institutions. The development of national and international aggregator systems, which focused on data standards, made it possible to access millions of museum specimen records. These records serve as an empirical foundation for research across various fields. In addition, community efforts have expanded the concept of natural history collection specimens to include physical preparations and digital resources, resulting in the Digital Extended Specimen (DES), which also includes derived and related data. Within this context, the paper proposes using the FAIR Digital Object (FDO) framework to accelerate the global vision of the DES, arguing that FDO-enabled infrastructures can reduce barriers to the discovery and access of specimens, help ensure credit back to contributors and increase the amount of research that incorporates biodiversity data.
more »
« less
- Award ID(s):
- 2027654
- PAR ID:
- 10518721
- Publisher / Repository:
- NSF Public Access Repository (NSF-PAR)
- Date Published:
- Journal Name:
- Research Ideas and Outcomes
- Volume:
- 9
- ISSN:
- 2367-7163
- Subject(s) / Keyword(s):
- FAIR data FDO natural history museums natural science collections biodiversity specimens interoperability persistent identifiers FAIR implementation extended specimen network
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Over 300 million arthropod specimens are housed in North American natural history collections. These collections represent a “vast hidden treasure trove” of biodiversity −95% of the specimen label data have yet to be transcribed for research, and less than 2% of the specimens have been imaged. Specimen labels contain crucial information to determine species distributions over time and are essential for understanding patterns of ecology and evolution, which will help assess the growing biodiversity crisis driven by global change impacts. Specimen images offer indispensable insight and data for analyses of traits, and ecological and phylogenetic patterns of biodiversity. Here, we review North American arthropod collections using two key metrics, specimen holdings and digitization efforts, to assess the potential for collections to provide needed biodiversity data. We include data from 223 arthropod collections in North America, with an emphasis on the United States. Our specific findings are as follows: (1) The majority of North American natural history collections (88%) and specimens (89%) are located in the United States. Canada has comparable holdings to the United States relative to its estimated biodiversity. Mexico has made the furthest progress in terms of digitization, but its specimen holdings should be increased to reflect the estimated higher Mexican arthropod diversity. The proportion of North American collections that has been digitized, and the number of digital records available per species, are both much lower for arthropods when compared to chordates and plants. (2) The National Science Foundation’s decade-long ADBC program (Advancing Digitization of Biological Collections) has been transformational in promoting arthropod digitization. However, even if this program became permanent, at current rates, by the year 2050 only 38% of the existing arthropod specimens would be digitized, and less than 1% would have associated digital images. (3) The number of specimens in collections has increased by approximately 1% per year over the past 30 years. We propose that this rate of increase is insufficient to provide enough data to address biodiversity research needs, and that arthropod collections should aim to triple their rate of new specimen acquisition. (4) The collections we surveyed in the United States vary broadly in a number of indicators. Collectively, there is depth and breadth, with smaller collections providing regional depth and larger collections providing greater global coverage. (5) Increased coordination across museums is needed for digitization efforts to target taxa for research and conservation goals and address long-term data needs. Two key recommendations emerge: collections should significantly increase both their specimen holdings and their digitization efforts to empower continental and global biodiversity data pipelines, and stimulate downstream research.more » « less
-
Thanks to substantial support for biodiversity data mobilization in recent decades, billions of occurrence records are openly available, documenting life on Earth and enabling timely research, awareness raising, and policy-making. Initiatives across local to global scales have been separately funded to serve different, yet often overlapping audiences of data users, and have developed a variety of platforms and infrastructures to meet the needs of these audiences. The independent progress of biodiversity data providers has led to innovations as well as challenges for the community at large as we move towards connecting and linking a diversity of information from disparate sources as Digital Extended Specimens (DES). Recognizing a need for deeper and more frequent opportunities for communication and collaboration across the globe, an ad-hoc group of representatives of various international, national, and regional organizations have been meeting virtually since 2020 to provide a forum for updates, announcements, and shared progress. This group is provisionally named International Partners for the Digital Extended Specimen (IPDES), and is guided by these four concepts: Biodiversity, Connection, Knowledge and Agency. Participants in IPDES include representatives of the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), American Institute of Biological Sciences (AIBS), Biodiversity Collections Network (BCoN), Natural Science Collections Alliance (NSCA), Distributed System of Scientific Collections (DiSSCo), Atlas of Living Australia (ALA), Biodiversity Information Standards (TDWG), Society for the Preservation of Natural History Collections (SPNHC), National Specimen Information Infrastructure of China (NSII), and South African National Biodiversity Institute (SANBI), as well as individuals involved with biodiversity informatics initiatives, natural science collections, museums, herbaria, and universities. Our global partners group strives to increase representation from around the globe as we aim to enable research that contributes to novel discoveries and addresses the societal challenges leading to the biodiversity crisis. Our overarching mission is to expand on the community-driven successes to connect biodiversity data and knowledge through coordination of a globally integrated network of stakeholders to enable an extensible technical and social infrastructure of data, tools, and working practices in support of our vision. The main work of our group thus far includes publishing a paper on the Digital Extended Specimen (Hardisty et al. 2022), organizing and hosting an array of activities at conferences, and asynchronous online work and forum-based exchanges. We aim to advance discussion on topics of broad interest to our community such as social and technical capacity building, broadening participation, expanding social and data networks, improving data models and building a backbone for the DES, and identifying international funding solutions. This presentation will highlight some of these activities and detail progress towards a roadmap for the development of the human network and technical infrastructure necessary to support the DES. It provides an opportunity for feedback from and engagement by stakeholder communities such as TDWG and other initiatives with a focus on data standards and biodiversity informatics, as we solidify our plans for the future in support of integrated and interconnected biodiversity data and credit for those doing the work.more » « less
-
Abstract PremiseOne of the slowest steps in digitizing natural history collections is converting labels associated with specimens into a digital data record usable for collections management and research. Here, we address how herbarium specimen labels can be converted into digital data records via extraction into standardized Darwin Core fields. MethodsWe first showcase the development of a rule‐based approach and compare outcomes with a large language model–based approach, in particular ChatGPT4. We next quantified omission and commission error rates across target fields for a set of labels transcribed using optical character recognition (OCR) for both approaches. For example, we find that ChatGPT4 often creates field names that are not Darwin Core compliant while rule‐based approaches often have high commission error rates. ResultsOur results suggest that these approaches each have different strengths and limitations. We therefore developed an ensemble approach that leverages the strengths of each individual method and documented that ensembling strongly reduced overall information extraction errors. DiscussionThis work shows that an ensemble approach has particular value for creating high‐quality digital data records, even for complicated label content. While human validation is still needed to ensure the best possible quality, automated approaches can speed digitization of herbarium specimen labels and are likely to be broadly usable for all natural history collection types.more » « less
-
As we look to the future of natural history collections and a global integration of biodiversity data, we are reliant on a diverse workforce with the skills necessary to build, grow, and support the data, tools, and resources of the Digital Extended Specimen (DES; Webster 2019, Lendemer et al. 2020, Hardisty 2020). Future “DES Data Curators” – those who will be charged with maintaining resources created through the DES – will require skills and resources beyond what is currently available to most natural history collections staff. In training the workforce to support the DES we have an opportunity to broaden our community and ensure that, through the expansion of biodiversity data, the workforce landscape itself is diverse, equitable, inclusive, and accessible. A fully-implemented DES will provide training that encapsulates capacity building, skills development, unifying protocols and best practices guidance, and cutting-edge technology that also creates inclusive, equitable, and accessible systems, workflows, and communities. As members of the biodiversity community and the current workforce, we can leverage our knowledge and skills to develop innovative training models that: include a range of educational settings and modalities; address the needs of new communities not currently engaged with digital data; from their onset, provide attribution for past and future work and do not perpetuate the legacy of colonial practices and historic inequalities found in many physical natural history collections. Recent reports from the Biodiversity Collections Network (BCoN 2019) and the National Academies of Science, Engineering and Medicine (National Academies of Sciences, Engineering, and Medicine 2020) specifically address workforce needs in support of the DES. To address workforce training and inclusivity within the context of global data integration, the Alliance for Biodiversity Knowledge included a topic on Workforce capacity development and inclusivity in Phase 2 of the consultation on Converging Digital Specimens and Extended Specimens - Towards a global specification for data integration. Across these efforts, several common themes have emerged relative to workforce training and the DES. A call for a community needs assessment: As a community, we have several unknowns related to the current collections workforce and training needs. We would benefit from a baseline assessment of collections professionals to define current job responsibilities, demographics, education and training, incentives, compensation, and benefits. This includes an evaluation of current employment prospects and opportunities. Defined skills and training for the 21st century collections professional: We need to be proactive and define the 21st century workforce skills necessary to support the development and implementation of the DES. When we define the skills and content needs we can create appropriate training opportunities that include scalable materials for capacity building, educational materials that develop relevant skills, unifying protocols across the DES network, and best practices guidance for professionals. Training for data end-users: We need to train data end-users in biodiversity and data science at all levels of formal and informal education from primary and secondary education through the existing workforce. This includes developing training and educational materials, creating data portals, and building analyses that are inclusive, accessible, and engage the appropriate community of science educators, data scientists, and biodiversity researchers. Foster a diverse, equitable, inclusive, and accessible and professional workforce: As the DES develops and new tools and resources emerge, we need to be intentional in our commitment to building tools that are accessible and in assuring that access is equitable. This includes establishing best practices to ensure the community providing and accessing data is inclusive and representative of the diverse global community of potential data providers and users. Upfront, we must acknowledge and address issues of historic inequalities and colonial practices and provide appropriate attribution for past and future work while ensuring legal and regulatory compliance. Efforts must include creating transparent linkages among data and the humans that create the data that drives the DES. In this presentation, we will highlight recommendations for building workforce capacity within the DES that are diverse, inclusive, equitable and accessible, take into account the requirements of the biodiversity science community, and that are flexible to meet the needs of an evolving field.more » « less
An official website of the United States government

