Title: A born‐digital field‐to‐database solution for collections‐based research using collNotes and collBook
Premise: The digitization of natural history collections includes transcribing specimen label data into standardized formats. Born‐digital specimen data initially gathered in digital formats do not need to be transcribed, enabling their efficient integration into digitized collections. Modernizing field collection methods for born‐digital workflows requires the development of new tools and processes. Methods and Results: collNotes, a mobile application, was developed for Android and iOS to supplement traditional field journals. Designed for efficiency in the field, collNotes avoids redundant data entries and does not require cellular service. collBook, a companion desktop application, refines field notes into database‐ready formats and produces specimen labels. Conclusions: collNotes and collBook can be used in combination as a field‐to‐database solution for gathering born‐digital voucher specimen data for plants and fungi. Both programs are open source and use common file types, simplifying either program's integration into existing workflows.
Award ID(s):
1761839 1756382
PAR ID:
10460235
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Applications in Plant Sciences
Volume:
7
Issue:
8
ISSN:
2168-0450
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Premise: One of the slowest steps in digitizing natural history collections is converting labels associated with specimens into a digital data record usable for collections management and research. Here, we address how herbarium specimen labels can be converted into digital data records via extraction into standardized Darwin Core fields. Methods: We first showcase the development of a rule‐based approach and compare outcomes with a large language model–based approach, in particular ChatGPT4. We next quantified omission and commission error rates across target fields for a set of labels transcribed using optical character recognition (OCR) for both approaches. For example, we find that ChatGPT4 often creates field names that are not Darwin Core compliant while rule‐based approaches often have high commission error rates. Results: Our results suggest that these approaches each have different strengths and limitations. We therefore developed an ensemble approach that leverages the strengths of each individual method and documented that ensembling strongly reduced overall information extraction errors. Discussion: This work shows that an ensemble approach has particular value for creating high‐quality digital data records, even for complicated label content. While human validation is still needed to ensure the best possible quality, automated approaches can speed digitization of herbarium specimen labels and are likely to be broadly usable for all natural history collection types.
  2. Premise: The ability to sequence genome‐scale data from herbarium specimens would allow for the economical development of data sets with broad taxonomic and geographic sampling that would otherwise not be possible. Here, we evaluate the utility of a basic double‐digest restriction site–associated DNA sequencing (ddRADseq) protocol using DNAs from four genera extracted from both silica‐dried and herbarium tissue. Methods: DNAs from Draba, Boechera, Solidago, and Ilex were processed with a ddRADseq protocol. The effects of DNA degradation, taxon, and specimen age were assessed. Results: Although taxon, preservation method, and specimen age affected data recovery, large phylogenetically informative data sets were obtained from the majority of samples. Discussion: These results suggest that herbarium samples can be incorporated into ddRADseq project designs, and that specimen age can be used as a rapid on‐site guide for sample choice. The detailed protocol we provide will allow users to pursue herbarium‐based ddRADseq projects that minimize the expenses associated with fieldwork and sample evaluation.
  3. Abstract. Aim: The International Tree‐Ring Data Bank (ITRDB) is the most comprehensive database of tree growth. To evaluate its usefulness and improve its accessibility to the broad scientific community, we aimed to: (a) quantify its biases, (b) assess how well it represents global forests, (c) develop tools to identify priority areas to improve its representativity, and (d) make available the corrected database. Location: Worldwide. Time period: Contributed datasets between 1974 and 2017. Major taxa studied: Trees. Methods: We identified and corrected formatting issues in all individual datasets of the ITRDB. We then calculated the representativity of the ITRDB with respect to species, spatial coverage, climatic regions, elevations, need for data update, climatic limitations on growth, vascular plant diversity, and associated animal diversity. We combined these metrics into a global Priority Sampling Index (PSI) to highlight ways to improve ITRDB representativity. Results: Our refined dataset provides access to a network of >52 million growth data points worldwide. We found, however, that the database is dominated by trees from forests with low diversity, in semi‐arid climates, coniferous species, and in western North America. Conifers represented 81% of the ITRDB and even in well‐sampled areas, broadleaves were poorly represented. Our PSI stressed the need to increase the database diversity in terms of broadleaf species and identified poorly represented regions that require scientific attention. Great gains will be made by increasing research and data sharing in African, Asian, and South American forests. Main conclusions: The extensive data and coverage of the ITRDB show great promise to address macroecological questions. To achieve this, however, we have to overcome the significant gaps in the representativity of the ITRDB. A strategic and organized group effort is required, and we hope the tools and data provided here can guide the efforts to improve this invaluable database.
  4. Objective: Patients have a poor understanding of outcomes related to total knee replacement (TKR) surgery, with most patients underestimating the potential benefits and overestimating the risk of complications. In this study, we sought to compare the impacts of descriptive information alone or in combination with an icon array, experience condition (images), or spinner on participants’ preference for TKR. Methods: A total of 648 members of an online arthritis network were randomized to 1 of 4 outcome presentation formats: numeric only, numeric with an icon array, numeric with a set of 50 images, or numeric with a functional spinner. Preferences for TKR were measured before and immediately after viewing the outcome information using an 11‐point numeric rating scale. Knowledge was assessed by asking participants to report the frequency of each outcome. Results: Participants randomized to the icon array, images, and spinner had stronger preferences for TKR (after controlling for baseline preferences) compared to those viewing the numeric only format (P < 0.05 for all mean differences). Knowledge scores were highest in participants randomized to the icon array; however, knowledge did not mediate the association between format and change in preference for TKR. Conclusion: Decision support at the point‐of‐care is being increasingly recognized as a vital component of care. Our findings suggest that adding graphic information to descriptive statistics strengthens preferences for TKR. Although experience formats using images may be too complex to use in clinical practice, icon arrays and spinners may be a viable and easily adaptable decision aid to support communication of probabilistic information.
  5. Abstract. Natural history collections (NHCs) are the foundation of historical baselines for assessing anthropogenic impacts on biodiversity. Along these lines, the online mobilization of specimens via digitization—the conversion of specimen data into accessible digital content—has greatly expanded the use of NHC collections across a diversity of disciplines. We broaden the current vision of digitization (Digitization 1.0)—whereby specimens are digitized within NHCs—to include new approaches that rely on digitized products rather than the physical specimen (Digitization 2.0). Digitization 2.0 builds on the data, workflows, and infrastructure produced by Digitization 1.0 to create digital-only workflows that facilitate digitization, curation, and data links, thus returning value to physical specimens by creating new layers of annotation, empowering a global community, and developing automated approaches to advance biodiversity discovery and conservation. These efforts will transform large-scale biodiversity assessments to address fundamental questions including those pertaining to critical issues of global change.