Title: Machine Learning Using Digitized Herbarium Specimens to Advance Phenological Research
Abstract Machine learning (ML) has great potential to drive scientific discovery by harvesting data from images of herbarium specimens—preserved plant material curated in natural history collections—but ML techniques have only recently been applied to this rich resource. ML has particularly strong prospects for the study of plant phenological events such as growth and reproduction. As a major indicator of climate change, driver of ecological processes, and critical determinant of plant fitness, plant phenology is an important frontier for the application of ML techniques for science and society. In the present article, we describe a generalized, modular ML workflow for extracting phenological data from images of herbarium specimens, and we discuss the advantages, limitations, and potential future improvements of this workflow. Strategic research and investment in specimen-based ML methods, along with the aggregation of herbarium specimen data, may give rise to a better understanding of life on Earth.
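The abstract describes the workflow only at a high level. As a minimal sketch of what "modular" can mean in practice (an assumption, not the authors' implementation), each step below is a plain function with the same signature, so a preprocessing step or detector can be swapped without touching the rest. `Workflow`, `preprocess`, `stub_detector`, `summarize`, and the record fields are all hypothetical names, and the detector returns fixed toy counts.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical record schema: one dict per specimen image.
Record = Dict[str, Any]
Stage = Callable[[Record], Record]


@dataclass
class Workflow:
    """Applies interchangeable stages in order; each stage returns a new record."""
    stages: List[Stage] = field(default_factory=list)

    def run(self, record: Record) -> Record:
        for stage in self.stages:
            record = stage(record)
        return record


def preprocess(record: Record) -> Record:
    # Placeholder for image preparation (e.g. resizing or masking labels).
    return {**record, "preprocessed": True}


def stub_detector(record: Record) -> Record:
    # Stand-in for a trained model; returns fixed counts per structure type.
    return {**record, "counts": {"bud": 4, "flower": 10, "fruit": 6}}


def summarize(record: Record) -> Record:
    # Aggregate detector output into a single sheet-level value.
    counts = record["counts"]
    return {**record, "total_reproductive": sum(counts.values())}


workflow = Workflow([preprocess, stub_detector, summarize])
result = workflow.run({"specimen_id": "demo-001"})
print(result["total_reproductive"])  # 20
```

Replacing `stub_detector` with a wrapper around a real trained model would leave the other stages unchanged, which is the practical benefit of keeping every stage on one shared signature.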
Award ID(s):
1754584 1802209 1902064 1902078
PAR ID:
10164955
Journal Name:
BioScience
ISSN:
0006-3568
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine learning (ML) can accelerate the extraction of phenological data from herbarium specimens; however, no studies have assessed whether ML-derived phenological data can be used reliably to evaluate ecological patterns. In this study, 709 herbarium specimens representing a widespread annual herb, Streptanthus tortuosus, were scored both manually by human observers and by a Mask R-CNN object detection model to (1) evaluate the concordance between ML-derived and manually derived phenological data and (2) determine whether ML-derived data can be used to reliably assess phenological patterns. The ML model generally underestimated the number of reproductive structures present on each specimen; however, when these counts were used to provide a quantitative estimate of the phenological stage of plants on a given sheet (i.e., the phenological index or PI), the ML-derived and manually derived PIs were highly concordant. Moreover, herbarium specimen age had no effect on the estimated PI of a given sheet. Finally, including ML-derived PIs as predictor variables in phenological models produced estimates of the phenological sensitivity of this species to climate, temporal shifts in flowering time, and the rate of phenological progression that are indistinguishable from those produced by models based on data provided by human observers. This study demonstrates that phenological data extracted using machine learning can be used reliably to estimate the phenological stage of herbarium specimens and to detect phenological patterns.
  2. Premise of the Study

    Phenological annotation models computed on large‐scale herbarium data sets were developed and tested in this study.

    Methods

    Herbarium specimens represent a significant resource with which to study plant phenology. Nevertheless, phenological annotation of herbarium specimens is time‐consuming, requires substantial human investment, and is difficult to mobilize at large taxonomic scales. We created and evaluated new methods based on deep learning techniques to automate annotation of phenological stages and tested these methods on four herbarium data sets representing temperate, tropical, and equatorial American floras.

    Results

    Deep learning allowed correct detection of fertile material with an accuracy of 96.3%. Accuracy was slightly decreased for finer‐scale information (84.3% for flower and 80.5% for fruit detection).

    Discussion

    The method described has the potential to allow fine‐grained phenological annotation of herbarium specimens at large ecological scales. Deeper investigation regarding the taxonomic scalability of this approach is needed.

  3. Plant phenology has been shifting dramatically in response to climate change, a shift that may have significant and widespread ecological consequences. Of particular concern are tropical biomes, which represent the most biodiverse and imperiled regions of the world. However, compared to temperate floras, we know little about the phenological responses of tropical plants because long-term observational datasets from the tropics are sparse. Herbarium specimens have greatly increased our phenological knowledge in temperate regions, but similar data have been underutilized in the tropics, and their suitability for this purpose has not been broadly validated. Here, we compare phenological estimates derived from field observational data (i.e., plot surveys) and from herbarium specimens at various spatial and taxonomic scales to determine whether specimens can provide accurate estimates of reproductive timing and its spatial variation. We show that phenological estimates from field observations and herbarium specimens coincide well: fewer than 5% of the species exhibited significant differences between flowering periods inferred from field observations versus specimens, regardless of spatial aggregation. Compared with field records, herbarium specimens sampled much larger geographic and climatic ranges, as has been documented previously for temperate plants, and effectively captured phenological responses across varied environments. Herbarium specimens are thus a vital resource for closing the gap in our phenological knowledge of tropical systems. Tropical plant reproductive phenology inferred from herbarium records is widely congruent with field observations, suggesting that these records can (and should) be used to investigate phenological variation and its associated environmental cues more broadly across tropical biomes.
  4. Forecasting the impacts of changing climate on the phenology of plant populations is essential for anticipating and managing potential ecological disruptions to biotic communities. Herbarium specimens enable assessments of plant phenology across broad spatiotemporal scales. However, specimens are collected opportunistically, and it is unclear whether their collection dates (used as proxies of phenological stages) are closest to the onset, peak, or termination of a phenophase, or whether sampled individuals represent early, average, or late occurrences in their populations. Yet no studies have assessed whether these uncertainties limit the utility of herbarium specimens for estimating the onset and termination of a phenophase. Using simulated data mimicking such uncertainties, we evaluated the accuracy with which the onset and termination of population‐level phenological displays (in this case, of flowering) can be predicted from natural‐history collections data (controlling for biases in collector behavior), and how the duration, variability, and responsiveness to climate of the flowering period of a species and temporal collection biases influence model accuracy. Estimates of population‐level onset and termination were highly accurate for a wide range of simulated species' attributes, but accuracy declined among species with longer individual‐level flowering duration and when there were temporal biases in sample collection, as is common among the earliest- and latest‐flowering species. The amount of data required to model population‐level phenological displays is not impractical to obtain; model accuracy declined by less than 1 day as sample sizes rose from 300 to 1000 specimens.

    Our analyses of simulated data indicate that, absent pervasive biases in collection and if the climate conditions that affect phenological timing are correctly identified, specimen data can predict the onset, termination, and duration of a population's flowering period with similar accuracy to estimates of median flowering time that are commonplace in the literature.
  5. Background and Aims

    Fruiting remains under-represented in long-term phenology records, relative to leaf and flower phenology. Herbarium specimens and historical field notes can fill this gap, but selecting and synthesizing these records for modern-day comparison requires an understanding of whether different historical data sources contain similar information, and whether similar, but not equivalent, fruiting metrics are comparable with one another.

    Methods

    For 67 fleshy-fruited plant species, we compared observations of fruiting phenology made by Henry David Thoreau in Concord, Massachusetts (1850s), with phenology data gathered from herbarium specimens collected across New England (mid-1800s to 2000s). To identify whether fruiting times and the order of fruiting among species are similar between datasets, we compared dates of first, peak and last observed fruiting (recorded by Thoreau), and earliest, mean and latest specimen (collected from herbarium records), as well as fruiting durations.

    Key Results

    On average, earliest herbarium specimen dates were earlier than first fruiting dates observed by Thoreau; mean specimen dates were similar to Thoreau’s peak fruiting dates; latest specimen dates were later than Thoreau’s last fruiting dates; and durations of fruiting captured by herbarium specimens were longer than durations of fruiting observed by Thoreau. All metrics of fruiting phenology except duration were significantly, positively correlated within (r: 0.69–0.88) and between (r: 0.59–0.85) datasets.

    Conclusions

    Strong correlations in fruiting phenology between Thoreau’s observations and data from herbaria suggest that field and herbarium methods capture similar broad-scale phenological information, including relative fruiting times among plant species in New England. Differences in first and last fruiting dates and in fruiting duration suggest that historical datasets collected with different methods, scales and metrics may not be comparable when exact timing is important. Researchers should strongly consider matching methodology when selecting historical records of fruiting phenology for present-day comparisons.
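Several of the abstracts above compare phenological values from two sources. As a minimal sketch of the logic in the first study, the Python below computes a per-sheet phenological index (PI) as a count-weighted mean stage score and then measures agreement between manual and ML-derived PIs with a Pearson correlation. The stage weights (bud = 1, flower = 2, fruit = 3), the function names, and all counts are hypothetical toy values, not the study's definitions or data; the PI used in that study may be defined differently.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical stage weights (assumption): later reproductive stages score higher.
STAGE_WEIGHTS: Dict[str, float] = {"bud": 1.0, "flower": 2.0, "fruit": 3.0}


def phenological_index(counts: Dict[str, int]) -> float:
    """Count-weighted mean stage score for one herbarium sheet."""
    total = sum(counts.get(stage, 0) for stage in STAGE_WEIGHTS)
    if total == 0:
        raise ValueError("sheet has no scored reproductive structures")
    weighted = sum(w * counts.get(stage, 0) for stage, w in STAGE_WEIGHTS.items())
    return weighted / total


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy counts per sheet (not study data); the "ML" counts are deliberately lower.
manual_counts: List[Dict[str, int]] = [
    {"bud": 8, "flower": 4, "fruit": 0},
    {"bud": 2, "flower": 10, "fruit": 4},
    {"bud": 0, "flower": 2, "fruit": 12},
]
ml_counts: List[Dict[str, int]] = [
    {"bud": 4, "flower": 2, "fruit": 0},
    {"bud": 1, "flower": 6, "fruit": 2},
    {"bud": 0, "flower": 1, "fruit": 6},
]

manual_pi = [phenological_index(c) for c in manual_counts]
ml_pi = [phenological_index(c) for c in ml_counts]
r = pearson_r(manual_pi, ml_pi)

print("manual PI:", [round(v, 3) for v in manual_pi])
print("ML PI:    ", [round(v, 3) for v in ml_pi])
print(f"Pearson r between manual and ML PI: {r:.4f}")
```

Because the PI is a ratio, a detector that misses a similar fraction of every structure type lowers the totals but leaves the PI nearly unchanged. This is one plausible way that undercounting need not degrade PI concordance, although the toy numbers here are constructed to show the effect rather than to reproduce any result.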