Title: Evaluation of Mask R-CNN Model for Counting Reproductive Structures of Six Plant Species
Phenology, the timing of life-history events, is a key trait for understanding responses of organisms to climate. The digitization and online mobilization of herbarium specimens is rapidly advancing our understanding of plant phenological response to climate and climatic change. The current common practice of manually harvesting data from individual specimens greatly restricts our ability to scale data collection to entire collections. Recent investigations have demonstrated that machine-learning models can facilitate data collection from herbarium specimens. However, present attempts have focused largely on simplistic binary coding of reproductive phenology (e.g., flowering or not). Here, we use crowd-sourced phenological data of numbers of buds, flowers, and fruits of more than 3000 specimens of six common wildflower species of the eastern United States (Anemone canadensis, A. hepatica, A. quinquefolia, Trillium erectum, T. grandiflorum, and T. undulatum) to train a model using Mask R-CNN to segment and count phenological features. A single global model was able to automate the binary coding of reproductive stage with greater than 90% accuracy. Segmenting and counting features were also successful, but accuracy varied with phenological stage and taxon. Counting buds was significantly more accurate than counting flowers or fruits. Moreover, botanical experts provided more reliable data than either crowd-sourcers or our Mask R-CNN model, highlighting the importance of high-quality human training data. Finally, we also demonstrated the transferability of our model to automated phenophase detection and counting of the three Trillium species, which have large and conspicuously shaped reproductive organs. These results highlight the promise of our two-phase crowd-sourcing and machine-learning pipeline to segment and count reproductive features of herbarium specimens, providing high-quality data with which to study responses of plants to ongoing climatic change.
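The abstract reports two kinds of agreement between model output and reference data: binary coding of reproductive stage, and per-feature counts of buds, flowers, and fruits. The following is a minimal sketch of how such a comparison could be scored, not the authors' evaluation code. The `Counts` record, both function names, and the four example specimens are hypothetical and chosen only for illustration; the binary rule (any structure present means reproductive) is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Counts:
    """Per-specimen counts of reproductive structures (hypothetical record)."""
    buds: int
    flowers: int
    fruits: int


def is_reproductive(c: Counts) -> bool:
    # Assumed binary coding: a specimen is reproductive if any structure is present.
    return (c.buds + c.flowers + c.fruits) > 0


def binary_accuracy(predicted: Sequence[Counts], reference: Sequence[Counts]) -> float:
    """Fraction of specimens whose binary coding matches the reference."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(is_reproductive(p) == is_reproductive(r)
                for p, r in zip(predicted, reference))
    return agree / len(predicted)


def mean_absolute_error(predicted: Sequence[Counts], reference: Sequence[Counts],
                        feature: str) -> float:
    """Mean absolute count difference for one feature ('buds', 'flowers', 'fruits')."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(getattr(p, feature) - getattr(r, feature))
              for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


# Invented example data: four specimens, reference vs. model counts.
reference = [Counts(2, 1, 0), Counts(0, 0, 0), Counts(0, 3, 1), Counts(1, 0, 0)]
predicted = [Counts(2, 0, 0), Counts(0, 1, 0), Counts(0, 2, 1), Counts(1, 0, 0)]

if __name__ == "__main__":
    print("binary accuracy:", binary_accuracy(predicted, reference))
    for feature in ("buds", "flowers", "fruits"):
        print(feature, "MAE:", mean_absolute_error(predicted, reference, feature))
```

Scoring each feature separately, as here, is what allows a statement such as "buds were counted more accurately than flowers or fruits" to be checked; a single pooled error would hide that difference.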
Award ID(s):
2101884 2105903 1802209 1754584
NSF-PAR ID:
10354825
Publisher / Repository:
Environmental Data Initiative
Sponsoring Org:
National Science Foundation
More Like this
  1. Premise

    Herbarium specimens represent an outstanding source of material with which to study plant phenological changes in response to climate change. The fine‐scale phenological annotation of such specimens is nevertheless highly time consuming and requires substantial human investment and expertise, which are difficult to rapidly mobilize.

    Methods

    We trained and evaluated new deep learning models to automate the detection, segmentation, and classification of four reproductive structures of Streptanthus tortuosus (flower buds, flowers, immature fruits, and mature fruits). We used a training data set of 21 digitized herbarium sheets for which the position and outlines of 1036 reproductive structures were annotated manually. We adjusted the hyperparameters of a mask R‐CNN (regional convolutional neural network) to this specific task and evaluated the resulting trained models for their ability to count reproductive structures and estimate their size.

    Results

    The main outcome of our study is that the performance of detection and segmentation can vary significantly with: (i) the type of annotations used for training, (ii) the type of reproductive structures, and (iii) the size of the reproductive structures. In the case of Streptanthus tortuosus, the method can provide quite accurate estimates (77.9% of cases) of the number of reproductive structures, which is better estimated for flowers than for immature fruits and buds. The size estimation results are also encouraging, showing a difference of only a few millimeters between the predicted and actual sizes of buds and flowers.

    Discussion

    This method has great potential for automating the analysis of reproductive structures in high‐resolution images of herbarium sheets. Deeper investigations regarding the taxonomic scalability of this approach and its potential improvement will be conducted in future work.

     
  2. Machine learning (ML) can accelerate the extraction of phenological data from herbarium specimens; however, no studies have assessed whether ML-derived phenological data can be used reliably to evaluate ecological patterns. In this study, 709 herbarium specimens representing a widespread annual herb, Streptanthus tortuosus, were scored both manually by human observers and by a mask R-CNN object detection model to (1) evaluate the concordance between ML-derived and manually derived phenological data and (2) determine whether ML-derived data can be used to reliably assess phenological patterns. The ML model generally underestimated the number of reproductive structures present on each specimen; however, when these counts were used to provide a quantitative estimate of the phenological stage of plants on a given sheet (i.e., the phenological index or PI), the ML-derived and manually derived PIs were highly concordant. Moreover, herbarium specimen age had no effect on the estimated PI of a given sheet. Finally, including ML-derived PIs as predictor variables in phenological models produced estimates of the phenological sensitivity of this species to climate, temporal shifts in flowering time, and the rate of phenological progression that are indistinguishable from those produced by models based on data provided by human observers. This study demonstrates that phenological data extracted using machine learning can be used reliably to estimate the phenological stage of herbarium specimens and to detect phenological patterns.
  3. Plant phenology has been shifting dramatically in response to climate change, a shift that may have significant and widespread ecological consequences. Of particular concern are tropical biomes, which represent the most biodiverse and imperiled regions of the world. However, compared to temperate floras, we know little about phenological responses of tropical plants because long-term observational datasets from the tropics are sparse. Herbarium specimens have greatly increased our phenological knowledge in temperate regions, but similar data have been underutilized in the tropics and their suitability for this purpose has not been broadly validated. Here, we compare phenological estimates derived from field observational data (i.e., plot surveys) and herbarium specimens at various spatial and taxonomic scales to determine whether specimens can provide accurate estimations of reproductive timing and its spatial variation. We demonstrate that phenological estimates from field observations and herbarium specimens coincide well. Fewer than 5% of the species exhibited significant differences between flowering periods inferred from field observations versus specimens, regardless of spatial aggregation. In contrast to studies based on field records, herbarium specimens sampled much larger geographic and climatic ranges, as has been documented previously for temperate plants, and effectively captured phenological responses across varied environments. Herbarium specimens are thus a vital resource for closing the gap in our phenological knowledge of tropical systems. Tropical plant reproductive phenology inferred from herbarium records is widely congruent with field observations, suggesting that these records can (and should) be used to investigate phenological variation and its associated environmental cues more broadly across tropical biomes.
  4. Summary

    Urbanization can affect the timing of plant reproduction (i.e. flowering and fruiting) and associated ecosystem processes. However, our knowledge of how plant phenology responds to urbanization and its associated environmental changes is limited.

    Herbaria represent an important, but underutilized source of data for investigating this question. We harnessed phenological data from herbarium specimens representing 200 plant species collected across 120 yr from the eastern US to investigate the spatiotemporal effects of urbanization on flowering and fruiting phenology and frost risk (i.e. time between the last frost date and flowering).

    Effects of urbanization on plant reproductive phenology varied significantly in direction and magnitude across species ranges. Increased urbanization led to earlier flowering in colder and wetter regions and delayed fruiting in regions with wetter spring conditions. Frost risk was elevated with increased urbanization in regions with colder and wetter spring conditions.

    Our study demonstrates that predictions of phenological change and its associated impacts must account for both climatic and human effects, which are context dependent and do not necessarily coincide. We must move beyond phenological models that only incorporate temperature variables and consider multiple environmental factors and their interactions when estimating plant phenology, especially at larger spatial and taxonomic scales.

     
  5. Premise of the Study

    A novel method of estimating phenology of herbarium specimens was developed to facilitate more precise determination of plant phenological responses to explanatory variables (e.g., climate).

    Methods and Results

    Simulated specimen data sets were used to compare the precision of phenological models using the new method and two common, alternative methods (flower presence/absence and ≥50% flowers present). The new “estimated phenophase” method was more precise and extracted a greater number of significant species‐level relationships; however, this method only slightly outperformed the simple “binary” (e.g., flowers present/absent) method.

    Conclusions

    The new method enables estimation of phenological trends with greater precision. However, when time and resources are limited, a presence/absence method may offer comparable results at lower cost. Using a more restrictive approach, such as only including specimens in a certain phenophase, is not advised given the detrimental effect of decreased sample size on resulting models.

     
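Several of the abstracts listed above turn per-specimen counts into a specimen-level phenological score: flower presence/absence, a "≥50% flowers present" rule, and a quantitative phenological index (PI). None of the abstracts gives the formulas, so the sketch below rests on stated assumptions: the ≥50% rule is read as "flowers make up at least half of all counted structures", and the PI is computed as a count-weighted mean of stage weights 1 to 4 (buds, flowers, immature fruits, mature fruits). The weights and all function names are hypothetical illustrations, not the published methods.

```python
from typing import Dict, Optional

# Assumed stage weights, ordered from earliest to latest phenological stage.
STAGE_WEIGHTS: Dict[str, int] = {
    "buds": 1,
    "flowers": 2,
    "immature_fruits": 3,
    "mature_fruits": 4,
}


def total_structures(counts: Dict[str, int]) -> int:
    """Total number of counted structures across the known stages."""
    return sum(counts.get(stage, 0) for stage in STAGE_WEIGHTS)


def flowers_present(counts: Dict[str, int]) -> bool:
    """Binary scoring: True if at least one open flower was counted."""
    return counts.get("flowers", 0) > 0


def majority_flowering(counts: Dict[str, int]) -> bool:
    """Assumed reading of the '>=50% flowers' rule: flowers are at least half of all structures."""
    total = total_structures(counts)
    return total > 0 and counts.get("flowers", 0) / total >= 0.5


def phenological_index(counts: Dict[str, int]) -> Optional[float]:
    """Count-weighted mean stage (assumed form); None for specimens with no structures."""
    total = total_structures(counts)
    if total == 0:
        return None
    weighted = sum(STAGE_WEIGHTS[s] * counts.get(s, 0) for s in STAGE_WEIGHTS)
    return weighted / total


# Invented example specimen: mostly buds, a few flowers, one immature fruit.
specimen = {"buds": 6, "flowers": 3, "immature_fruits": 1, "mature_fruits": 0}

if __name__ == "__main__":
    print("flowers present:", flowers_present(specimen))
    print("majority flowering:", majority_flowering(specimen))
    print("phenological index:", phenological_index(specimen))
```

Under these assumptions, the example specimen counts as flowering under the binary rule but not under the 50% rule, while the index places it between the bud and flower stages. This illustrates why a restrictive rule discards specimens that a continuous score still uses.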