Title: From leaves to labels: Building modular machine learning networks for rapid herbarium specimen analysis with LeafMachine2
Abstract:
Premise: Quantitative plant traits play a crucial role in biological research. However, traditional methods for measuring plant morphology are time-consuming and have limited scalability. We present LeafMachine2, a suite of modular machine learning and computer vision tools that can automatically extract a base set of leaf traits from digital plant data sets.
Methods: LeafMachine2 was trained on 494,766 manually prepared annotations from 5648 herbarium images obtained from 288 institutions and representing 2663 species; it employs a set of plant component detection and segmentation algorithms to isolate individual leaves, petioles, fruits, flowers, wood samples, buds, and roots. Our landmarking network automatically identifies and measures nine pseudo‐landmarks that occur on most broadleaf taxa. Text labels and barcodes are automatically identified by an archival component detector and are prepared for optical character recognition methods or natural language processing algorithms.
Results: LeafMachine2 can extract trait data from at least 245 angiosperm families and calculate pixel‐to‐metric conversion factors for 26 commonly used ruler types.
Discussion: LeafMachine2 is a highly efficient tool for generating large quantities of plant trait data, even from occluded or overlapping leaves, field images, and non‐archival data sets. Our project, along with similar initiatives, has made significant progress in removing the bottleneck in plant trait data acquisition from herbarium specimens and shifted the focus toward the crucial task of data revision and quality control.
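The pixel-to-metric conversion mentioned in the Results can be illustrated with a minimal Python sketch. This is not LeafMachine2's implementation: the function names, the 1 mm tick spacing, and the example tick positions are all hypothetical, and the sketch assumes the ruler tick marks have already been located in the image.

```python
def pixels_per_mm(tick_positions_px, tick_spacing_mm=1.0):
    """Estimate a conversion factor (pixels per millimetre) from ruler ticks.

    tick_positions_px: pixel coordinates of consecutive tick marks measured
        along the ruler axis (hypothetical input, assumed already detected).
    tick_spacing_mm: assumed real-world distance between adjacent ticks.
    """
    if len(tick_positions_px) < 2:
        raise ValueError("need at least two tick positions")
    ordered = sorted(tick_positions_px)
    # Pixel gaps between neighbouring ticks.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    mean_gap_px = sum(gaps) / len(gaps)
    if mean_gap_px <= 0:
        raise ValueError("tick positions must be distinct")
    return mean_gap_px / tick_spacing_mm


def length_to_mm(length_px, factor_px_per_mm):
    """Convert a length measured in pixels to millimetres."""
    return length_px / factor_px_per_mm


def area_to_mm2(area_px, factor_px_per_mm):
    """Convert an area in square pixels to square millimetres."""
    return area_px / factor_px_per_mm ** 2


# Hypothetical example: ticks detected every 12 pixels on a millimetre ruler.
factor = pixels_per_mm([100, 112, 124, 136])
leaf_length_mm = length_to_mm(600, factor)
leaf_area_mm2 = area_to_mm2(14400, factor)
```

Areas scale with the square of the factor, so a length and an area measured on the same image need different divisors.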
Award ID(s):
2217116
PAR ID:
10539232
Author(s) / Creator(s):
;
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Applications in Plant Sciences
Volume:
11
Issue:
5
ISSN:
2168-0450
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary: Reflectance spectroscopy is a rapid method for estimating traits and discriminating species. Spectral libraries from herbarium specimens represent an untapped resource for generating broad phenomic datasets across space, time, and taxa. We conducted a proof‐of‐concept study using trait data and spectra from herbarium specimens up to 179 yr old, alongside data from recently dried and pressed leaves. We validated model accuracy and transferability for trait prediction and taxonomic discrimination. Trait models from herbarium spectra predicted leaf mass per area (LMA) with R² = 0.94 and %RMSE = 4.86%. Models for LMA prediction were transferable between herbarium and pressed spectra, achieving R² = 0.88, %RMSE = 8.76% for herbarium to pressed spectra, and R² = 0.76, %RMSE = 10.5% for the reverse transfer. Discriminant models classified leaf spectra from 25 species with 74% accuracy, and classification probabilities were significantly associated with several herbarium specimen quality metrics. The results validate herbarium spectral data for trait prediction and taxonomic discrimination, and demonstrate that trait modeling can benefit from the complementary use of pressed‐leaf and herbarium‐leaf spectral datasets. These promising advancements help to justify the spectral digitization of plant biodiversity collections and support their application in broad ecological and evolutionary investigations.
  2. Premise of the Study: Phenological annotation models computed on large‐scale herbarium data sets were developed and tested in this study. Methods: Herbarium specimens represent a significant resource with which to study plant phenology. Nevertheless, phenological annotation of herbarium specimens is time‐consuming, requires substantial human investment, and is difficult to mobilize at large taxonomic scales. We created and evaluated new methods based on deep learning techniques to automate annotation of phenological stages and tested these methods on four herbarium data sets representing temperate, tropical, and equatorial American floras. Results: Deep learning allowed correct detection of fertile material with an accuracy of 96.3%. Accuracy was slightly decreased for finer‐scale information (84.3% for flower and 80.5% for fruit detection). Discussion: The method described has the potential to allow fine‐grained phenological annotation of herbarium specimens at large ecological scales. Deeper investigation regarding the taxonomic scalability of this approach is needed.
  3. High-throughput phenotyping enables the efficient collection of plant trait data at scale. One example involves using imaging systems over key phases of a crop growing season. Although the resulting images provide rich data for statistical analyses of plant phenotypes, image processing for trait extraction is required as a prerequisite. Current methods for trait extraction are mainly based on supervised learning with human labeled data or semisupervised learning with a mixture of human labeled data and unsupervised data. Unfortunately, preparing a sufficiently large training data set is both time- and labor-intensive. We describe a self-supervised pipeline (KAT4IA) that uses K-means clustering on greenhouse images to construct training data for extracting and analyzing plant traits from an image-based field phenotyping system. The KAT4IA pipeline includes these main steps: self-supervised training set construction, plant segmentation from images of field-grown plants, automatic separation of target plants, calculation of plant traits, and functional curve fitting of the extracted traits. To deal with the challenge of separating target plants from noisy backgrounds in field images, we describe a novel approach using row-cuts and column-cuts on images segmented by transform domain neural network learning, which utilizes plant pixels identified from greenhouse images to train a segmentation model for field images. This approach is efficient and does not require human intervention. Our results show that KAT4IA is able to accurately extract plant pixels and estimate plant heights.
  4. Abstract. Premise: Mechanistic models using stomatal traits and leaf carbon isotope ratios to reconstruct atmospheric carbon dioxide (CO2) concentrations (c_a) are important to understand the Phanerozoic paleoclimate. However, methods for preparing leaf cuticles to measure stomatal traits have not been standardized. Methods: Three people measured the stomatal density and index, guard cell length, guard cell pair width, and pore length of leaves from the same Ginkgo biloba, Quercus alba, and Zingiber mioga leaves growing at known CO2 levels using four preparation methods: fluorescence on cleared leaves, nail polish, dental putty on fresh leaves, and dental putty on dried leaves. Results: There are significant differences between trait measurements from each method. Modeled c_a calculations are less sensitive to method than individual traits; however, the choice of assumed carbon isotope fractionation also impacted the accuracy of the results. Discussion: We show that there is not a significant difference between c_a estimates made using any of the four methods. Further study is needed on the fractionation due to carboxylation of ribulose bisphosphate (RuBP) in individual plant species before use as a paleo‐CO2 barometer and to refine estimates based upon widely applied taxa (e.g., Ginkgo). Finally, we recommend that morphological measurements be made by multiple observers to reduce the effect of individual observational biases.
  5. This study describes the evaluation of a range of approaches to semantic segmentation of hyperspectral images of sorghum plants, classifying each pixel as either nonplant or belonging to one of the three organ types (leaf, stalk, panicle). While many current methods for segmentation focus on separating plant pixels from background, organ-specific segmentation makes it feasible to measure a wider range of plant properties. Manually scored training data for a set of hyperspectral images collected from a sorghum association population was used to train and evaluate a set of supervised classification models. Many algorithms show acceptable accuracy for this classification task. Algorithms trained on sorghum data are able to accurately classify maize leaves and stalks, but fail to accurately classify maize reproductive organs, which are not directly equivalent to sorghum panicles. Trait measurements extracted from semantic segmentation of sorghum organs can be used both to identify genes known to control variation in previously measured phenotypes (e.g., panicle size and plant height) and to identify signals for genes controlling traits not previously quantified in this population (e.g., stalk/leaf ratio). Organ-level semantic segmentation provides opportunities to identify genes controlling variation in a wide range of morphological phenotypes in sorghum, maize, and other related grain crops.