


Title: From leaves to labels: Building modular machine learning networks for rapid herbarium specimen analysis with LeafMachine2
Abstract
Premise: Quantitative plant traits play a crucial role in biological research. However, traditional methods for measuring plant morphology are time consuming and have limited scalability. We present LeafMachine2, a suite of modular machine learning and computer vision tools that can automatically extract a base set of leaf traits from digital plant data sets.
Methods: LeafMachine2 was trained on 494,766 manually prepared annotations from 5648 herbarium images obtained from 288 institutions and representing 2663 species. It employs a set of plant component detection and segmentation algorithms to isolate individual leaves, petioles, fruits, flowers, wood samples, buds, and roots. Our landmarking network automatically identifies and measures nine pseudo‐landmarks that occur on most broadleaf taxa. Text labels and barcodes are automatically identified by an archival component detector and are prepared for optical character recognition methods or natural language processing algorithms.
Results: LeafMachine2 can extract trait data from at least 245 angiosperm families and calculate pixel‐to‐metric conversion factors for 26 commonly used ruler types.
Discussion: LeafMachine2 is a highly efficient tool for generating large quantities of plant trait data, even from occluded or overlapping leaves, field images, and non‐archival data sets. Our project, along with similar initiatives, has made significant progress in removing the bottleneck in plant trait data acquisition from herbarium specimens and has shifted the focus toward the crucial task of data revision and quality control.
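The abstract states that LeafMachine2 calculates pixel‐to‐metric conversion factors for 26 ruler types but does not say how. The sketch below illustrates only the underlying arithmetic and is not LeafMachine2's implementation. It assumes that the pixel coordinates of consecutive ruler ticks along one axis have already been detected and that the ticks are a known distance apart (1 mm by default). The function names `pixels_per_mm` and `to_metric` are hypothetical. Lengths scale by the conversion factor, and areas scale by its square:

```python
from statistics import median


def pixels_per_mm(tick_positions_px, tick_spacing_mm=1.0):
    """Estimate a pixel-to-metric factor from detected ruler tick positions.

    Hypothetical helper. Assumes ticks lie along one axis and are evenly
    spaced `tick_spacing_mm` apart in real-world units.
    """
    ticks = sorted(tick_positions_px)
    if len(ticks) < 2:
        raise ValueError("need at least two ticks")
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    # The median gap is less sensitive than the mean to an occasional
    # missed or mislocated tick.
    return median(gaps) / tick_spacing_mm


def to_metric(length_px, area_px, px_per_mm):
    """Convert a pixel length to mm and a pixel area to mm^2."""
    return length_px / px_per_mm, area_px / px_per_mm ** 2
```

For ticks detected at x = 100, 110, 120, 131, 140, and 150 px, the median gap is 10 px per mm. At that scale, a 523 px leaf length converts to 52.3 mm and a 12,000 px² leaf area converts to 120 mm².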
Award ID(s): 2217116
PAR ID: 10539232
Author(s) / Creator(s):
Publisher / Repository: Wiley
Date Published:
Journal Name: Applications in Plant Sciences
Volume: 11
Issue: 5
ISSN: 2168-0450
Sponsoring Org: National Science Foundation
More like this
  1. Summary: Reflectance spectroscopy is a rapid method for estimating traits and discriminating species. Spectral libraries from herbarium specimens represent an untapped resource for generating broad phenomic datasets across space, time, and taxa. We conducted a proof‐of‐concept study using trait data and spectra from herbarium specimens up to 179 yr old, alongside data from recently dried and pressed leaves. We validated model accuracy and transferability for trait prediction and taxonomic discrimination. Trait models from herbarium spectra predicted leaf mass per area (LMA) with R² = 0.94 and %RMSE = 4.86%. Models for LMA prediction were transferable between herbarium and pressed spectra, achieving R² = 0.88, %RMSE = 8.76% for herbarium to pressed spectra, and R² = 0.76, %RMSE = 10.5% for the reverse transfer. Discriminant models classified leaf spectra from 25 species with 74% accuracy, and classification probabilities were significantly associated with several herbarium specimen quality metrics. The results validate herbarium spectral data for trait prediction and taxonomic discrimination, and demonstrate that trait modeling can benefit from the complementary use of pressed‐leaf and herbarium‐leaf spectral datasets. These promising advancements help to justify the spectral digitization of plant biodiversity collections and support their application in broad ecological and evolutionary investigations.
  2. Abstract. Premise: Mechanistic models that use stomatal traits and leaf carbon isotope ratios to reconstruct atmospheric carbon dioxide (CO2) concentrations (c_a) are important for understanding Phanerozoic paleoclimate. However, methods for preparing leaf cuticles to measure stomatal traits have not been standardized. Methods: Three people measured the stomatal density and index, guard cell length, guard cell pair width, and pore length of the same Ginkgo biloba, Quercus alba, and Zingiber mioga leaves growing at known CO2 levels, using four preparation methods: fluorescence on cleared leaves, nail polish, dental putty on fresh leaves, and dental putty on dried leaves. Results: There are significant differences between trait measurements from each method. Modeled c_a values are less sensitive to method than individual traits are; however, the choice of assumed carbon isotope fractionation also affected the accuracy of the results. Discussion: We show that there is no significant difference between c_a estimates made using any of the four methods. Further study is needed on the fractionation due to carboxylation of ribulose bisphosphate (RuBP) in individual plant species before use as a paleo‐CO2 barometer, and to refine estimates based on widely applied taxa (e.g., Ginkgo). Finally, we recommend that morphological measurements be made by multiple observers to reduce the effect of individual observational biases.
  3. Abstract. Premise: The preservation of plant tissues in ethanol is conventionally viewed as problematic. Here, we show that leaf preservation in ethanol combined with proteinase digestion can provide high‐quality DNA extracts. Additionally, as a pretreatment, ethanol can facilitate DNA extraction for recalcitrant samples. Methods: DNA was isolated from leaves preserved with 96% ethanol or from silica‐desiccated leaf samples and herbarium fragments that were pretreated with ethanol. DNA was extracted from herbarium tissues using a special ethanol pretreatment protocol, and these extracts were compared with those obtained using the standard cetyltrimethylammonium bromide (CTAB) method. Results: DNA extracted from tissue preserved in, or pretreated with, ethanol was less fragmented than DNA from tissues without pretreatment. Adding proteinase digestion to the lysis step increased the amount of DNA obtained from the ethanol‐pretreated tissues. The combination of the ethanol pretreatment with liquid nitrogen freezing and a sorbitol wash prior to cell lysis greatly improved the quality and yield of DNA from the herbarium tissue samples. Discussion: This study critically reevaluates the consequences of ethanol for plant tissue preservation and expands the utility of pretreatment methods for molecular and phylogenomic studies.
  4. Abstract. Premise: Plant trait data are essential for quantifying biodiversity and function across Earth, but these data are challenging to acquire for large studies. Diverse strategies are needed, including the liberation of heritage data locked within specialist literature such as floras and taxonomic monographs. Here we report FloraTraiter, a novel approach using rule‐based natural language processing (NLP) to parse computable trait data from biodiversity literature. Methods: FloraTraiter was implemented through collaborative work between programmers and botanical experts and customized for both online floras and scanned literature. We report a strategy spanning optical character recognition, recognition of taxa, iterative building of traits, and establishing linkages among all of these, as well as curational tools and code for turning these results into standard morphological matrices. Results: Over 95% of treatment content was successfully parsed for traits with <1% error. Data for more than 700 taxa are reported, including a demonstration of common downstream uses. Conclusions: We identify strategies, applications, tips, and challenges that we hope will facilitate future similar efforts to produce large open‐source trait data sets for broad community reuse. Largely automated tools like FloraTraiter will be an important addition to the toolkit for assembling trait data at scale.
  5. High-throughput phenotyping enables the efficient collection of plant trait data at scale. One example involves using imaging systems over key phases of a crop growing season. Although the resulting images provide rich data for statistical analyses of plant phenotypes, image processing for trait extraction is required as a prerequisite. Current methods for trait extraction are mainly based on supervised learning with human-labeled data or semisupervised learning with a mixture of human-labeled data and unlabeled data. Unfortunately, preparing a sufficiently large training data set is both time- and labor-intensive. We describe a self-supervised pipeline (KAT4IA) that uses K-means clustering on greenhouse images to construct training data for extracting and analyzing plant traits from an image-based field phenotyping system. The KAT4IA pipeline includes these main steps: self-supervised training set construction, plant segmentation from images of field-grown plants, automatic separation of target plants, calculation of plant traits, and functional curve fitting of the extracted traits. To deal with the challenge of separating target plants from noisy backgrounds in field images, we describe a novel approach using row-cuts and column-cuts on images segmented by transform domain neural network learning, which utilizes plant pixels identified from greenhouse images to train a segmentation model for field images. This approach is efficient and does not require human intervention. Our results show that KAT4IA is able to accurately extract plant pixels and estimate plant heights.
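Related item 1 reports model fit as R² and %RMSE but does not define either metric. The sketch below computes R² as the coefficient of determination, 1 - SS_res / SS_tot, and assumes that %RMSE is the RMSE divided by the range of observed values, times 100. Some studies divide by the observed mean instead, so treat that normalization as an assumption. The function name is hypothetical.

```python
import math


def r2_and_percent_rmse(observed, predicted):
    """Return (R^2, %RMSE) for paired observed/predicted trait values.

    R^2 = 1 - SS_res / SS_tot (coefficient of determination).
    %RMSE is ASSUMED here to be 100 * RMSE / (max(observed) - min(observed)).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant")
    rmse = math.sqrt(ss_res / n)
    obs_range = max(observed) - min(observed)
    return 1 - ss_res / ss_tot, 100 * rmse / obs_range
```

For example, illustrative LMA values observed as 50, 60, 70, 80 g/m² and predicted as 52, 58, 71, 79 g/m² give R² = 0.98.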
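Related item 2 measures stomatal density and stomatal index and recommends using multiple observers. This sketch uses the standard definitions: density is stomata per unit leaf area, and the stomatal index is S / (S + E) × 100, where S is the stomata count and E is the count of other epidermal cells. It then summarizes counts from several observers of the same field of view. The function names, the 0.25 mm² field area, and the observer counts in the usage note are hypothetical.

```python
from statistics import mean, stdev


def stomatal_density(n_stomata, field_area_mm2):
    """Stomata per square millimetre in one counted field of view."""
    return n_stomata / field_area_mm2


def stomatal_index(n_stomata, n_epidermal_cells):
    """Stomatal index (%) = S / (S + E) * 100, E = non-stomatal epidermal cells."""
    return 100 * n_stomata / (n_stomata + n_epidermal_cells)


def summarize_observers(counts, field_area_mm2):
    """Mean and standard deviation across observers.

    counts: list of (stomata, epidermal_cells) tuples, one per observer,
    all counted on the same field of view (at least two observers).
    """
    sd = [stomatal_density(s, field_area_mm2) for s, _ in counts]
    si = [stomatal_index(s, e) for s, e in counts]
    return {
        "SD_mean": mean(sd), "SD_sd": stdev(sd),
        "SI_mean": mean(si), "SI_sd": stdev(si),
    }
```

For example, `summarize_observers([(42, 310), (45, 305), (40, 318)], 0.25)` yields per-observer densities of 168, 180, and 160 stomata/mm². The spread across observers is the kind of observational bias the abstract recommends reducing.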
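Related item 4 describes rule-based NLP for parsing traits from floras. The toy example below conveys the general idea of one such rule; it is not FloraTraiter's code or rule set. A single regular expression, assumed to have the shape `<organ> <low>[-<high>] <unit> <long|wide>`, turns a measurement phrase into a structured record with values normalized to millimetres. The pattern, organ vocabulary, and function name are hypothetical.

```python
import re

# Hypothetical rule: "<organ> <low>[-|–<high>] <unit> <long|wide>"
TRAIT_RULE = re.compile(
    r"(?P<organ>leaves|leaf|blades?|petioles?)\s+"
    r"(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s*[-–]\s*(?P<high>\d+(?:\.\d+)?))?\s*"
    r"(?P<unit>mm|cm|m)\s+"
    r"(?P<dim>long|wide)",
    re.IGNORECASE,
)

# Conversion factors to millimetres.
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def parse_traits(text):
    """Return one dict per matched measurement, with ranges in mm."""
    records = []
    for m in TRAIT_RULE.finditer(text):
        factor = TO_MM[m["unit"].lower()]
        low = float(m["low"]) * factor
        # A single value (no range) is stored as min == max.
        high = float(m["high"]) * factor if m["high"] else low
        records.append({
            "organ": m["organ"].lower(),
            "dimension": m["dim"].lower(),
            "min_mm": low,
            "max_mm": high,
        })
    return records
```

Real treatments need many interacting rules (taxon linkage, shared units, "ca." qualifiers), which is why the abstract describes building traits iteratively with botanical experts.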
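Related item 5 uses K-means clustering on greenhouse images to build training labels without human annotation. The sketch below shows that idea in plain NumPy under several assumptions the abstract does not state: two clusters on raw RGB values, a farthest-point initialization, and selection of the plant cluster as the one whose centroid has the higher excess-green index (2G - R - B). It is not the KAT4IA code, and the function name is hypothetical.

```python
import numpy as np


def _assign(pixels, centroids):
    # Squared Euclidean distance from every pixel to every centroid.
    d = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans_plant_mask(image, n_iter=20, seed=0):
    """Return an H x W boolean mask (True = plant) from 2-cluster K-means on RGB."""
    pixels = image.reshape(-1, 3).astype(float)
    rng = np.random.default_rng(seed)
    # Farthest-point init: a random pixel, then the pixel farthest from it.
    first = pixels[rng.integers(len(pixels))]
    second = pixels[((pixels - first) ** 2).sum(axis=1).argmax()]
    centroids = np.stack([first, second])
    for _ in range(n_iter):
        labels = _assign(pixels, centroids)
        new_centroids = np.array([
            pixels[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(2)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = _assign(pixels, centroids)
    r, g, b = centroids[:, 0], centroids[:, 1], centroids[:, 2]
    plant_cluster = int(np.argmax(2 * g - r - b))  # assumed: greener = plant
    return (labels == plant_cluster).reshape(image.shape[:2])
```

The distance step allocates an N × 2 × 3 array, so for full-resolution images it is reasonable to fit on a random subsample of pixels and then assign every pixel once.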
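Related item 5 also separates target plants with row-cuts and column-cuts on segmented field images, but the abstract does not say where cuts are placed. This sketch assumes a simple rule: a column with at most `min_pixels` plant pixels is a cut, and each run of occupied columns between cuts is one plant. Applying the same function to the transposed mask gives row cuts. The function name and threshold are hypothetical.

```python
import numpy as np


def column_cuts(mask, min_pixels=0):
    """Split a binary plant mask into column ranges, one per plant.

    A column is treated as empty (a cut) if it has <= min_pixels plant pixels.
    Returns a list of (start, stop) column indices, stop exclusive.
    Use column_cuts(mask.T) for row cuts.
    """
    occupied = mask.sum(axis=0) > min_pixels
    segments = []
    start = None
    for col, occ in enumerate(occupied):
        if occ and start is None:
            start = col  # entering a run of plant columns
        elif not occ and start is not None:
            segments.append((start, col))  # leaving the run
            start = None
    if start is not None:
        segments.append((start, len(occupied)))  # run reaches the image edge
    return segments
```

Raising `min_pixels` lets thin bridges of noise or overlapping leaves fall below the threshold, at the cost of possibly trimming sparse plant edges.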