Title: LeafMachine: Using machine learning to automate leaf trait extraction from digitized herbarium specimens
Premise

Obtaining phenotypic data from herbarium specimens can provide important insights into plant evolution and ecology but requires significant manual effort and time. Here, we present LeafMachine, an application designed to autonomously measure leaves from digitized herbarium specimens or leaf images using an ensemble of machine learning algorithms.

Methods and Results

We trained LeafMachine on 2685 randomly sampled specimens from 138 herbaria and evaluated its performance on specimens spanning 20 diverse families and varying widely in resolution, quality, and layout. LeafMachine successfully extracted at least one leaf measurement from 82.0% and 60.8% of high‐ and low‐resolution images, respectively. Of the unmeasured specimens, only 0.9% and 2.1% of high‐ and low‐resolution images, respectively, were visually judged to have measurable leaves.

Conclusions

This flexible autonomous tool has the potential to vastly increase available trait information from herbarium specimens, and inform a multitude of evolutionary and ecological studies.

 
PAR ID:
10456496
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Applications in Plant Sciences
Volume:
8
Issue:
6
ISSN:
2168-0450
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Premise

    Quantitative plant traits play a crucial role in biological research. However, traditional methods for measuring plant morphology are time consuming and have limited scalability. We present LeafMachine2, a suite of modular machine learning and computer vision tools that can automatically extract a base set of leaf traits from digital plant data sets.

    Methods

    LeafMachine2 was trained on 494,766 manually prepared annotations from 5648 herbarium images obtained from 288 institutions and representing 2663 species; it employs a set of plant component detection and segmentation algorithms to isolate individual leaves, petioles, fruits, flowers, wood samples, buds, and roots. Our landmarking network automatically identifies and measures nine pseudo‐landmarks that occur on most broadleaf taxa. Text labels and barcodes are automatically identified by an archival component detector and are prepared for optical character recognition methods or natural language processing algorithms.

    Results

    LeafMachine2 can extract trait data from at least 245 angiosperm families and calculate pixel‐to‐metric conversion factors for 26 commonly used ruler types.
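A pixel-to-metric conversion factor of the kind mentioned above can be illustrated with a minimal sketch. This is not LeafMachine2's implementation: the function names, the tick positions, and the assumption of evenly spaced 1 mm ruler ticks are all hypothetical and chosen only to show the arithmetic.

```python
from statistics import median

def pixels_per_mm(tick_positions_px, tick_spacing_mm=1.0):
    """Estimate a conversion factor from detected ruler tick positions.

    Hypothetical helper: assumes ticks are evenly spaced and already
    located along the ruler axis, in pixels.
    """
    if len(tick_positions_px) < 2:
        raise ValueError("need at least two tick positions")
    ordered = sorted(tick_positions_px)
    # Distance in pixels between each pair of neighbouring ticks.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # The median gap is robust to an occasional missed or spurious tick.
    return median(gaps) / tick_spacing_mm

def to_mm(length_px, factor_px_per_mm):
    """Convert a pixel measurement to millimetres."""
    return length_px / factor_px_per_mm

# Made-up tick positions, 12 px apart, on an assumed 1 mm ruler.
ticks = [100.0, 112.0, 124.0, 136.0, 148.0]
factor = pixels_per_mm(ticks)
leaf_length_mm = to_mm(600.0, factor)
```

With these made-up values the factor is 12 pixels per millimetre, so a 600-pixel leaf corresponds to 50 mm.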

    Discussion

    LeafMachine2 is a highly efficient tool for generating large quantities of plant trait data, even from occluded or overlapping leaves, field images, and non‐archival data sets. Our project, along with similar initiatives, has made significant progress in removing the bottleneck in plant trait data acquisition from herbarium specimens and shifted the focus toward the crucial task of data revision and quality control.

     
  2. Leaves are the most abundant and visible plant organ, both in the modern world and the fossil record. Identifying foliage to the correct plant family based on leaf architecture is a fundamental botanical skill that is also critical for isolated fossil leaves, which often, especially in the Cenozoic, represent extinct genera and species from extant families. Resources focused on leaf identification are remarkably scarce; however, the situation has improved due to the recent proliferation of digitized herbarium material, live-plant identification applications, and online collections of cleared and fossil leaf images. Nevertheless, the need remains for a specialized image dataset for comparative leaf architecture. We address this gap by assembling an open-access database of 30,252 images of vouchered leaf specimens vetted to family level, primarily of angiosperms, including 26,176 images of cleared and x-rayed leaves representing 354 families and 4,076 of fossil leaves from 48 families. The images maintain original resolution, have user-friendly filenames, and are vetted using APG and modern paleobotanical standards. The cleared and x-rayed leaves include the Jack A. Wolfe and Leo J. Hickey contributions to the National Cleared Leaf Collection and a collection of high-resolution scanned x-ray negatives, housed in the Division of Paleobotany, Department of Paleobiology, Smithsonian National Museum of Natural History, Washington D.C.; and the Daniel I. Axelrod Cleared Leaf Collection, housed at the University of California Museum of Paleontology, Berkeley. The fossil images include a sampling of Late Cretaceous to Eocene paleobotanical sites from the Western Hemisphere held at numerous institutions, especially from Florissant Fossil Beds National Monument (late Eocene, Colorado), as well as several other localities from the Late Cretaceous to Eocene of the Western USA and the early Paleogene of Colombia and southern Argentina. 
    The dataset facilitates new research and education opportunities in paleobotany, comparative leaf architecture, systematics, and machine learning.
  3. Societal Impact Statement

    Grapevine leaves are emblematic of the strong visual associations people make with plants. Leaf shape is immediately recognizable at a glance, and therefore, this is used to distinguish grape varieties. In an era of computationally enabled machine learning‐derived representations of reality, we can revisit how we view and use the shapes and forms that plants display to understand our relationship with them. Using computational approaches combined with time‐honored methods, we can predict theoretical leaves that are possible, enabling us to understand the genetics, development, and environmental responses of plants in new ways.

    Summary

    Grapevine leaves are a model morphometric system. Sampling over 10,000 leaves using dozens of landmarks, the genetic, developmental, and environmental basis of leaf shape has been studied and a morphospace for the genus Vitis predicted. Yet, these representations of leaf shape fail to capture the exquisite features of leaves at high resolution.

    We measure the shapes of 139 grapevine leaves using 1672 pseudo‐landmarks derived from 90 homologous landmarks with Procrustean approaches. From hand traces of the vasculature and blade, we have derived a method to automatically detect landmarks and place pseudo‐landmarks that results in a high‐resolution representation of grapevine leaf shape. Using polynomial models, we create continuous representations of leaf development in 10 Vitis spp.

    We visualize a high‐resolution morphospace in which genetic and developmental sources of leaf shape variance are orthogonal to each other. Using classifiers, Vitis vinifera, Vitis spp., rootstock and dissected leaf varieties, as well as developmental stages, are accurately predicted. Theoretical eigenleaf representations sampled from across the morphospace, which we call synthetic leaves, can be classified using models.

    By predicting a high‐resolution morphospace and delimiting the boundaries of leaf shapes that can plausibly be produced within the genus Vitis, we can sample synthetic leaves with realistic qualities. From an ampelographic perspective, larger numbers of leaves sampled at lower resolution can be projected onto this high‐resolution space, or synthetic leaves can be used to increase the robustness and accuracy of machine learning classifiers.

     
  4. Abstract Premise

    The preservation of plant tissues in ethanol is conventionally viewed as problematic. Here, we show that leaf preservation in ethanol combined with proteinase digestion can provide high‐quality DNA extracts. Additionally, as a pretreatment, ethanol can facilitate DNA extraction for recalcitrant samples.

    Methods

    DNA was isolated from leaves preserved with 96% ethanol or from silica‐desiccated leaf samples and herbarium fragments that were pretreated with ethanol. DNA was extracted from herbarium tissues using a special ethanol pretreatment protocol, and these extracts were compared with those obtained using the standard cetyltrimethylammonium bromide (CTAB) method.

    Results

    DNA extracted from tissue preserved in, or pretreated with, ethanol was less fragmented than DNA from tissues without pretreatment. Adding proteinase digestion to the lysis step increased the amount of DNA obtained from the ethanol‐pretreated tissues. The combination of the ethanol pretreatment with liquid nitrogen freezing and a sorbitol wash prior to cell lysis greatly improved the quality and yield of DNA from the herbarium tissue samples.

    Discussion

    This study critically reevaluates the consequences of ethanol for plant tissue preservation and expands the utility of pretreatment methods for molecular and phylogenomic studies.

     
  5. Introduction

    Drought detection, spanning from early stress to severe conditions, plays a crucial role in maintaining productivity, facilitating recovery, and preventing plant mortality. While handheld thermal cameras have been widely employed to track changes in leaf water content and stomatal conductance, research on thermal image classification remains limited due mainly to low resolution and blurry images produced by handheld cameras.

    Methods

    In this study, we introduce a computer vision pipeline to enhance the significance of leaf-level thermal images across 27 distinct cotton genotypes cultivated in a greenhouse under progressive drought conditions. Our approach involved employing a customized software pipeline to process raw thermal images, generating leaf masks, and extracting a range of statistically relevant thermal features (e.g., min and max temperature, median value, quartiles, etc.). These features were then utilized to develop machine learning algorithms capable of assessing leaf hydration status and distinguishing between well-watered (WW) and dry-down (DD) conditions.
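The feature extraction step described above (a leaf mask applied to a thermal image, followed by summary statistics) might be sketched as follows. This is an illustrative assumption, not the authors' pipeline: the function name `thermal_features`, the toy temperature grid, the mask, and the choice of inclusive quartiles are all hypothetical.

```python
from statistics import quantiles

def thermal_features(temps, mask):
    """Summarize temperatures of the pixels selected by a leaf mask.

    temps: 2D list of temperatures (degrees C, assumed).
    mask:  2D list of booleans with the same shape; True marks leaf pixels.
    """
    # Keep only the temperatures that fall inside the leaf mask.
    leaf = sorted(
        t
        for temp_row, mask_row in zip(temps, mask)
        for t, keep in zip(temp_row, mask_row)
        if keep
    )
    if len(leaf) < 2:
        raise ValueError("mask must select at least two pixels")
    # Quartiles with linear interpolation over the observed range.
    q1, med, q3 = quantiles(leaf, n=4, method="inclusive")
    return {"min": leaf[0], "max": leaf[-1], "median": med, "q1": q1, "q3": q3}

# Toy 3x3 thermal image with a cross-shaped, made-up leaf mask.
temps = [[20.0, 21.0, 22.0],
         [23.0, 24.0, 25.0],
         [26.0, 27.0, 28.0]]
mask = [[False, True, False],
        [True, True, True],
        [False, True, False]]
features = thermal_features(temps, mask)
```

A dictionary of this kind, computed per leaf, is the sort of feature vector that could then be passed to a classifier such as a random forest.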

    Results

    Two classifiers, a random forest and a multilayer perceptron neural network, were trained to predict the plant treatment, reaching 75% and 78% accuracy, respectively. Furthermore, we evaluated the predicted versus true labels against classic physiological indicators of drought in plants, including volumetric soil water content, leaf water potential, and chlorophyll a fluorescence, to provide more insight into the classification outputs and possible explanations for them.

    Discussion

    Interestingly, mislabeled leaves mostly exhibited notable responses in fluorescence, water uptake from the soil, and/or leaf hydration status. Our findings emphasize the potential of AI-assisted thermal image analysis in enhancing the informative value of common heterogeneous datasets for drought detection. This application suggests widening the experimental settings to be used with deep learning models, designing future investigations into the genotypic variation in plant drought response and potential optimization of water management in agricultural settings.

     