Summary: Reflectance spectroscopy is a rapid method for estimating traits and discriminating species. Spectral libraries from herbarium specimens represent an untapped resource for generating broad phenomic datasets across space, time, and taxa. We conducted a proof‐of‐concept study using trait data and spectra from herbarium specimens up to 179 yr old, alongside data from recently dried and pressed leaves. We validated model accuracy and transferability for trait prediction and taxonomic discrimination. Trait models from herbarium spectra predicted leaf mass per area (LMA) with R² = 0.94 and %RMSE = 4.86%. Models for LMA prediction were transferable between herbarium and pressed spectra, achieving R² = 0.88 and %RMSE = 8.76% for herbarium to pressed spectra, and R² = 0.76 and %RMSE = 10.5% for the reverse transfer. Discriminant models classified leaf spectra from 25 species with 74% accuracy, and classification probabilities were significantly associated with several herbarium specimen quality metrics. The results validate herbarium spectral data for trait prediction and taxonomic discrimination, and demonstrate that trait modeling can benefit from the complementary use of pressed‐leaf and herbarium‐leaf spectral datasets. These promising advancements help to justify the spectral digitization of plant biodiversity collections and support their application in broad ecological and evolutionary investigations.
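The two accuracy metrics quoted in this summary can be illustrated with a minimal sketch. It assumes that R² is the coefficient of determination and that %RMSE is the root mean squared error divided by the range of the observed values; the summary does not state either definition, so both are assumptions, and the function names are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def percent_rmse(observed, predicted):
    """RMSE as a percentage of the observed range (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (max(observed) - min(observed))


# Hypothetical LMA values (g/m^2), for illustration only.
observed_lma = [40.0, 65.0, 90.0, 120.0]
predicted_lma = [43.0, 61.0, 94.0, 117.0]
print(r_squared(observed_lma, predicted_lma))
print(percent_rmse(observed_lma, predicted_lma))
```

Normalizing by the range makes errors comparable across traits measured in different units; other normalizations (for example, by the mean) would give different percentages.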
From leaves to labels: Building modular machine learning networks for rapid herbarium specimen analysis with LeafMachine2
Abstract. Premise: Quantitative plant traits play a crucial role in biological research. However, traditional methods for measuring plant morphology are time consuming and have limited scalability. We present LeafMachine2, a suite of modular machine learning and computer vision tools that can automatically extract a base set of leaf traits from digital plant data sets. Methods: LeafMachine2 was trained on 494,766 manually prepared annotations from 5648 herbarium images obtained from 288 institutions and representing 2663 species; it employs a set of plant component detection and segmentation algorithms to isolate individual leaves, petioles, fruits, flowers, wood samples, buds, and roots. Our landmarking network automatically identifies and measures nine pseudo‐landmarks that occur on most broadleaf taxa. Text labels and barcodes are automatically identified by an archival component detector and are prepared for optical character recognition methods or natural language processing algorithms. Results: LeafMachine2 can extract trait data from at least 245 angiosperm families and calculate pixel‐to‐metric conversion factors for 26 commonly used ruler types. Discussion: LeafMachine2 is a highly efficient tool for generating large quantities of plant trait data, even from occluded or overlapping leaves, field images, and non‐archival data sets. Our project, along with similar initiatives, has made significant progress in removing the bottleneck in plant trait data acquisition from herbarium specimens and shifted the focus toward the crucial task of data revision and quality control.
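As an illustration of what a pixel‐to‐metric conversion factor is, the sketch below derives one from ruler tick marks. It is not LeafMachine2's method, which the abstract does not describe; the function names, the 1 mm tick spacing, and the idea of taking the median gap between detected ticks are all assumptions made for the example.

```python
from statistics import median


def pixels_per_mm(tick_positions_px, tick_spacing_mm=1.0):
    """Estimate pixels per millimetre from detected ruler tick positions.

    Hypothetical helper: assumes ticks lie along one axis and are evenly
    spaced at tick_spacing_mm. The median gap tolerates a few noisy ticks.
    """
    ticks = sorted(tick_positions_px)
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return median(gaps) / tick_spacing_mm


def px_to_mm(length_px, factor_px_per_mm):
    """Convert a length measured in pixels to millimetres."""
    return length_px / factor_px_per_mm


# Hypothetical tick positions (pixels) along a ruler in a specimen image.
factor = pixels_per_mm([0.0, 10.0, 20.0, 30.5])
leaf_length_mm = px_to_mm(250.0, factor)
```

Once such a factor is known for an image, any pixel measurement taken from that image (leaf length, width, landmark distances) can be reported in metric units.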
- Award ID(s): 2217116
- PAR ID: 10539232
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Applications in Plant Sciences
- Volume: 11
- Issue: 5
- ISSN: 2168-0450
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
High-throughput phenotyping enables the efficient collection of plant trait data at scale. One example involves using imaging systems over key phases of a crop growing season. Although the resulting images provide rich data for statistical analyses of plant phenotypes, image processing for trait extraction is required as a prerequisite. Current methods for trait extraction are mainly based on supervised learning with human-labeled data or semisupervised learning with a mixture of human-labeled data and unsupervised data. Unfortunately, preparing a sufficiently large training data set is both time- and labor-intensive. We describe a self-supervised pipeline (KAT4IA) that uses K-means clustering on greenhouse images to construct training data for extracting and analyzing plant traits from an image-based field phenotyping system. The KAT4IA pipeline includes these main steps: self-supervised training set construction, plant segmentation from images of field-grown plants, automatic separation of target plants, calculation of plant traits, and functional curve fitting of the extracted traits. To deal with the challenge of separating target plants from noisy backgrounds in field images, we describe a novel approach using row-cuts and column-cuts on images segmented by transform domain neural network learning, which utilizes plant pixels identified from greenhouse images to train a segmentation model for field images. This approach is efficient and does not require human intervention. Our results show that KAT4IA is able to accurately extract plant pixels and estimate plant heights.
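The idea of using K-means clustering to label pixels without human annotation can be sketched as follows. This is a simplified illustration, not the KAT4IA implementation: the excess-green feature, the two-cluster setup, and the function names are assumptions chosen to keep the example small.

```python
def excess_green(rgb):
    """Greenness index 2G - R - B (an assumed feature, not from the abstract)."""
    r, g, b = rgb
    return 2 * g - r - b


def kmeans_1d(values, k=2, iters=50):
    """Plain K-means on scalar values; requires k >= 2.

    Centroids start evenly spaced between min and max so the result is
    deterministic. Returns (labels, centroids).
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: mean of each cluster (keep the old centroid if empty).
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical RGB pixels: two green (plant-like) and two grey (background-like).
pixels = [(30, 200, 40), (40, 180, 50), (120, 110, 100), (130, 125, 120)]
labels, centroids = kmeans_1d([excess_green(p) for p in pixels])
# Pixels in the cluster with the higher centroid are treated as plant pixels
# and could then serve as automatically generated training labels.
plant_mask = [lab == centroids.index(max(centroids)) for lab in labels]
```

In a pipeline of this kind, the cluster labels stand in for manual annotation, which is what removes the human labeling step.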
-
Premise of the Study: Phenological annotation models computed on large‐scale herbarium data sets were developed and tested in this study. Methods: Herbarium specimens represent a significant resource with which to study plant phenology. Nevertheless, phenological annotation of herbarium specimens is time‐consuming, requires substantial human investment, and is difficult to mobilize at large taxonomic scales. We created and evaluated new methods based on deep learning techniques to automate annotation of phenological stages and tested these methods on four herbarium data sets representing temperate, tropical, and equatorial American floras. Results: Deep learning allowed correct detection of fertile material with an accuracy of 96.3%. Accuracy was slightly decreased for finer‐scale information (84.3% for flower and 80.5% for fruit detection). Discussion: The method described has the potential to allow fine‐grained phenological annotation of herbarium specimens at large ecological scales. Deeper investigation regarding the taxonomic scalability of this approach is needed.
-
This study describes the evaluation of a range of approaches to semantic segmentation of hyperspectral images of sorghum plants, classifying each pixel as either nonplant or belonging to one of three organ types (leaf, stalk, panicle). While many current methods for segmentation focus on separating plant pixels from background, organ-specific segmentation makes it feasible to measure a wider range of plant properties. Manually scored training data for a set of hyperspectral images collected from a sorghum association population were used to train and evaluate a set of supervised classification models. Many algorithms show acceptable accuracy for this classification task. Algorithms trained on sorghum data are able to accurately classify maize leaves and stalks, but fail to accurately classify maize reproductive organs, which are not directly equivalent to sorghum panicles. Trait measurements extracted from semantic segmentation of sorghum organs can be used to identify both genes known to control variation in previously measured phenotypes (e.g., panicle size and plant height) and signals for genes controlling traits not previously quantified in this population (e.g., stalk/leaf ratio). Organ-level semantic segmentation provides opportunities to identify genes controlling variation in a wide range of morphological phenotypes in sorghum, maize, and other related grain crops.
-
Abstract. Premise: Plant trait data are essential for quantifying biodiversity and function across Earth, but these data are challenging to acquire for large studies. Diverse strategies are needed, including the liberation of heritage data locked within specialist literature such as floras and taxonomic monographs. Here we report FloraTraiter, a novel approach using rule‐based natural language processing (NLP) to parse computable trait data from biodiversity literature. Methods: FloraTraiter was implemented through collaborative work between programmers and botanical experts and customized for both online floras and scanned literature. We report a strategy spanning optical character recognition, recognition of taxa, iterative building of traits, and establishing linkages among all of these, as well as curational tools and code for turning these results into standard morphological matrices. Results: Over 95% of treatment content was successfully parsed for traits with <1% error. Data for more than 700 taxa are reported, including a demonstration of common downstream uses. Conclusions: We identify strategies, applications, tips, and challenges that we hope will facilitate future similar efforts to produce large open‐source trait data sets for broad community reuse. Largely automated tools like FloraTraiter will be an important addition to the toolkit for assembling trait data at scale.

