Title: Developing a convolutional neural network to classify phytoplankton images collected with an Imaging FlowCytobot along the West Antarctic Peninsula
High-resolution optical imaging systems are quickly becoming universal tools to characterize and quantify microbial diversity in marine ecosystems. Automated detection systems such as convolutional neural networks (CNNs) are often developed to identify the immense number of images collected. The goal of our study was to develop a CNN to classify phytoplankton images collected with an Imaging FlowCytobot for the Palmer Antarctica Long-Term Ecological Research project. A medium complexity CNN was developed using a subset of manually identified images, resulting in an overall accuracy, recall, and f1-score of 93.8%, 93.7%, and 93.7%, respectively. The f1-score dropped to 46.5% when tested on a new random subset of 10,269 images, likely due to highly imbalanced class distributions, high intraclass variance, and interclass morphological similarities of cells in naturally occurring phytoplankton assemblages. Our model was then used to predict taxonomic classifications of phytoplankton at Palmer Station, Antarctica over 2017-2018 and 2018-2019 summer field seasons. The CNN was generally able to capture important seasonal dynamics such as the shift from large centric diatoms to small pennate diatoms in both seasons, which is thought to be driven by increases in glacial meltwater from January to March. Moving forward, we hope to further increase the accuracy of our model to better characterize coastal phytoplankton communities threatened by rapidly changing environmental conditions.
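The drop in f1-score on naturally distributed images can be illustrated with a small sketch. The confusion matrices below are hypothetical numbers chosen for illustration, not data from the study, and the sketch assumes a macro-averaged F1 (the unweighted mean of per-class F1); the abstract does not state which averaging the authors used. It shows that a classifier with identical per-class recall can score much lower once one class greatly outnumbers another, because false positives from the common class swamp the precision of the rare class.

```python
from typing import List

Matrix = List[List[int]]  # confusion[i][j]: images of true class i predicted as class j


def per_class_f1(confusion: Matrix) -> List[float]:
    """Return the F1 score of each class from a square confusion matrix."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true class k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly another class
        denom = 2 * tp + fp + fn
        # F1 is the harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(confusion: Matrix) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


def accuracy(confusion: Matrix) -> float:
    """Fraction of all images assigned to their true class."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Hypothetical two-class example: the classifier has 90% recall on both classes.
balanced = [[90, 10], [10, 90]]    # 100 images per class
imbalanced = [[900, 100], [1, 9]]  # 1000 common cells, 10 rare cells

print(f"balanced:   accuracy={accuracy(balanced):.3f}  macro F1={macro_f1(balanced):.3f}")
print(f"imbalanced: accuracy={accuracy(imbalanced):.3f}  macro F1={macro_f1(imbalanced):.3f}")
```

In this toy case the accuracy is the same for both matrices, while the macro F1 falls sharply for the imbalanced one, which is consistent in direction with (though not a reproduction of) the behavior the abstract describes.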
Award ID(s):
2026045
PAR ID:
10320300
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
OCEANS 2021
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract High-resolution optical imaging systems are quickly becoming universal tools to characterize and quantify microbial diversity in marine ecosystems. Automated classification systems such as convolutional neural networks (CNNs) are often developed to identify species within the immense number of images (e.g., millions per month) collected. The goal of our study was to develop a CNN to classify phytoplankton images collected with an Imaging FlowCytobot for the Palmer Antarctica Long-Term Ecological Research project. A relatively small CNN (~2 million parameters) was developed and trained using a subset of manually identified images, resulting in an overall test accuracy, recall, and f1-score of 93.8, 93.7, and 93.7%, respectively, on a balanced dataset. However, the f1-score dropped to 46.5% when tested on a dataset of 10,269 new images drawn from the natural environment without balancing classes. This decrease is likely due to highly imbalanced class distributions dominated by smaller, less differentiable cells, high intraclass variance, and interclass morphological similarities of cells in naturally occurring phytoplankton assemblages. As a case study to illustrate the value of the model, it was used to predict taxonomic classifications (ranging from genus to class) of phytoplankton at Palmer Station, Antarctica, from late austral spring to early autumn in 2017‐2018 and 2018‐2019. The CNN was generally able to identify important seasonal dynamics such as the shift from large centric diatoms to small pennate diatoms in both years, which is thought to be driven by increases in glacial meltwater from January to March. This shift in particle size distribution has significant implications for the ecology and biogeochemistry of these waters. Moving forward, we hope to further increase the accuracy of our model to better characterize coastal phytoplankton communities threatened by rapidly changing environmental conditions. 
  2. Abstract In coastal West Antarctic Peninsula (WAP) waters, large phytoplankton blooms in late austral spring fuel a highly productive marine ecosystem. However, WAP atmospheric and oceanic temperatures are rising, winter sea ice extent and duration are decreasing, and summer phytoplankton biomass in the northern WAP has decreased and shifted toward smaller cells. To better understand these relationships, an Imaging FlowCytobot was used to characterize seasonal (spring to autumn) phytoplankton community composition and cell size during a low (2017–2018) and high (2018–2019) chlorophyll a year in relation to physical drivers (e.g., sea ice and meteoric water) at Palmer Station, Antarctica. A shorter sea ice season with early rapid retreat resulted in low phytoplankton biomass with a low proportion of diatoms (2017–2018), while a longer sea ice season with late protracted retreat resulted in the opposite (2018–2019). Despite these differences, phytoplankton seasonal succession was similar in both years: (1) a large‐celled centric diatom bloom during spring sea ice retreat; (2) a peak summer phase comprised of mixotrophic cryptophytes with increases in light and postbloom organic matter; and (3) a late summer phase comprised of small (< 20 μm) diatoms and mixed flagellates with increases in wind‐driven nutrient resuspension. In addition, cell diameter decreased from November to April with increases in meteoric water in both years. The tight coupling between sea ice, meltwater, and phytoplankton species composition suggests that continued warming in the WAP will affect phytoplankton seasonal dynamics, and subsequently seasonal food web dynamics.
  3. An increase in volcanic thermal emissions can indicate subsurface and surface processes that precede, or coincide with, volcanic eruptions. Space-borne infrared sensors can detect hotspots, defined here as localized volcanic thermal emissions, in near-real-time. However, automatic hotspot detection systems are needed to efficiently analyze the large quantities of data produced. While hotspots have been automatically detected for over 20 years with simple thresholding algorithms, new computer vision technologies, such as convolutional neural networks (CNNs), can enable improved detection capabilities. Here we introduce HotLINK: the Hotspot Learning and Identification Network, a CNN trained to detect hotspots with a dataset of ~3,800 satellite-based, Visible Infrared Imaging Radiometer Suite (VIIRS) images from Mount Veniaminof and Mount Cleveland volcanoes, Alaska. We find that our model achieves an accuracy of 96% (F1-score 0.92) when evaluated on ~1,700 unseen images from the same volcanoes, and 95% (F1-score 0.67) when evaluated on ~3,000 images from six additional Alaska volcanoes (Augustine Volcano, Bogoslof Island, Okmok Caldera, Pavlof Volcano, Redoubt Volcano, Shishaldin Volcano). In comparison with an existing threshold-based hotspot detection algorithm, MIROVA (Coppola et al., Geological Society, London, Special Publications, 2016, 426, 181–205), our model detects 22% more hotspots and produces 12% fewer false positives. Additional testing on ~700 labeled Moderate Resolution Imaging Spectroradiometer (MODIS) images from Mount Veniaminof demonstrates that our model is applicable to this sensor’s data as well, achieving an accuracy of 98% (F1-score 0.95). We apply HotLINK to 10 years of VIIRS data and 22 years of MODIS data for the eight aforementioned Alaska volcanoes and calculate the radiative power of detected hotspots. From these time series we find that HotLINK accurately characterizes background and eruptive periods, similar to MIROVA, but also detects more subtle warming signals, potentially related to volcanic unrest. We identify three advantages to our model over its predecessors: 1) the ability to detect more subtle volcanic hotspots and produce fewer false positives, especially in daytime images; 2) probabilistic predictions provide a measure of detection confidence; and 3) its transferability, i.e., the successful application to multiple sensors and multiple volcanoes without the need for threshold tuning, suggesting the potential for global application.
  4. In smart manufacturing, semiconductors play an indispensable role in collecting, processing, and analyzing data, ultimately enabling more agile and productive operations. Given the foundational importance of wafers, the purity of a wafer is essential to maintain the integrity of the overall semiconductor fabrication. This study proposes a novel automated visual inspection (AVI) framework for scrutinizing semiconductor wafers from scratch, capable of identifying defective wafers and pinpointing the location of defects through autonomous data annotation. Initially, this proposed methodology leveraged a texture analysis method known as gray-level co-occurrence matrix (GLCM) that categorized wafer images—captured via a stroboscopic imaging system—into distinct scenarios for high- and low-resolution wafer images. GLCM approaches further allowed for a complete separation of low-resolution wafer images into defective and normal wafer images, as well as the extraction of defect images from defective low-resolution wafer images, which were used for training a convolutional neural network (CNN) model. Consequently, the CNN model excelled in localizing defects on defective low-resolution wafer images, achieving an F1 score—the harmonic mean of precision and recall metrics—exceeding 90.1%. In high-resolution wafer images, a background subtraction technique represented defects as clusters of white points. The quantity of these white points determined the defectiveness and pinpointed locations of defects on high-resolution wafer images. Lastly, the CNN implementation further enhanced performance, robustness, and consistency irrespective of variations in the ratio of white point clusters. This technique demonstrated accuracy in localizing defects on high-resolution wafer images, yielding an F1 score greater than 99.3%. 
  5. Abstract Pollen identification is necessary for several subfields of geology, ecology, and evolutionary biology. However, the existing methods for pollen identification are laborious, time-consuming, and require highly skilled scientists. Therefore, there is a pressing need for an automated and accurate system for pollen identification, which can be beneficial for both basic research and applied issues such as identifying airborne allergens. In this study, we propose a deep learning (DL) approach to classify pollen grains in the Great Basin Desert, Nevada, USA. Our dataset consisted of 10,000 images of 40 pollen species. To mitigate the limitations imposed by the small volume of our training dataset, we conducted an in-depth comparative analysis of numerous pre-trained Convolutional Neural Network (CNN) architectures utilizing transfer learning methodologies. Simultaneously, we developed and incorporated an innovative CNN model, serving to augment our exploration and optimization of data modeling strategies. We applied different architectures of well-known pre-trained deep CNN models, including AlexNet, VGG-16, MobileNet-V2, ResNet (18, 34, 50, and 101), ResNeSt (50, 101), SE-ResNeXt, and Vision Transformer (ViT), to uncover the most promising modeling approach for the classification of pollen grains in the Great Basin. To evaluate the performance of the pre-trained deep CNN models, we measured accuracy, precision, F1-Score, and recall. Our results showed that the ResNeSt-110 model achieved the best performance, with an accuracy of 97.24%, precision of 97.89%, F1-Score of 96.86%, and recall of 97.13%. Our results also revealed that transfer learning models can deliver better and faster image classification results compared to traditional CNN models built from scratch. The proposed method can potentially benefit various fields that rely on efficient pollen identification. This study demonstrates that DL approaches can improve the accuracy and efficiency of pollen identification, and it provides a foundation for further research in the field.