Abstract: Underwater imaging enables nondestructive plankton sampling at frequencies, durations, and resolutions unattainable by traditional methods. These systems necessitate automated processes to identify organisms efficiently. Early underwater image processing used a standard approach: binarizing images to segment targets, then integrating deep learning models for classification. While intuitive, this pipeline has limitations in handling high concentrations of biotic and abiotic particles, rapid changes in dominant taxa, and highly variable target sizes. To address these challenges, we introduce a new framework that starts with a scene classifier to capture large within-image variation, such as disparities in the layout of particles and dominant taxa. After scene classification, scene-specific Mask regional convolutional neural network (Mask R-CNN) models are trained to separate target objects into different groups. The procedure allows information to be extracted from different image types, while minimizing potential bias for commonly occurring features. Using in situ coastal plankton images, we compared the scene-specific models to a Mask R-CNN model encompassing all scene categories as a single full model. Results showed that the scene-specific approach outperformed the full model, achieving a 20% accuracy improvement on complex, noisy images. The full model yielded counts that were up to 78% lower than those enumerated by the scene-specific model for some small-sized plankton groups. We further tested the framework on images from a benthic video camera and an imaging sonar system, with good results. The integration of scene classification, which groups similar images together, can improve the accuracy of detection and classification for complex marine biological images.
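The routing logic of the framework described above (scene classifier first, then a scene-specific detector) can be sketched as follows. Everything here is a hypothetical stand-in, not the authors' code: the scene labels, the density heuristic, and the detector stubs would all be trained CNN and Mask R-CNN models in the real system.

```python
# Sketch of the two-stage framework: a scene classifier routes each image
# to a scene-specific detector. All names and the density heuristic are
# hypothetical stand-ins, not the authors' code.

def classify_scene(image):
    """Toy scene classifier: label an image by its mean 'particle density'."""
    density = sum(image) / len(image)  # stand-in for a trained CNN's output
    return "dense" if density > 0.5 else "sparse"

# One detector per scene category (stand-ins for scene-specific Mask R-CNN models).
DETECTORS = {
    "dense": lambda image: ["copepod", "marine_snow"],
    "sparse": lambda image: ["copepod"],
}

def detect(image):
    """Route the image through the scene classifier, then the matching detector."""
    scene = classify_scene(image)
    return scene, DETECTORS[scene](image)
```

Routing by scene keeps each detector's training distribution narrow, which is the bias-reduction argument the abstract makes.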
Beyond transfer learning: Leveraging ancillary images in automated classification of plankton
Abstract: We assess whether a supervised machine learning algorithm, specifically a convolutional neural network (CNN), achieves higher accuracy on planktonic image classification when including non-plankton and ancillary plankton images during the training procedure. We focus on the case of optimizing the CNN for a single planktonic image source, while considering ancillary images to be plankton images from other instruments. We conducted two sets of experiments with three different types of plankton images (from a Zooglider, Underwater Vision Profiler 5, and Zooscan), and our results held across all three image types. First, we considered whether single-stage transfer learning using non-plankton images was beneficial. For this assessment, we used ImageNet images and the 2015 ImageNet contest-winning model, ResNet-152. We found increased accuracy using a ResNet-152 model pretrained on ImageNet, provided the entire network was retrained rather than retraining only the fully connected layers. Next, we combined all three plankton image types into a single dataset of 3.3 million images (despite their differences in contrast, resolution, and pixel pitch) and conducted a multistage transfer learning assessment. We executed a transfer learning stage from ImageNet to the merged ancillary plankton dataset, then a second transfer learning stage from that merged plankton model to a single-instrument dataset. We found that multistage transfer learning resulted in additional accuracy gains. These results should have generality for other image classification tasks.
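The key comparison above, retraining only the fully connected head versus retraining the entire pretrained network, amounts to choosing which layers receive gradient updates. A minimal, framework-free sketch (the layer names are illustrative stand-ins for ResNet-152 stages, not the paper's code):

```python
# Illustration of the two retraining regimes compared above: updating only
# the fully connected head versus updating the entire pretrained network.
# An assumption-laden sketch, not the paper's implementation.

def set_trainable(layers, retrain_all):
    """Return a name -> trainable flag map; the last layer is the FC head."""
    return {name: (retrain_all or name == layers[-1]) for name in layers}

# Illustrative stand-ins for ResNet-152 stages.
resnet_layers = ["conv1", "block1", "block2", "block3", "block4", "fc"]

fc_only = set_trainable(resnet_layers, retrain_all=False)  # freeze the backbone
full = set_trainable(resnet_layers, retrain_all=True)      # retrain everything
```

In a deep learning framework the same choice is typically expressed by toggling the gradient flag on each backbone parameter before fine-tuning.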
- PAR ID: 10543924
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Journal Name: Limnology and Oceanography: Methods
- Volume: 22
- Issue: 12
- ISSN: 1541-5856
- Size: p. 943-952
- Sponsoring Org: National Science Foundation
More Like this
Abstract: Pollen identification is necessary for several subfields of geology, ecology, and evolutionary biology. However, the existing methods for pollen identification are laborious, time-consuming, and require highly skilled scientists. Therefore, there is a pressing need for an automated and accurate system for pollen identification, which can benefit both basic research and applied issues such as identifying airborne allergens. In this study, we propose a deep learning (DL) approach to classify pollen grains in the Great Basin Desert, Nevada, USA. Our dataset consisted of 10,000 images of 40 pollen species. To mitigate the limitations imposed by the small volume of our training dataset, we conducted an in-depth comparative analysis of numerous pre-trained Convolutional Neural Network (CNN) architectures using transfer learning methodologies. Simultaneously, we developed and incorporated an innovative CNN model to augment our exploration and optimization of data modeling strategies. We applied different architectures of well-known pre-trained deep CNN models, including AlexNet, VGG-16, MobileNet-V2, ResNet (18, 34, 50, and 101), ResNeSt (50, 101), SE-ResNeXt, and Vision Transformer (ViT), to uncover the most promising modeling approach for the classification of pollen grains in the Great Basin. To evaluate the performance of the pre-trained deep CNN models, we measured accuracy, precision, F1-score, and recall. Our results showed that the ResNeSt-110 model achieved the best performance, with an accuracy of 97.24%, precision of 97.89%, F1-score of 96.86%, and recall of 97.13%. Our results also revealed that transfer learning models can deliver better and faster image classification results than traditional CNN models built from scratch. The proposed method can potentially benefit various fields that rely on efficient pollen identification. This study demonstrates that DL approaches can improve the accuracy and efficiency of pollen identification, and it provides a foundation for further research in the field.
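The metrics reported above follow the standard definitions. A minimal sketch with made-up counts (not the study's data) showing how precision, recall, and F1 are derived from true/false positive and false negative counts:

```python
# Standard classification metrics from confusion counts.
# The counts below are invented for illustration only.

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=95, fp=3, fn=5)
```

For a 40-class problem such as this one, per-class values computed this way are typically macro-averaged into the single figures quoted in the abstract.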
During the 1950s, the Gros Michel banana cultivar was nearly wiped out by the incurable Fusarium wilt, also known as Panama disease. Originating in Southeast Asia, Fusarium wilt is a banana pandemic that has been threatening the multi-billion-dollar banana industry worldwide. The disease is caused by a fungus that spreads rapidly through the soil and into the roots of banana plants. Currently, the only way to stop the spread of this disease is for farmers to manually inspect and remove infected plants as quickly as possible, which is a time-consuming process. The main purpose of this study is to build a deep Convolutional Neural Network (CNN) using a transfer learning approach to rapidly identify Fusarium wilt infections on banana crop leaves. We chose the ResNet50 architecture as the base CNN model for our transfer learning approach owing to its remarkable performance in image classification, demonstrated by its victory in the ImageNet competition. After initial training and fine-tuning on a dataset of 600 healthy and diseased images, the CNN model achieved a near-perfect accuracy of 0.99 with a loss of 0.46. ResNet50's distinctive residual block structure may explain these results. To evaluate the model, 500 test images, consisting of 250 diseased and 250 healthy banana leaf images, were classified. The deep CNN model achieved an accuracy of 0.98 and an F1 score of 0.98, correctly identifying 492 of the 500 images. These results show that this DCNN model outperforms existing models, such as the deep CNN model of Sangeetha et al. (2023), by at least 0.07 in accuracy and is a viable option for identifying Fusarium wilt in banana crops.
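As a back-of-envelope check, the test-set accuracy quoted above follows directly from 492 correct classifications out of 500 images:

```python
# Accuracy from the test-set counts reported in the abstract:
# 492 of 500 images classified correctly.
correct, total = 492, 500
accuracy = correct / total  # 0.984, which rounds to the reported 0.98
```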
Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and must be downsampled to the predetermined input size of neural networks. Although downsampling reduces computation and the required communication bandwidth, it removes both redundant and salient information indiscriminately, which degrades accuracy. Inspired by digital signal processing theory, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components that can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages the identical structures of well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting frequency-domain information as the input. Experimental results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach while further reducing the input data size. Specifically, for ImageNet classification with the same input size, the proposed method achieves 1.60% and 0.63% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half the input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1.42%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.
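The core idea, transforming inputs to the frequency domain and keeping only a subset of frequency channels, can be illustrated with a small sketch. The naive, unnormalized DCT and the fixed low-frequency mask below are stand-ins for the paper's learned channel selection, not its actual model:

```python
import numpy as np

# Illustration of static frequency-channel selection (not the paper's model):
# transform a block with a naive, unnormalized 2D DCT-II, then zero out all
# but a fixed low-frequency corner of the coefficients.

def dct2(block):
    """Naive, unnormalized 2D DCT-II of a square block."""
    n = block.shape[0]
    k = np.arange(n)
    # basis[u, x] = cos(pi * (2x + 1) * u / (2n))
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    return basis @ block @ basis.T

block = np.arange(16, dtype=float).reshape(4, 4)
coeffs = dct2(block)

# Keep only the 2x2 low-frequency corner (the "selected channels");
# the remaining coefficients are treated as trivial and discarded.
mask = np.zeros_like(coeffs)
mask[:2, :2] = 1
selected = coeffs * mask
```

In the paper's setting the kept-versus-dropped decision per channel is learned rather than fixed, but the input to the network is frequency coefficients in both cases.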
As the basis of oceanic food webs and a key component of the biological carbon pump, planktonic organisms play major roles in the oceans. Their study benefited from the development of in situ imaging instruments, which provide higher spatio-temporal resolution than previous tools. But these instruments collect huge quantities of images, the vast majority of which are of marine snow particles or imaging artifacts. Among them, the In Situ Ichthyoplankton Imaging System (ISIIS) samples the largest water volumes (> 100 L s⁻¹) and thus produces particularly large datasets. To extract manageable amounts of ecological information from in situ images, we propose to focus on planktonic organisms early in the data processing pipeline: at the segmentation stage. We compared three segmentation methods, particularly for smaller targets, in which plankton represents less than 1% of the objects: (i) a traditional thresholding over the background, (ii) an object detector based on maximally stable extremal regions (MSER), and (iii) a content-aware object detector, based on a Convolutional Neural Network (CNN). These methods were assessed on a subset of ISIIS data collected in the Mediterranean Sea, from which a ground truth dataset of > 3,000 manually delineated organisms is extracted. The naive thresholding method captured 97.3% of those but produced ~340,000 segments, 99.1% of which were therefore not plankton (i.e. recall = 97.3%, precision = 0.9%). Combining thresholding with a CNN missed a few more planktonic organisms (recall = 91.8%) but the number of segments decreased 18-fold (precision increased to 16.3%). The MSER detector produced four times fewer segments than thresholding (precision = 3.5%), missed more organisms (recall = 85.4%), but was considerably faster.
Because naive thresholding produces ~525,000 objects from 1 minute of ISIIS deployment, the more advanced segmentation methods significantly improve ISIIS data handling and ease the subsequent taxonomic classification of segmented objects. The cost in terms of recall is limited, particularly for the CNN object detector. These approaches are now standard in computer vision and could be applicable to other plankton imaging devices, the majority of which pose a data management problem.
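The precision figure for naive thresholding can be sanity-checked from the numbers quoted above: roughly 3,000 ground-truth organisms, about 340,000 segments produced, and 97.3% recall.

```python
# Sanity check of the naive-thresholding precision quoted in the abstract,
# using its approximate figures.
organisms = 3_000
segments = 340_000
recall = 0.973

true_positives = recall * organisms      # ~2,919 organisms actually captured
precision = true_positives / segments    # ~0.0086, i.e. roughly the 0.9% reported
```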
