skip to main content

Title: Self‐supervised learning improves classification of agriculturally important insect pests in plants

Insect pests cause significant damage to food production, so early detection and efficient mitigation strategies are crucial. There is a continual shift toward machine learning (ML)‐based approaches for automating agricultural pest detection. Although supervised learning has achieved remarkable progress in this regard, it is impeded by the need for significant expert involvement in labeling the data used for model training. This makes real‐world applications tedious and oftentimes infeasible. Recently, self‐supervised learning (SSL) approaches have provided a viable alternative to training ML models with minimal annotations. Here, we present an SSL approach to classify 22 insect pests. The framework was assessed on raw and segmented field‐captured images using three different SSL methods, Nearest Neighbor Contrastive Learning of Visual Representations (NNCLR), Bootstrap Your Own Latent, and Barlow Twins. SSL pre‐training was done on ResNet‐18 and ResNet‐50 models using all three SSL methods on the original RGB images and foreground segmented images. The performance of SSL pre‐training methods was evaluated using linear probing of SSL representations and end‐to‐end fine‐tuning approaches. The SSL‐pre‐trained convolutional neural network models were able to perform annotation‐efficient classification. NNCLR was the best performing SSL method for both linear and full model fine‐tuning. With just 5% annotated images, transfer learning with ImageNet initialization obtained 74% accuracy, whereas NNCLR achieved an improved classification accuracy of 79% for end‐to‐end fine‐tuning. Models created using SSL pre‐training consistently performed better, especially under very low annotation, and were robust to object class imbalances. These approaches help overcome annotation bottlenecks and are resource efficient.

more » « less
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
The Plant Phenome Journal
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Recent years have witnessed the enormous success of text representation learning in a wide range of text mining tasks. Earlier word embedding learning approaches represent words as fixed low-dimensional vectors to capture their semantics. The word embeddings so learned are used as the input features of task-specific models. Recently, pre-trained language models (PLMs), which learn universal language representations via pre-training Transformer-based neural models on large-scale text corpora, have revolutionized the natural language processing (NLP) field. Such pre-trained representations encode generic linguistic features that can be transferred to almost any text-related applications. PLMs outperform previous task-specific models in many applications as they only need to be fine-tuned on the target corpus instead of being trained from scratch. In this tutorial, we introduce recent advances in pre-trained text embeddings and language models, as well as their applications to a wide range of text mining tasks. Specifically, we first overview a set of recently developed self-supervised and weakly-supervised text embedding methods and pre-trained language models that serve as the fundamentals for downstream tasks. We then present several new methods based on pre-trained text embeddings and language models for various text mining applications such as topic discovery and text classification. We focus on methods that are weakly-supervised, domain-independent, language-agnostic, effective and scalable for mining and discovering structured knowledge from large-scale text corpora. Finally, we demonstrate with real world datasets how pre-trained text representations help mitigate the human annotation burden and facilitate automatic, accurate and efficient text analyses. 
    more » « less
  2. Deep learning algorithms have been moderately successful in diagnoses of diseases by analyzing medical images especially through neuroimaging that is rich in annotated data. Transfer learning methods have demonstrated strong performance in tackling annotated data. It utilizes and transfers knowledge learned from a source domain to target domain even when the dataset is small. There are multiple approaches to transfer learning that result in a range of performance estimates in diagnosis, detection, and classification of clinical problems. Therefore, in this paper, we reviewed transfer learning approaches, their design attributes, and their applications to neuroimaging problems. We reviewed two main literature databases and included the most relevant studies using predefined inclusion criteria. Among 50 reviewed studies, more than half of them are on transfer learning for Alzheimer's disease. Brain mapping and brain tumor detection were second and third most discussed research problems, respectively. The most common source dataset for transfer learning was ImageNet, which is not a neuroimaging dataset. This suggests that the majority of studies preferred pre-trained models instead of training their own model on a neuroimaging dataset. Although, about one third of studies designed their own architecture, most studies used existing Convolutional Neural Network architectures. Magnetic Resonance Imaging was the most common imaging modality. In almost all studies, transfer learning contributed to better performance in diagnosis, classification, segmentation of different neuroimaging diseases and problems, than methods without transfer learning. Among different transfer learning approaches, fine-tuning all convolutional and fully-connected layers approach and freezing convolutional layers and fine-tuning fully-connected layers approach demonstrated superior performance in terms of accuracy. These recent transfer learning approaches not only show great performance but also require less computational resources and time. 
    more » « less
  3. Abstract

    Pollen identification is necessary for several subfields of geology, ecology, and evolutionary biology. However, the existing methods for pollen identification are laborious, time-consuming, and require highly skilled scientists. Therefore, there is a pressing need for an automated and accurate system for pollen identification, which can be beneficial for both basic research and applied issues such as identifying airborne allergens. In this study, we propose a deep learning (DL) approach to classify pollen grains in the Great Basin Desert, Nevada, USA. Our dataset consisted of 10,000 images of 40 pollen species. To mitigate the limitations imposed by the small volume of our training dataset, we conducted an in-depth comparative analysis of numerous pre-trained Convolutional Neural Network (CNN) architectures utilizing transfer learning methodologies. Simultaneously, we developed and incorporated an innovative CNN model, serving to augment our exploration and optimization of data modeling strategies. We applied different architectures of well-known pre-trained deep CNN models, including AlexNet, VGG-16, MobileNet-V2, ResNet (18, 34, and 50, 101), ResNeSt (50, 101), SE-ResNeXt, and Vision Transformer (ViT), to uncover the most promising modeling approach for the classification of pollen grains in the Great Basin. To evaluate the performance of the pre-trained deep CNN models, we measured accuracy, precision, F1-Score, and recall. Our results showed that the ResNeSt-110 model achieved the best performance, with an accuracy of 97.24%, precision of 97.89%, F1-Score of 96.86%, and recall of 97.13%. Our results also revealed that transfer learning models can deliver better and faster image classification results compared to traditional CNN models built from scratch. The proposed method can potentially benefit various fields that rely on efficient pollen identification. This study demonstrates that DL approaches can improve the accuracy and efficiency of pollen identification, and it provides a foundation for further research in the field.

    more » « less
  4. Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge of such end-to-end solutions is the scarcity of human-annotated phonemes on natural L2 speech. In this work, we leverage unlabeled L2 speech via a pseudo-labeling (PL) procedure and extend the fine-tuning approach based on pre-trained self-supervised learning (SSL) models. Specifically, we use Wav2vec 2.0 as our SSL model, and fine-tune it using original labeled L2 speech samples plus the created pseudo-labeled L2 speech samples. Our pseudo labels are dynamic and are produced by an ensemble of the online model on-the-fly, which ensures that our model is robust to pseudo label noise. We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and 2.48% MDD F1 score improvement over a labeled-samples-only finetuning baseline. The proposed PL method is also shown to outperform conventional offline PL methods. Compared to the state-of-the-art MDD systems, our MDD solution produces a more accurate and consistent phonetic error diagnosis. In addition, we conduct an open test on a separate UTD-4Accents dataset, where our system recognition outputs show a strong correlation with human perception, based on accentedness and intelligibility. 
    more » « less
  5. Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions.In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in Transformer, allowing incorporating long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches. 
    more » « less