skip to main content


Title: Self‐supervised learning improves classification of agriculturally important insect pests in plants
Abstract

Insect pests cause significant damage to food production, so early detection and efficient mitigation strategies are crucial. There is a continual shift toward machine learning (ML)‐based approaches for automating agricultural pest detection. Although supervised learning has achieved remarkable progress in this regard, it is impeded by the need for significant expert involvement in labeling the data used for model training. This makes real‐world applications tedious and oftentimes infeasible. Recently, self‐supervised learning (SSL) approaches have provided a viable alternative to training ML models with minimal annotations. Here, we present an SSL approach to classify 22 insect pests. The framework was assessed on raw and segmented field‐captured images using three different SSL methods, Nearest Neighbor Contrastive Learning of Visual Representations (NNCLR), Bootstrap Your Own Latent, and Barlow Twins. SSL pre‐training was done on ResNet‐18 and ResNet‐50 models using all three SSL methods on the original RGB images and foreground segmented images. The performance of SSL pre‐training methods was evaluated using linear probing of SSL representations and end‐to‐end fine‐tuning approaches. The SSL‐pre‐trained convolutional neural network models were able to perform annotation‐efficient classification. NNCLR was the best performing SSL method for both linear and full model fine‐tuning. With just 5% annotated images, transfer learning with ImageNet initialization obtained 74% accuracy, whereas NNCLR achieved an improved classification accuracy of 79% for end‐to‐end fine‐tuning. Models created using SSL pre‐training consistently performed better, especially under very low annotation, and were robust to object class imbalances. These approaches help overcome annotation bottlenecks and are resource efficient.

 
more » « less
Award ID(s):
1954556
NSF-PAR ID:
10441916
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
The Plant Phenome Journal
Volume:
6
Issue:
1
ISSN:
2578-2703
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Introduction

    Computer vision and deep learning (DL) techniques have succeeded in a wide range of diverse fields. Recently, these techniques have been successfully deployed in plant science applications to address food security, productivity, and environmental sustainability problems for a growing global population. However, training these DL models often necessitates the large-scale manual annotation of data which frequently becomes a tedious and time-and-resource- intensive process. Recent advances in self-supervised learning (SSL) methods have proven instrumental in overcoming these obstacles, using purely unlabeled datasets to pre-train DL models.

    Methods

    Here, we implement the popular self-supervised contrastive learning methods of NNCLR Nearest neighbor Contrastive Learning of visual Representations) and SimCLR (Simple framework for Contrastive Learning of visual Representations) for the classification of spatial orientation and segmentation of embryos of maize kernels. Maize kernels are imaged using a commercial high-throughput imaging system. This image data is often used in multiple downstream applications across both production and breeding applications, for instance, sorting for oil content based on segmenting and quantifying the scutellum’s size and for classifying haploid and diploid kernels.

    Results and discussion

    We show that in both classification and segmentation problems, SSL techniques outperform their purely supervised transfer learning-based counterparts and are significantly more annotation efficient. Additionally, we show that a single SSL pre-trained model can be efficiently finetuned for both classification and segmentation, indicating good transferability across multiple downstream applications. Segmentation models with SSL-pretrained backbones produce DICE similarity coefficients of 0.81, higher than the 0.78 and 0.73 of those with ImageNet-pretrained and randomly initialized backbones, respectively. We observe that finetuning classification and segmentation models on as little as 1% annotation produces competitive results. These results show SSL provides a meaningful step forward in data efficiency with agricultural deep learning and computer vision.

     
    more » « less
  2. null (Ed.)
    Recent years have witnessed the enormous success of text representation learning in a wide range of text mining tasks. Earlier word embedding learning approaches represent words as fixed low-dimensional vectors to capture their semantics. The word embeddings so learned are used as the input features of task-specific models. Recently, pre-trained language models (PLMs), which learn universal language representations via pre-training Transformer-based neural models on large-scale text corpora, have revolutionized the natural language processing (NLP) field. Such pre-trained representations encode generic linguistic features that can be transferred to almost any text-related applications. PLMs outperform previous task-specific models in many applications as they only need to be fine-tuned on the target corpus instead of being trained from scratch. In this tutorial, we introduce recent advances in pre-trained text embeddings and language models, as well as their applications to a wide range of text mining tasks. Specifically, we first overview a set of recently developed self-supervised and weakly-supervised text embedding methods and pre-trained language models that serve as the fundamentals for downstream tasks. We then present several new methods based on pre-trained text embeddings and language models for various text mining applications such as topic discovery and text classification. We focus on methods that are weakly-supervised, domain-independent, language-agnostic, effective and scalable for mining and discovering structured knowledge from large-scale text corpora. Finally, we demonstrate with real world datasets how pre-trained text representations help mitigate the human annotation burden and facilitate automatic, accurate and efficient text analyses. 
    more » « less
  3. Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge of such end-to-end solutions is the scarcity of human-annotated phonemes on natural L2 speech. In this work, we leverage unlabeled L2 speech via a pseudo-labeling (PL) procedure and extend the fine-tuning approach based on pre-trained self-supervised learning (SSL) models. Specifically, we use Wav2vec 2.0 as our SSL model, and fine-tune it using original labeled L2 speech samples plus the created pseudo-labeled L2 speech samples. Our pseudo labels are dynamic and are produced by an ensemble of the online model on-the-fly, which ensures that our model is robust to pseudo label noise. We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and 2.48% MDD F1 score improvement over a labeled-samples-only finetuning baseline. The proposed PL method is also shown to outperform conventional offline PL methods. Compared to the state-of-the-art MDD systems, our MDD solution produces a more accurate and consistent phonetic error diagnosis. In addition, we conduct an open test on a separate UTD-4Accents dataset, where our system recognition outputs show a strong correlation with human perception, based on accentedness and intelligibility. 
    more » « less
  4. Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions.In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in Transformer, allowing incorporating long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches. 
    more » « less
  5. Deep learning algorithms have been moderately successful in diagnoses of diseases by analyzing medical images especially through neuroimaging that is rich in annotated data. Transfer learning methods have demonstrated strong performance in tackling annotated data. It utilizes and transfers knowledge learned from a source domain to target domain even when the dataset is small. There are multiple approaches to transfer learning that result in a range of performance estimates in diagnosis, detection, and classification of clinical problems. Therefore, in this paper, we reviewed transfer learning approaches, their design attributes, and their applications to neuroimaging problems. We reviewed two main literature databases and included the most relevant studies using predefined inclusion criteria. Among 50 reviewed studies, more than half of them are on transfer learning for Alzheimer's disease. Brain mapping and brain tumor detection were second and third most discussed research problems, respectively. The most common source dataset for transfer learning was ImageNet, which is not a neuroimaging dataset. This suggests that the majority of studies preferred pre-trained models instead of training their own model on a neuroimaging dataset. Although, about one third of studies designed their own architecture, most studies used existing Convolutional Neural Network architectures. Magnetic Resonance Imaging was the most common imaging modality. In almost all studies, transfer learning contributed to better performance in diagnosis, classification, segmentation of different neuroimaging diseases and problems, than methods without transfer learning. Among different transfer learning approaches, fine-tuning all convolutional and fully-connected layers approach and freezing convolutional layers and fine-tuning fully-connected layers approach demonstrated superior performance in terms of accuracy. These recent transfer learning approaches not only show great performance but also require less computational resources and time. 
    more » « less