Title: Joint Semi-supervised and Active Learning for Segmentation of Gigapixel Pathology Images with Cost-Effective Labeling
The need for detailed manual annotations limits the applicability of supervised deep learning algorithms in medical image analysis, particularly in pathology. Semi-supervised learning (SSL) offers an effective way to leverage unlabeled data and reduce the heavy reliance on large numbers of labeled samples when training a model. Although SSL has shown good performance, the performance of recent state-of-the-art SSL methods on pathology images is still under study, and the problem of selecting the best data to label for SSL has not been fully explored. To tackle this challenge, we propose a semi-supervised active learning framework with a region-based selection criterion. The framework iteratively selects regions for annotation queries to quickly expand the diversity and volume of the labeled set. We evaluate our framework on a grey-matter/white-matter segmentation problem using gigapixel pathology images of autopsied human brain tissue. With only 0.1% of regions labeled, our proposed algorithm reaches an IoU score competitive with fully supervised learning and outperforms the current state-of-the-art SSL method by more than 10% in both IoU score and Dice coefficient.
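The abstract reports results as IoU score and Dice coefficient. As a reminder of how the two overlap metrics relate, here is a minimal sketch for binary masks. It is not taken from the paper: the function name `iou_and_dice`, the flattened 0/1 mask representation, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def iou_and_dice(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (IoU, Dice) for two flattened binary masks of equal length.

    Hypothetical helper for illustration; nonzero entries count as foreground.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection

    # Assumed convention: two empty masks agree perfectly.
    if union == 0:
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


if __name__ == "__main__":
    # Toy 4-pixel masks: one overlapping foreground pixel.
    print(iou_and_dice([1, 1, 0, 0], [1, 0, 1, 0]))
```

For a single pair of masks the two scores are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.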
Award ID(s):
1934568
NSF-PAR ID:
10349982
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Page Range / eLocation ID:
591 to 600
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Automated segmentation of grey matter (GM) and white matter (WM) in gigapixel histopathology images is advantageous for analyzing the distribution of disease pathologies, further aiding neuropathologic deep phenotyping. Although supervised deep learning methods have shown good performance, their requirement for a large amount of labeled data may not be cost-effective for large-scale projects. In the case of GM/WM segmentation, trained experts need to carefully trace the delineation in gigapixel images. To minimize manual labeling, we consider semi-supervised learning (SSL) and deploy one state-of-the-art SSL method (FixMatch) on whole slide images (WSIs). We then propose a two-stage scheme to further improve the performance of SSL: the first stage is a self-supervised module that trains an encoder to learn visual representations of unlabeled data; this well-trained encoder then serves as the initialization for consistency loss-based SSL in the second stage. We test our method on Amyloid-β stained histopathology images, and the results outperform FixMatch in mean IoU score by around 2% when using 6,000 labeled tiles and by over 10% when using only 600 labeled tiles from 2 WSIs. Clinical relevance: this work minimizes the labeling effort required from trained personnel. An improved GM/WM segmentation method could further aid the study of brain diseases, such as Alzheimer’s disease.
  2. 3D object detection is an important yet demanding task that relies heavily on difficult-to-obtain 3D annotations. To reduce the required amount of supervision, we propose 3DIoUMatch, a novel semi-supervised method for 3D object detection applicable to both indoor and outdoor scenes. We leverage a teacher-student mutual learning framework to propagate information from the labeled to the unlabeled training set in the form of pseudo-labels. However, due to the high task complexity, we observe that the pseudo-labels suffer from significant noise and are thus not directly usable. To that end, we introduce a confidence-based filtering mechanism, inspired by FixMatch. We set confidence thresholds based on the predicted objectness and class probability to filter low-quality pseudo-labels. While effective, we observe that these two measures do not sufficiently capture localization quality. We therefore propose to use the estimated 3D IoU as a localization metric and set category-aware self-adjusted thresholds to filter poorly localized proposals. We adopt VoteNet as our backbone detector on indoor datasets, while we use PV-RCNN on the autonomous driving dataset, KITTI. Our method consistently improves on state-of-the-art methods on both the ScanNet and SUN-RGBD benchmarks by significant margins under all label ratios (including the fully labeled setting). For example, when training using only 10% labeled data on ScanNet, 3DIoUMatch achieves a 7.7 absolute improvement on mAP@0.25 and an 8.5 absolute improvement on mAP@0.5 over the prior art. On KITTI, we are the first to demonstrate semi-supervised 3D object detection, and our method surpasses a fully supervised baseline by 1.8% to 7.6% under different label ratios and categories.
  3. Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in Transformer, allowing incorporating long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
  4. Information about the spatial distribution of species lies at the heart of many important questions in ecology. Logistical limitations and collection biases, however, limit the availability of such data at ecologically relevant scales. Remotely sensed information can alleviate some of these concerns, but presents challenges associated with accurate species identification and limited availability of field data for validation, especially in high diversity ecosystems such as tropical forests.

    Recent advances in machine learning offer a promising and cost‐efficient approach for gathering a large amount of species distribution data from aerial photographs. Here, we propose a novel machine learning framework, artificial perceptual learning (APL), to tackle the problem of weakly supervised pixel‐level mapping of tree species in forests. Challenges arise from limited availability of ground labels for tree species, lack of precise segmentation of tree canopies and misalignment between visible canopies in the aerial images and stem locations associated with ground labels. The proposed APL framework addresses these challenges by constructing a workflow using state‐of‐the‐art machine learning algorithms.

    We develop and illustrate the proposed framework by implementing a fine‐grain mapping of three species, the palm Prestoea acuminata and the tree species Cecropia schreberiana and Manilkara bidentata, over a 5,000‐ha area of El Yunque National Forest in Puerto Rico. These large‐scale maps are based on unlabelled high‐resolution aerial images of unsegmented tree canopies. Misaligned ground‐based labels, available for <1% of these images, serve as the only weak supervision. APL performance is evaluated using ground‐based labels and high‐quality human segmentation using Amazon Mechanical Turk, and compared to a basic workflow that relies solely on labelled images.

    Receiver operating characteristic (ROC) curves and Intersection over Union (IoU) metrics demonstrate that APL substantially outperforms the basic workflow and attains human‐level cognitive economy, with 50‐fold time savings. For the palm and C. schreberiana, the APL framework has high pixelwise accuracy and IoU with reference to human segmentations. For M. bidentata, APL predictions are congruent with ground‐based labels. Our approach shows great potential for leveraging existing data from global forest plot networks coupled with aerial imagery to map tree species at ecologically meaningful spatial scales.

     
  5. Archaeology has long faced fundamental issues of sampling and scalar representation. Traditionally, the local-to-regional-scale views of settlement patterns are produced through systematic pedestrian surveys. Recently, systematic manual survey of satellite and aerial imagery has enabled continuous distributional views of archaeological phenomena at interregional scales. However, such ‘brute force’ manual imagery survey methods are both time- and labour-intensive, as well as prone to inter-observer differences in sensitivity and specificity. The development of self-supervised learning methods (e.g. contrastive learning) offers a scalable learning scheme for locating archaeological features using unlabelled satellite and historical aerial images. However, archaeological features are generally only visible in a very small proportion relative to the landscape, while the modern contrastive-supervised learning approach typically yields an inferior performance on highly imbalanced datasets. In this work, we propose a framework to address this long-tail problem. As opposed to the existing contrastive learning approaches that typically treat the labelled and unlabelled data separately, our proposed method reforms the learning paradigm under a semi-supervised setting in order to fully utilize the precious annotated data (<7% in our setting). Specifically, the highly unbalanced nature of the data is employed as the prior knowledge in order to form pseudo negative pairs by ranking the similarities between unannotated image patches and annotated anchor images. In this study, we used 95,358 unlabelled images and 5,830 labelled images in order to solve the issues associated with detecting ancient buildings from a long-tailed satellite image dataset. From the results, our semi-supervised contrastive learning model achieved a promising testing balanced accuracy of 79.0%, which is a 3.8% improvement as compared to other state-of-the-art approaches. 
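Items 1 and 2 above both refer to FixMatch and to confidence-based filtering of pseudo-labels. The core idea can be sketched as follows: keep a model's prediction on an unlabeled sample as a pseudo-label only when its top class probability clears a threshold. This is an illustrative sketch rather than either paper's implementation; the function name `select_pseudo_labels`, the list-of-probabilities input, and the default threshold of 0.95 are assumptions.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]],
    threshold: float = 0.95,  # assumed value; tune per task
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) for confident predictions only.

    `probs[i]` holds the predicted class probabilities for unlabeled sample i.
    Hypothetical helper for illustration.
    """
    selected = []
    for i, p in enumerate(probs):
        # Index of the most probable class and its confidence.
        best_class = max(range(len(p)), key=p.__getitem__)
        if p[best_class] >= threshold:
            selected.append((i, best_class))
    return selected


if __name__ == "__main__":
    # Toy predictions for three unlabeled tiles over two classes.
    predictions = [[0.97, 0.03], [0.60, 0.40], [0.02, 0.98]]
    print(select_pseudo_labels(predictions))
```

Samples that fall below the threshold are left out of the unsupervised loss for that step, which limits the influence of noisy pseudo-labels.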