- Award ID(s):
- 1934568
- Publication Date:
- NSF-PAR ID:
- 10349982
- Journal Name:
- 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
- Page Range or eLocation-ID:
- 591 to 600
- Sponsoring Org:
- National Science Foundation
More Like this
-
Automated segmentation of grey matter (GM) and white matter (WM) in gigapixel histopathology images is advantageous for analyzing distributions of disease pathologies, further aiding in neuropathologic deep phenotyping. Although supervised deep learning methods have shown good performance, their requirement of a large amount of labeled data may not be cost-effective for large-scale projects. In the case of GM/WM segmentation, trained experts need to carefully trace the delineation in gigapixel images. To minimize manual labeling, we consider semi-supervised learning (SSL) and deploy one state-of-the-art SSL method (FixMatch) on whole-slide images (WSIs). We then propose a two-stage scheme to further improve the performance of SSL: the first stage is a self-supervised module that trains an encoder to learn visual representations of unlabeled data; this trained encoder then initializes the consistency loss-based SSL in the second stage. We test our method on Amyloid-β stained histopathology images, where it outperforms FixMatch in mean IoU by around 2% when using 6,000 labeled tiles and by over 10% when using only 600 labeled tiles from 2 WSIs. Clinical relevance: this work minimizes the required labeling efforts by trained personnel. An improved GM/WM segmentation method could further aid in the study of brain …
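The mean IoU score used for comparison in the abstract above can be illustrated with a minimal sketch. It assumes flattened per-pixel class labels and a hypothetical encoding (0 for GM, 1 for WM); the function name `mean_iou` is illustrative and is not taken from the paper.

```python
from typing import Iterable, Sequence


def mean_iou(pred: Sequence[int], ref: Sequence[int], classes: Iterable[int]) -> float:
    """Average the per-class intersection-over-union of two label sequences."""
    scores = []
    for c in classes:
        # Pixels labeled c in both prediction and reference.
        inter = sum(1 for p, r in zip(pred, ref) if p == c and r == c)
        # Pixels labeled c in at least one of them.
        union = sum(1 for p, r in zip(pred, ref) if p == c or r == c)
        if union > 0:  # skip classes absent from both inputs
            scores.append(inter / union)
    return sum(scores) / len(scores) if scores else float("nan")


# Hypothetical labels: 0 = GM, 1 = WM (assumed encoding).
predicted = [0, 0, 1, 1]
reference = [0, 1, 1, 1]
score = mean_iou(predicted, reference, classes=(0, 1))
```

Here the GM class has IoU 1/2 and the WM class has IoU 2/3, so `score` is their average, 7/12.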
-
3D object detection is an important yet demanding task that heavily relies on difficult-to-obtain 3D annotations. To reduce the required amount of supervision, we propose 3DIoUMatch, a novel semi-supervised method for 3D object detection applicable to both indoor and outdoor scenes. We leverage a teacher-student mutual learning framework to propagate information from the labeled to the unlabeled training set in the form of pseudo-labels. However, due to the high task complexity, we observe that the pseudo-labels suffer from significant noise and are thus not directly usable. To that end, we introduce a confidence-based filtering mechanism, inspired by FixMatch. We set confidence thresholds based upon the predicted objectness and class probability to filter low-quality pseudo-labels. While effective, we observe that these two measures do not sufficiently capture localization quality. We therefore propose to use the estimated 3D IoU as a localization metric and set category-aware self-adjusted thresholds to filter poorly localized proposals. We adopt VoteNet as our backbone detector on indoor datasets, while we use PV-RCNN on the autonomous driving dataset, KITTI. Our method consistently improves on state-of-the-art methods on both the ScanNet and SUN-RGBD benchmarks by significant margins under all label ratios (including the fully labeled setting). For example, when training …
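The filtering idea in the abstract above (thresholds on objectness, class probability, and estimated 3D IoU) can be sketched as follows. This is a minimal illustration, not the 3DIoUMatch implementation: the `Proposal` fields, the function name, and all threshold values are hypothetical, and the per-category IoU thresholds are fixed here rather than self-adjusted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Proposal:
    category: str      # predicted class name
    objectness: float  # predicted objectness score in [0, 1]
    class_prob: float  # predicted probability of `category`
    est_iou: float     # estimated 3D IoU of the predicted box


def filter_pseudo_labels(
    proposals: List[Proposal],
    obj_thresh: float,
    cls_thresh: float,
    iou_thresh: Dict[str, float],
) -> List[Proposal]:
    """Keep proposals that pass the confidence and localization thresholds."""
    kept = []
    for p in proposals:
        confident = p.objectness >= obj_thresh and p.class_prob >= cls_thresh
        # Category-aware localization check (fixed thresholds in this sketch).
        well_localized = p.est_iou >= iou_thresh[p.category]
        if confident and well_localized:
            kept.append(p)
    return kept


# Hypothetical teacher predictions and thresholds.
proposals = [
    Proposal("chair", 0.95, 0.92, 0.40),  # passes all three checks
    Proposal("chair", 0.95, 0.92, 0.10),  # rejected: poorly localized
    Proposal("table", 0.50, 0.99, 0.80),  # rejected: low objectness
]
kept = filter_pseudo_labels(proposals, 0.9, 0.9, {"chair": 0.25, "table": 0.25})
```

Only the first proposal survives, which shows why the IoU check matters: the second proposal is confident in class and objectness yet would be a poorly localized pseudo-label.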
-
Information about the spatial distribution of species lies at the heart of many important questions in ecology. Logistical limitations and collection biases, however, limit the availability of such data at ecologically relevant scales. Remotely sensed information can alleviate some of these concerns, but presents challenges associated with accurate species identification and limited availability of field data for validation, especially in high diversity ecosystems such as tropical forests.
Recent advances in machine learning offer a promising and cost‐efficient approach for gathering a large amount of species distribution data from aerial photographs. Here, we propose a novel machine learning framework, artificial perceptual learning (APL), to tackle the problem of weakly supervised pixel‐level mapping of tree species in forests. Challenges arise from limited availability of ground labels for tree species, lack of precise segmentation of tree canopies and misalignment between visible canopies in the aerial images and stem locations associated with ground labels. The proposed APL framework addresses these challenges by constructing a workflow using state‐of‐the‐art machine learning algorithms.
We develop and illustrate the proposed framework by implementing a fine‐grain mapping of three species, the palm Prestoea acuminata and the tree species Cecropia schreberiana and Manilkara bidentata, over a 5,000‐ha area of El Yunque National Forest in Puerto Rico. Receiver operating characteristic (ROC) curves and Intersection over Union (IoU) metrics demonstrate that APL substantially outperforms the basic workflow and attains human‐level cognitive economy, with 50‐fold time savings. For the palm and C. schreberiana, the APL framework has high pixelwise accuracy and IoU with reference to human segmentations. For M. bidentata, APL predictions are congruent with ground‐based labels. Our approach shows great potential for leveraging existing data from global forest plot networks coupled with aerial imagery to map tree species at ecologically meaningful spatial scales.
-
Collecting large-scale medical datasets with fully annotated samples for training deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vector embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, which allows long-range dependencies between slices inside 3D volumes to be incorporated. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired …
-
Semi-supervised learning (SSL) is an appealing approach to address the generalization problem for speech emotion recognition (SER) systems. By utilizing large amounts of unlabeled data, SSL is able to gain extra information about the prior distribution of the data. Typically, it can lead to better and more robust recognition performance. Existing SSL approaches for SER include variations of encoder-decoder model structures such as autoencoders (AEs) and variational autoencoders (VAEs), where it is difficult to interpret the learning mechanism behind the latent space. In this study, we introduce a new SSL framework, which we refer to as the DeepEmoCluster framework, for attribute-based SER tasks. The DeepEmoCluster framework is an end-to-end model with mel-spectrogram inputs, which combines a self-supervised pseudo-labeling classification network with a supervised emotional attribute regressor. The approach encourages the model to learn latent representations by maximizing the emotional separation of K-means clusters. Our experimental results based on the MSP-Podcast corpus indicate that the DeepEmoCluster framework achieves competitive prediction performance in a fully supervised scheme, outperforming baseline methods in most of the conditions. The approach can be further improved by incorporating an extra unlabeled set. Moreover, our experimental results explicitly show that the latent clusters have emotional dependencies, enriching the geometric interpretation of …