Title: Learning Semisupervised Multilabel Fully Convolutional Network for Hierarchical Object Parsing
This article presents a semisupervised multilabel fully convolutional network (FCN) for hierarchical object parsing of images. We consider each object part (e.g., eye and head) as a class label and learn to assign every image pixel to multiple coherent part labels. Unlike previous methods that treat part labels as independent classes, our method explicitly models the internal relationships between object parts, e.g., that a pixel highly scored for eyes should be highly scored for heads as well. Such relationships directly reflect the structure of the semantic space and thus should be respected while learning the deep representation. We achieve this objective by introducing a multilabel softmax loss function over both labeled and unlabeled images and regularizing it with two pairwise ranking constraints. The first constraint is based on a manifold assumption that image pixels that are visually and spatially close to each other should be collaboratively classified as the same part label. The other constraint enforces that no pixel receives significant scores from two or more labels that semantically conflict with each other. The proposed loss function is differentiable with respect to network parameters and hence can be optimized by standard stochastic gradient methods. We evaluate the proposed method on two public image data sets for hierarchical object parsing and compare it with alternative parsing methods. Extensive comparisons show that our method can achieve state-of-the-art performance while using 50% fewer labeled training samples than the alternatives.
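The abstract does not give the loss in closed form, so the following is only a rough sketch of how the two label relationships it describes might be penalized for a single pixel. Everything here is an assumption made for illustration: the function name `pixel_loss`, the use of per-label sigmoid scores with binary cross-entropy as a stand-in for the multilabel softmax term, the hinge penalty for the part hierarchy, and the product penalty for conflicting labels. The manifold constraint over neighboring pixels is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pixel_loss(logits, targets, parent_of, conflicts, weight=1.0):
    """Illustrative per-pixel loss; an assumption, not the paper's formulation.

    logits:    dict label -> raw score for one pixel
    targets:   dict label -> 0 or 1 (a pixel may carry several part labels)
    parent_of: dict child label -> parent label, e.g. {"eye": "head"}
    conflicts: list of (label_a, label_b) pairs that should not both score high
    """
    probs = {k: sigmoid(v) for k, v in logits.items()}

    # Stand-in classification term: independent binary cross-entropy per label.
    bce = 0.0
    for label, p in probs.items():
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        t = targets[label]
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)

    # Hierarchy penalty: a child part should not outscore its parent part.
    hier = sum(max(0.0, probs[c] - probs[p]) for c, p in parent_of.items())

    # Conflict penalty: large only when both conflicting labels score high.
    conf = sum(probs[a] * probs[b] for a, b in conflicts)

    return bce + weight * (hier + conf)


# Hypothetical three-label example: "eye" is a part of "head"; "head" and
# "leg" are treated as conflicting labels.
parent_of = {"eye": "head"}
conflicts = [("head", "leg")]
targets = {"head": 1, "eye": 1, "leg": 0}
consistent = {"head": 4.0, "eye": 3.0, "leg": -4.0}
inconsistent = {"head": -1.0, "eye": 3.0, "leg": 2.0}

loss_consistent = pixel_loss(consistent, targets, parent_of, conflicts)
loss_inconsistent = pixel_loss(inconsistent, targets, parent_of, conflicts)
```

With these hypothetical scores, the second pixel rates "eye" above "head" and gives "leg" a high score, so both penalties add to its loss, whereas the first pixel incurs almost no penalty.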
Award ID(s):
1657600
PAR ID:
10181126
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
IEEE Transactions on Neural Networks and Learning Systems
ISSN:
2162-237X
Page Range / eLocation ID:
1 to 10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine learning models for hierarchical multilabel classification (HMC) typically achieve low accuracy. This is because such models must not only predict multiple labels for each data instance, but also ensure that predicted labels conform to a given hierarchical structure. Existing state-of-the-art strategies for HMC decouple the learning process from ensuring that predicted labels reside in a path of the hierarchy, thus inevitably degrading the overall classification accuracy. To address this limitation, we propose a novel loss function that enables a model to encode both a global perspective of the class hierarchy and local class relationships in adjacent hierarchical levels, ensuring that predictions align with the class hierarchy during both training and testing. We demonstrate the superiority of the proposed approach against multiple state-of-the-art methods for HMC on 20 real-world datasets.
  2. This work investigates how different forms of input elicitation obtained from crowdsourcing can be utilized to improve the quality of inferred labels for image classification tasks, where an image must be labeled as either positive or negative depending on the presence/absence of a specified object. Five types of input elicitation methods are tested: binary classification (positive or negative); the (x, y)-coordinate of the position where participants believe a target object is located; level of confidence in the binary response (on a scale from 0 to 100%); what participants believe the majority of the other participants' binary classification is; and participants' perceived difficulty level of the task (on a discrete scale). We design two crowdsourcing studies to test the performance of a variety of input elicitation methods and utilize data from over 300 participants. Various existing voting and machine learning (ML) methods are applied to make the best use of these inputs. In an effort to assess their performance on classification tasks of varying difficulty, a systematic synthetic image generation process is developed. Each generated image combines items from the MPEG-7 Core Experiment CE-Shape-1 Test Set into a single image using multiple parameters (e.g., density, transparency, etc.) and may or may not contain a target object. The difficulty of these images is validated by the performance of an automated image classification method. Experimental results suggest that more accurate results can be achieved with smaller training datasets when both the crowdsourced binary classification labels and the average of the self-reported confidence values in these labels are used as features for the ML classifiers. Moreover, when a relatively larger properly annotated dataset is available, in some cases augmenting these ML algorithms with the results (i.e., probability of outcome) from an automated classifier can achieve even higher performance than what can be obtained by using any one of the individual classifiers. Lastly, supplementary analysis of the collected data demonstrates that other performance metrics of interest, namely reduced false-negative rates, can be prioritized through special modifications of the proposed aggregation methods.
  3. One significant challenge in the field of supervised deep learning is the lack of large-scale labeled datasets for many problems. In this paper, we propose Consensus Spectral Clustering (CSC), which leverages the strengths of convolutional autoencoders and spectral clustering to provide pseudo labels for image data. This data can be used as weakly labeled data for training and evaluating classifiers that require supervision. The primary weakness of previous works lies in their inability to isolate the object of interest in an image and cluster similar images together. We address these issues by denoising input images to remove pixels that do not contain data pertinent to the target. Additionally, we introduce a voting method for label selection to improve the clustering results. Our extensive experimentation on several benchmark datasets demonstrates that the proposed CSC method achieves competitive performance with state-of-the-art methods.
  4. Existing approaches for multi-label classification are trained offline, missing the opportunity to adapt to new data instances as they become available. To address this gap, an online multi-label classification method was proposed recently to learn from data instances sequentially. In this work, we focus on multi-label classification tasks in which the labels are organized in a hierarchy. We formulate online hierarchical multi-labeled classification as an online optimization task that jointly learns individual label predictors and a label threshold, and propose a novel hierarchy constraint to penalize predictions that are inconsistent with the label hierarchy structure. Experimental results on three benchmark datasets show that the proposed approach outperforms online multi-label classification methods and achieves performance comparable to, or even better than, offline hierarchical classification frameworks with respect to hierarchical evaluation metrics.
  5. Extracting roads in aerial images has numerous applications in artificial intelligence and multimedia computing, including traffic pattern analysis and parking space planning. Learning deep neural networks, though very successful, demands vast amounts of high-quality annotations, whose acquisition is time-consuming and expensive. To address this challenge, we propose a semi-supervised approach for image-based road extraction where only a small set of labeled images is available for training. We design a pixel-wise contrastive loss to self-supervise the network training and utilize the large corpus of unlabeled images. The key idea is to identify pairs of overlapping image regions (positive) or non-overlapping image regions (negative) and encourage the network to make similar outputs for positive pairs or dissimilar outputs for negative pairs. We also develop a negative sampling strategy to filter false negative samples during the process. An iterative procedure is introduced to apply the network over raw images to generate pseudo-labels, filter and select high-quality labels with the proposed contrastive loss, and re-train the network with the enlarged training dataset. We repeat these iterative steps until convergence. We validate the effectiveness of the proposed methods by performing extensive experiments on the public SpaceNet3 and DeepGlobe Road datasets. Results show that our proposed method achieves state-of-the-art results on public image segmentation benchmarks and significantly outperforms other semi-supervised methods.
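The pairwise idea described in item 5 (similar outputs for positive pairs, dissimilar outputs for negative pairs) can be illustrated with the classical margin-based contrastive loss. This is a generic textbook form swapped in for illustration, not the loss defined in that work; the function name `contrastive_pair_loss` and the `margin` parameter are assumptions.

```python
def contrastive_pair_loss(u, v, is_positive, margin=1.0):
    """Margin-based contrastive loss for one pair of output vectors.

    Illustrative stand-in only; the cited work's exact loss is not given here.
    u, v:        equal-length sequences of floats (network outputs for two regions)
    is_positive: True if the regions overlap (outputs should match)
    margin:      negatives closer than this distance are penalized
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    if is_positive:
        # Pull positive pairs together.
        return squared_distance
    # Push negative pairs apart until they are at least `margin` away.
    distance = squared_distance ** 0.5
    return max(0.0, margin - distance) ** 2
```

A positive pair with identical outputs and a negative pair separated by at least the margin both contribute zero loss. Only mismatched positives and nearby negatives are penalized.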