Title: Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning
Zero-shot learning (ZSL) addresses the unseen class recognition problem by leveraging semantic information to transfer knowledge from seen classes to unseen classes. Generative models synthesize the unseen visual features and convert ZSL into a classical supervised learning problem. These generative models are trained using the seen classes and are expected to implicitly transfer the knowledge from seen to unseen classes. However, their performance is stymied by overfitting, which leads to substandard performance on Generalized Zero-Shot Learning (GZSL). To address this concern, we propose the novel LsrGAN, a generative model that Leverages the Semantic Relationship between seen and unseen categories and explicitly performs knowledge transfer by incorporating a novel Semantic Regularized Loss (SR-Loss). The SR-Loss guides the LsrGAN to generate visual features that mirror the semantic relationships between seen and unseen classes. Experiments on seven benchmark datasets, including the challenging Wikipedia text-based CUB and NABirds splits and the attribute-based AWA, CUB, and SUN, demonstrate the superiority of the LsrGAN compared to previous state-of-the-art approaches under both ZSL and GZSL. Code is available at https://github.com/Maunil/LsrGAN.
Award ID(s):
1828010
NSF-PAR ID:
10277352
Journal Name:
Lecture notes in computer science
Volume:
12375 LNCS
ISSN:
1611-3349
Page Range / eLocation ID:
70-86
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Zero-shot learning (ZSL) for image classification focuses on recognizing novel categories that have no labeled data available for training. The learning is generally carried out with the help of mid-level semantic descriptors associated with each class. This semantic-descriptor space is generally shared by both seen and unseen categories. However, ZSL suffers from hubness, domain discrepancy, and bias towards seen classes. To tackle these problems, we propose a three-step approach to zero-shot learning. Firstly, a mapping is learned from the semantic-descriptor space to the image-feature space. This mapping learns to minimize both one-to-one and pairwise distances between semantic embeddings and the image features of the corresponding classes. Secondly, we propose test-time domain adaptation to adapt the semantic embedding of the unseen classes to the test data. This is achieved by finding correspondences between the semantic descriptors and the image features. Thirdly, we propose scaled calibration on the classification scores of the seen classes. This is necessary because the ZSL model is biased towards seen classes, as the unseen classes are not used in training. Finally, to validate the proposed three-step approach, we performed experiments on four benchmark datasets, where the proposed method outperformed previous results. We also studied and analyzed the performance of each component of our proposed ZSL framework.
  2. We introduce the isoperimetric loss as a regularization criterion for learning the map from a visual representation to a semantic embedding, to be used to transfer knowledge to unknown classes in a zero-shot learning setting. We use a pretrained deep neural network model as a visual representation of image data, a Word2Vec embedding of class labels, and linear maps between the visual and semantic embedding spaces. However, the spaces themselves are not linear, and we postulate the sample embedding to be populated by noisy samples near otherwise smooth manifolds. We exploit the graph structure defined by the sample points to regularize the estimates of the manifolds by inferring the graph connectivity using a generalization of the isoperimetric inequalities from Riemannian geometry to graphs. Surprisingly, this regularization alone, paired with the simplest baseline model, outperforms the state of the art among fully automated methods in zero-shot learning benchmarks such as AwA and CUB. This improvement is achieved solely by learning the structure of the underlying spaces by imposing regularity
  3. Few-shot classification aims to recognize novel categories with only a few labeled images in each class. Existing metric-based few-shot classification algorithms predict categories by comparing the feature embeddings of query images with those from a few labeled images (support examples) using a learned metric function. While promising performance has been demonstrated, these methods often fail to generalize to unseen domains due to the large discrepancy in feature distributions across domains. In this work, we address the problem of few-shot classification under domain shifts for metric-based methods. Our core idea is to use feature-wise transformation layers that augment the image features with affine transforms, simulating various feature distributions under different domains in the training stage. To capture variations of the feature distributions under different domains, we further apply a learning-to-learn approach to search for the hyper-parameters of the feature-wise transformation layers. We conduct extensive experiments and ablation studies under the domain generalization setting using five few-shot classification datasets: mini-ImageNet, CUB, Cars, Places, and Plantae. Experimental results demonstrate that the proposed feature-wise transformation layer is applicable to various metric-based models and provides consistent improvements in few-shot classification performance under domain shift.
  4. Sketch-based image retrieval (SBIR) is widely recognized as an important vision problem with a wide range of real-world applications. Recently, research interest has arisen in solving this problem under the more realistic and challenging setting of zero-shot learning. In this paper, we investigate this problem from the viewpoint of domain adaptation, which we show is critical in improving feature embedding in the zero-shot scenario. Based on a framework which starts with a model pre-trained on ImageNet and fine-tunes it on the training set of an SBIR benchmark, we advocate the importance of preserving previously acquired knowledge, e.g., the rich discriminative features learned from ImageNet, to improve the model’s transfer ability. For this purpose, we design an approach named Semantic-Aware Knowledge prEservation (SAKE), which fine-tunes the pre-trained model in an economical way and leverages semantic information, e.g., inter-class relationships, to achieve the goal of knowledge preservation. Zero-shot experiments on two extended SBIR datasets, TU-Berlin and Sketchy, verify the superior performance of our approach. Extensive diagnostic experiments validate that the preserved knowledge benefits SBIR in zero-shot settings, as a large fraction of the performance gain comes from the more properly structured feature embedding for photo images.
  5. Deep learning models have achieved state-of-the-art performance in semantic image segmentation, but the results provided by fully automatic algorithms are not always satisfactory to users. Interactive segmentation offers a solution by accepting user annotations on selected areas of the images to refine the segmentation results. However, most existing models only focus on correcting the current image’s misclassified pixels, with no knowledge carried over to other images. In this work, we formulate interactive image segmentation as a continual learning problem and propose a framework to effectively learn from user annotations, aiming to improve the segmentation on both the current image and unseen images in future tasks while avoiding deteriorated performance on previously seen images. It employs a probabilistic mask to control the neural network’s kernel activation and extract the most suitable features for segmenting images in each task. We also design a task-aware architecture to automatically infer the optimal kernel activation for initial segmentation and subsequent refinement. Interactions with users are guided through multi-source uncertainty estimation so that users can focus on the most important areas to minimize the overall manual annotation effort. Extensive experiments are performed on both medical and natural image datasets to illustrate the proposed framework’s effectiveness on basic segmentation performance, forward knowledge transfer, and backward knowledge transfer.