skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Learning Visual Instance Retrieval from Failure: Efficient Online Local Metric Adaptation from Negative Samples
Existing visual instance retrieval (VIR) approaches attempt to learn a faithful global matching metric or discriminative feature embedding offline to cover enormous visual appearance variations, so as to directly use it online on various unseen probes for retrieval. However, their requirement for a huge set of positive training pairs is very demanding in practice and the performance is largely constrained for the unseen testing samples due to the severe data shifting issue. In contrast, this paper advocates a different paradigm: part of the learning can be performed online but with nominal costs, so as to achieve online metric adaptation for different query probes. By exploiting easily-available negative samples, we propose a novel solution to achieve the optimal local metric adaptation effectively and efficiently. The insight of our method is the local hard negative samples can actually provide tight constraints to fine tune the metric locally. Our local metric adaptation method is generally applicable to be used on top of any offline-learned baselines. In addition, this paper gives in-depth theoretical analyses of the proposed method to guarantee the reduction of the classification error both asymptotically and practically. Extensive experiments on various VIR tasks have confirmed our effectiveness and superiority.  more » « less
Award ID(s):
1815561
PAR ID:
10105063
Author(s) / Creator(s):
;
Date Published:
Journal Name:
IEEE Transactions on Pattern Analysis and Machine Intelligence
ISSN:
0162-8828
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Existing solutions to instance-level visual identification usually aim to learn faithful and discriminative feature extractors from offline training data and directly use them for the unseen online testing data. However, their performance is largely limited due to the severe distribution shifting issue between training and testing samples. Therefore, we propose a novel online group-metric adaptation model to adapt the offline learned identification models for the online data by learning a series of metrics for all sharing-subsets. Each sharing-subset is obtained from the proposed novel frequent sharing-subset mining module and contains a group of testing samples that share strong visual similarity relationships to each other. Furthermore, to handle potentially large-scale testing samples, we introduce self-paced learning (SPL) to gradually include samples into adaptation from easy to difficult which elaborately simulates the learning principle of humans. Unlike existing online visual identification methods, our model simultaneously takes both the sample-specific discriminant and the set-based visual similarity among testing samples into consideration. Our method is generally suitable to any off-the-shelf offline learned visual identification baselines for online performance improvement which can be verified by extensive experiments on several widely-used visual identification benchmarks. 
    more » « less
  2. Person Re-IDentification (P-RID), as an instance-level recognition problem, still remains challenging in computer vision community. Many P-RID works aim to learn faithful and discriminative features/metrics from offline training data and directly use them for the unseen online testing data. However, their performance is largely limited due to the severe data shifting issue between training and testing data. Therefore, we propose an online joint multi-metric adaptation model to adapt the offline learned P-RID models for the online data by learning a series of metrics for all the sharing-subsets. Each sharing-subset is obtained from the proposed novel frequent sharing-subset mining module and contains a group of testing samples which share strong visual similarity relationships to each other. Unlike existing online P-RID methods, our model simultaneously takes both the sample-specific discriminant and the set-based visual similarity among testing samples into consideration so that the adapted multiple metrics can refine the discriminant of all the given testing samples jointly via a multi-kernel late fusion framework. Our proposed model is generally suitable to any offline learned P-RID baselines for online boosting, the performance improvement by our model is not only verified by extensive experiments on several widely-used P-RID benchmarks (CUHK03, Market1501, DukeMTMC-reID and MSMT17) and state-of-the-art P-RID baselines but also guaranteed by the provided in-depth theoretical analyses. 
    more » « less
  3. null (Ed.)
    In this work, we present GazeGraph, a system that leverages human gazes as the sensing modality for cognitive context sensing. GazeGraph is a generalized framework that is compatible with different eye trackers and supports various gaze-based sensing applications. It ensures high sensing performance in the presence of heterogeneity of human visual behavior, and enables quick system adaptation to unseen sensing scenarios with few-shot instances. To achieve these capabilities, we introduce the spatial-temporal gaze graphs and the deep learning-based representation learning method to extract powerful and generalized features from the eye movements for context sensing. Furthermore, we develop a few-shot gaze graph learning module that adapts the `learning to learn' concept from meta-learning to enable quick system adaptation in a data-efficient manner. Our evaluation demonstrates that GazeGraph outperforms the existing solutions in recognition accuracy by 45% on average over three datasets. Moreover, in few-shot learning scenarios, GazeGraph outperforms the transfer learning-based approach by 19% to 30%, while reducing the system adaptation time by 80%. 
    more » « less
  4. We introduce AutoVER, an Autoregressive model for Visual Entity Recognition. Our model extends an autoregressive Multimodal Large Language Model by employing retrieval augmented constrained generation. It mitigates low performance on out-of-domain entities while excelling in queries that require visual reasoning. Our method learns to distinguish similar entities within a vast label space by contrastively training on hard negative pairs in parallel with a sequence-to-sequence objective without an external retriever. During inference, a list of retrieved candidate answers explicitly guides language generation by removing invalid decoding paths. The proposed method achieves significant improvements across different dataset splits in the recently proposed Oven-Wikibenchmark with accuracy on the Entity seen split rising from 32.7% to 61.5%. It demonstrates superior performance on the unseen and query splits by a substantial double-digit margin, while also preserving the ability to effectively transfer to other generic visual question answering benchmarks without further training. 
    more » « less
  5. Probabilistic learning to rank (LTR) has been the dominating approach for optimizing the ranking metric, but cannot maximize long-term rewards. Reinforcement learning models have been proposed to maximize user long-term rewards by formulating the recommendation as a sequential decision-making problem, but could only achieve inferior accuracy compared to LTR counterparts, primarily due to the lack of online interactions and the characteristics of ranking. In this paper, we propose a new off-policy value ranking (VR) algorithm that can simultaneously maximize user long-term rewards and optimize the ranking metric offline for improved sample efficiency in a unified Expectation-Maximization (EM) framework. We theoretically and empirically show that the EM process guides the leaned policy to enjoy the benefit of integration of the future reward and ranking metric, and learn without any online interactions. Extensive offline and online experiments demonstrate the effectiveness of our methods 
    more » « less