Title: Learning Visual Instance Retrieval from Failure: Efficient Online Local Metric Adaptation from Negative Samples
Existing visual instance retrieval (VIR) approaches attempt to learn a faithful global matching metric or discriminative feature embedding offline to cover enormous visual appearance variations, so that it can be used directly online on various unseen probes for retrieval. However, their requirement for a huge set of positive training pairs is very demanding in practice, and their performance on unseen testing samples is largely constrained by the severe data-shift issue. In contrast, this paper advocates a different paradigm: part of the learning can be performed online at nominal cost, so as to achieve online metric adaptation for different query probes. By exploiting easily available negative samples, we propose a novel solution that achieves optimal local metric adaptation effectively and efficiently. The insight of our method is that local hard negative samples can provide tight constraints to fine-tune the metric locally. Our local metric adaptation method is generally applicable on top of any offline-learned baseline. In addition, this paper gives in-depth theoretical analyses of the proposed method to guarantee the reduction of the classification error both asymptotically and practically. Extensive experiments on various VIR tasks confirm the effectiveness and superiority of our method.
Award ID(s):
1815561
NSF-PAR ID:
10105063
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Transactions on Pattern Analysis and Machine Intelligence
ISSN:
0162-8828
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
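To make the idea in the abstract above concrete, the following is a minimal sketch, not the authors' actual algorithm, of how an offline-learned Mahalanobis metric could be adapted locally for a single query probe using its hardest negative samples; the function names, the hinge margin, and the gradient update are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact formulation):
# locally fine-tune an offline-learned Mahalanobis metric M0 for one query
# probe by pushing away its hardest negative samples.
import numpy as np

def mahalanobis_dist(M, x, y):
    d = x - y
    return float(d @ M @ d)

def adapt_metric_locally(M0, probe, negatives, n_hard=5, margin=1.0,
                         lr=0.01, steps=20):
    """Fine-tune M0 around `probe` using its hardest (closest) negatives."""
    M = M0.copy()
    for _ in range(steps):
        # hard negatives = negatives currently closest to the probe under M
        dists = np.array([mahalanobis_dist(M, probe, n) for n in negatives])
        hard = [negatives[i] for i in np.argsort(dists)[:n_hard]]
        # hinge-style update: grow d^T M d for hard negatives inside the margin
        grad = np.zeros_like(M)
        for neg in hard:
            d = (probe - neg).reshape(-1, 1)
            if mahalanobis_dist(M, probe, neg) < margin:
                grad -= d @ d.T
        M -= lr * grad
        # project back onto the PSD cone so M stays a valid metric
        w, V = np.linalg.eigh((M + M.T) / 2.0)
        M = V @ np.diag(np.clip(w, 1e-6, None)) @ V.T
    return M
```

The PSD projection after each step is what keeps the adapted matrix a legitimate distance metric; only the neighborhood of the probe is affected, which is the "local" part of the adaptation.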
More Like this
  1. Person Re-IDentification (P-RID), as an instance-level recognition problem, remains challenging in the computer vision community. Many P-RID works aim to learn faithful and discriminative features/metrics from offline training data and directly apply them to the unseen online testing data. However, their performance is largely limited by the severe data-shift issue between training and testing data. Therefore, we propose an online joint multi-metric adaptation model that adapts the offline-learned P-RID models to the online data by learning a series of metrics for all the sharing-subsets. Each sharing-subset is obtained from the proposed novel frequent sharing-subset mining module and contains a group of testing samples that share strong visual similarity relationships with each other. Unlike existing online P-RID methods, our model simultaneously takes both the sample-specific discriminant and the set-based visual similarity among testing samples into consideration, so that the adapted multiple metrics can jointly refine the discriminant of all the given testing samples via a multi-kernel late fusion framework. Our proposed model is generally suitable for boosting any offline-learned P-RID baseline online; the resulting performance improvement is not only verified by extensive experiments on several widely-used P-RID benchmarks (CUHK03, Market1501, DukeMTMC-reID and MSMT17) and state-of-the-art P-RID baselines, but also guaranteed by the provided in-depth theoretical analyses.
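As a rough illustration only, the snippet below sketches one plausible way to form such sharing-subsets from online testing features by grouping samples whose top-k neighbour lists overlap strongly; the thresholds and the grouping rule are assumptions, not the paper's frequent sharing-subset mining module.

```python
# Rough sketch (assumed details) of grouping online testing samples into
# "sharing-subsets": samples that appear frequently in each other's top-k
# neighbour lists are grouped together; each group later gets its own metric.
import numpy as np

def mine_sharing_subsets(features, k=5, min_shared=3):
    """features: np.ndarray of shape (N, D). Returns a list of index groups."""
    n = len(features)
    dists = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    topk = np.argsort(dists, axis=1)[:, 1:k + 1]       # drop self at index 0
    assigned, subsets = set(), []
    for i in range(n):
        if i in assigned:
            continue
        group = {i}
        for j in range(n):
            if j != i and j not in assigned:
                shared = len(set(topk[i]) & set(topk[j]))
                if shared >= min_shared:                # strong neighbour overlap
                    group.add(j)
        subsets.append(sorted(group))
        assigned |= group
    return subsets
```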
  2. Existing solutions to instance-level visual identification usually aim to learn faithful and discriminative feature extractors from offline training data and directly apply them to the unseen online testing data. However, their performance is largely limited by the severe distribution-shift issue between training and testing samples. Therefore, we propose a novel online group-metric adaptation model that adapts the offline-learned identification models to the online data by learning a series of metrics for all sharing-subsets. Each sharing-subset is obtained from the proposed novel frequent sharing-subset mining module and contains a group of testing samples that share strong visual similarity relationships with each other. Furthermore, to handle potentially large-scale testing samples, we introduce self-paced learning (SPL) to gradually include samples in the adaptation from easy to difficult, which mimics the learning principle of humans. Unlike existing online visual identification methods, our model simultaneously takes both the sample-specific discriminant and the set-based visual similarity among testing samples into consideration. Our method is generally applicable to any off-the-shelf offline-learned visual identification baseline for online performance improvement, as verified by extensive experiments on several widely-used visual identification benchmarks.
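The self-paced learning step can be pictured with the toy sketch below: samples whose current loss is below a threshold count as "easy" and are admitted first, and the threshold is relaxed over rounds. The hard 0/1 weighting and the growth schedule are assumed for illustration, not taken from the paper.

```python
# Toy illustration (assumed form) of self-paced learning (SPL): admit samples
# from easy to difficult by growing the loss threshold lambda each round.
import numpy as np

def self_paced_weights(losses, lam):
    """Hard SPL weights: 1 for 'easy' samples (loss < lambda), 0 otherwise."""
    return (np.asarray(losses) < lam).astype(float)

def spl_rounds(losses, lam=0.5, growth=1.5, rounds=4):
    """Return the indices selected in each round as lambda is relaxed."""
    selected_per_round = []
    for _ in range(rounds):
        weights = self_paced_weights(losses, lam)
        selected_per_round.append(np.flatnonzero(weights))
        lam *= growth                      # admit harder samples next round
    return selected_per_round
```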
  3. Existing rate adaptation protocols have advocated training to establish the relationship between channel conditions and the optimum modulation and coding scheme. However, in-field operation inevitably encounters scenarios that the rate adaptation mechanism has not yet seen. Frequently, protocols are optimally tuned for indoor environments but, when taken outdoors, perform poorly. Namely, the decision structure formed by offline training lacks the ability to adapt to a new situation on the fly. The changing wireless environment calls for a rate adaptation scheme that can quickly infer the channel type and adjust accordingly. Typical SNR-based rate adaptation schemes do not capture the nuances of performance variation across different channel types. In this paper, we propose a novel scheme that allows SNR-based rate selection algorithms to be trained online in the environment in which they are operating. Inspired by the idea that, to do well, an athlete must train for the type of athletic event and environment in which they are competing, we propose FIT, an on-the-fly, in-situ training mechanism for SNR-based protocols. To do so, we first propose the FIT framework, which addresses the challenges of making rate decisions with unpredictable fluctuation and lack of repeatability of real wireless channels. To distinguish between channel types during training, we then characterize wireless channels according to their link-layer performance and introduce a novel, computationally-efficient channel performance manifold matching technique to infer the channel type given a sequence of throughput measurements for various link-level parameters. To evaluate our methods, we implement rate selection that uses FIT for training alongside channel performance manifold matching. We then perform extensive experiments on emulated and in-field wireless channels to evaluate the online learning process, showing that the rate decision structure can be updated as channel conditions change using existing traffic flows. The experiments are performed over multiple frequency bands. The proposed FIT framework can achieve large throughput gains compared to traditional SNR-based protocols (8X) and offline-training-based methods (1.3X), particularly in dynamic wireless propagation environments that lack appropriate training.
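A highly simplified sketch of the channel-type inference step is shown below: measured throughputs for different link-level settings are compared against stored per-channel-type performance profiles, and the closest profile wins. The dictionary-based interface and the squared-error criterion are assumptions for illustration, not FIT's actual manifold matching.

```python
# Simplified sketch (assumed interface, not the FIT implementation) of inferring
# the channel type from link-layer throughput measurements by nearest-profile
# matching against stored per-channel-type performance profiles.
import numpy as np

def infer_channel_type(measured, profiles):
    """
    measured : dict mapping a link-level setting (e.g. an MCS index) -> throughput
    profiles : dict mapping channel-type name -> {setting: expected throughput}
    Returns the channel-type name whose profile best matches the measurements.
    """
    best_type, best_err = None, float("inf")
    for ctype, profile in profiles.items():
        common = [s for s in measured if s in profile]
        if not common:
            continue
        err = np.mean([(measured[s] - profile[s]) ** 2 for s in common])
        if err < best_err:
            best_type, best_err = ctype, err
    return best_type
```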
  4. In this work, we present GazeGraph, a system that leverages human gazes as the sensing modality for cognitive context sensing. GazeGraph is a generalized framework that is compatible with different eye trackers and supports various gaze-based sensing applications. It ensures high sensing performance in the presence of heterogeneity of human visual behavior, and enables quick system adaptation to unseen sensing scenarios with few-shot instances. To achieve these capabilities, we introduce the spatial-temporal gaze graphs and the deep learning-based representation learning method to extract powerful and generalized features from the eye movements for context sensing. Furthermore, we develop a few-shot gaze graph learning module that adapts the 'learning to learn' concept from meta-learning to enable quick system adaptation in a data-efficient manner. Our evaluation demonstrates that GazeGraph outperforms the existing solutions in recognition accuracy by 45% on average over three datasets. Moreover, in few-shot learning scenarios, GazeGraph outperforms the transfer learning-based approach by 19% to 30%, while reducing the system adaptation time by 80%.
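To illustrate the spatial-temporal gaze graph idea, the toy construction below links consecutive fixations in time and spatially close fixations on screen; the threshold and edge rules are assumptions rather than GazeGraph's actual graph definition.

```python
# Toy sketch (assumed construction) of a spatial-temporal gaze graph: each gaze
# fixation becomes a node; consecutive fixations are linked in time, and
# fixations that land close together on screen are linked in space.
import numpy as np

def build_gaze_graph(fixations, spatial_thresh=0.1):
    """fixations: np.ndarray of shape (N, 2) with normalised (x, y) gaze points."""
    n = len(fixations)
    edges = set()
    for i in range(n - 1):
        edges.add((i, i + 1))                           # temporal edge
    for i in range(n):
        for j in range(i + 2, n):
            if np.linalg.norm(fixations[i] - fixations[j]) < spatial_thresh:
                edges.add((i, j))                       # spatial edge
    return edges
```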
  5. Path guiding is a promising technique to reduce the variance of path tracing. Although existing online path guiding algorithms can eventually learn good sampling distributions given a large amount of time and samples, the speed of learning becomes a major bottleneck. In this paper, we accelerate the learning of sampling distributions by training a light-weight neural network offline to reconstruct sampling distributions from sparse samples. Uniquely, we design our neural network to directly operate convolutions on a sparse quadtree, which regresses a high-quality hierarchical sampling distribution. Our approach can reconstruct reasonably accurate sampling distributions faster, allowing for efficient path guiding and rendering. In contrast to recent offline neural path guiding techniques that reconstruct low-resolution 2D images for sampling, our novel hierarchical framework enables more fine-grained directional sampling with less memory usage, effectively advancing the practicality and efficiency of neural path guiding. In addition, we take advantage of hybrid bidirectional samples, including both path samples and photons, as we have found this to be more robust across different light transport scenarios than using only one type of sample as in previous work. Experiments on diverse testing scenes demonstrate that our approach often improves rendering results with better visual quality and lower errors. Our framework also provides a proper balance of speed, memory cost, and robustness.
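The hierarchical sampling distribution can be pictured with the small sketch below, which draws a 2D sample by descending a quadtree and picking a child quadrant in proportion to its weight at each level; the data structure and weights here are illustrative assumptions, unrelated to the paper's neural reconstruction.

```python
# Small sketch (assumed data structure, not the paper's network): draw a 2D
# sample from a hierarchical quadtree distribution by descending the tree and
# choosing a child quadrant proportionally to its weight at each level.
import random

class QuadNode:
    def __init__(self, weights=None, children=None):
        self.weights = weights or [0.25, 0.25, 0.25, 0.25]  # child probabilities
        self.children = children or [None] * 4               # None = leaf quadrant

def sample_quadtree(node, x0=0.0, y0=0.0, size=1.0):
    """Return a point in [0,1)^2 drawn from the hierarchical distribution."""
    total = sum(node.weights)
    r = random.uniform(0.0, total)
    for i, w in enumerate(node.weights):
        if r < w:
            break
        r -= w
    # quadrant i: 0 = bottom-left, 1 = bottom-right, 2 = top-left, 3 = top-right
    half = size / 2.0
    cx = x0 + (i % 2) * half
    cy = y0 + (i // 2) * half
    child = node.children[i]
    if child is None:                     # leaf: sample uniformly inside quadrant
        return cx + random.uniform(0, half), cy + random.uniform(0, half)
    return sample_quadtree(child, cx, cy, half)
```

Finer subdivision where the quadrant weights are large is what gives the hierarchy its fine-grained directional resolution at modest memory cost.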