skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Sample-to-sample correspondence for unsupervised domain adaptation
The assumption that training and testing samples are generated from the same distribution does not always hold for real-world machine-learning applications. The procedure of tackling this discrepancy between the training (source) and testing (target) domains is known as domain adaptation. We propose an unsupervised version of domain adaptation that considers the presence of only unlabelled data in the target domain. Our approach centres on finding correspondences between samples of each domain. The correspondences are obtained by treating the source and target samples as graphs and using a convex criterion to match them. The criteria used are first-order and second-order similarities between the graphs as well as a class-based regularization. We have also developed a computationally efficient routine for the convex optimization, thus allowing the proposed method to be used widely. To verify the effectiveness of the proposed method, computer simulations were conducted on synthetic, image classification and sentiment classification datasets. Results validated that the proposed local sample-to- sample matching method out-performs traditional moment-matching methods and is competitive with respect to current local domain-adaptation methods.  more » « less
Award ID(s):
1813935
PAR ID:
10094741
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Engineering applications of artificial intelligence
Volume:
73
ISSN:
0952-1976
Page Range / eLocation ID:
80-91
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Domain adaptation (DA) addresses the real-world image classification problem of discrepancy between training (source) and testing (target) data distributions. We propose an unsupervised DA method that considers the presence of only unlabelled data in the target do- main. Our approach centers on finding matches between samples of the source and target domains. The matches are obtained by treating the source and target domains as hyper-graphs and carrying out a class-regularized hyper-graph matching using first-, second- and third-order similarities between the graphs. We have also developed a computationally efficient algorithm by initially selecting a subset of the samples to construct a graph and then developing a customized optimization routine for graph-matching based on Conditional Gradient and Alternating Direction Multiplier Method. This allows the proposed method to be used widely. We also performed a set of experiments on standard object recognition datasets to validate the effectiveness of our framework over previous approaches. 
    more » « less
  2. Domain adaptation has become an attractive learning paradigm, as it can leverage source domains with rich labels to deal with classification tasks in an unlabeled target domain. A few recent studies develop domain adaptation approaches for graph-structured data. In the case of node classification task, current domain adaptation methods only focus on the closed-set setting, where source and target domains share the same label space. A more practical assumption is that the target domain may contain new classes that are not included in the source domain. Therefore, in this paper, we introduce a novel and challenging problem for graphs, i.e., open-set domain adaptive node classification, and propose a new approach to solve it. Specifically, we develop an algorithm for efficient knowledge transfer from a labeled source graph to an unlabeled target graph under a separate domain alignment (SDA) strategy, in order to learn discriminative feature representations for the target graph. Our goal is to not only correctly classify target nodes into the known classes, but also classify unseen types of nodes into an unknown class. Experimental results on real-world datasets show that our method outperforms existing methods on graph domain adaptation. 
    more » « less
  3. The goal of domain adaptation is to train a high-performance predictive model on the target domain data by using knowledge from the source domain data, which has different but related data distribution. In this paper, we consider unsupervised domain adaptation where we have labelled source domain data but unlabelled target domain data. Our solution to unsupervised domain adaptation is to learn a domain- invariant representation that is also category discriminative. Domain- invariant representations are realized by minimizing the domain discrepancy. To minimize the domain discrepancy, we propose a novel graph- matching metric between the source and target domain representations. Minimizing this metric allows the source and target representations to be in support of each other. We further exploit confident unlabelled target domain samples and their pseudo-labels to refine our proposed model. We expect the refining step to improve the performance further. This is validated by performing experiments on standard image classification adaptation datasets. Results showed our proposed approach out-perform previous domain-invariant representation learning approaches. 
    more » « less
  4. In this paper we propose a data-driven fault detection framework for semi-supervised scenarios where labeled training data from the system under consideration (the β€œtarget”) is imbalanced (e.g. only relatively few labels are available from one of the classes), but data from a related system (the β€œsource”) is readily available. An example of this situation is when a generic simulator is available, but needs to be tuned on a case-by-case basis to match the parameters of the actual system. The goal of this paper is to work with the statistical distribution of the data without necessitating system identification. Our main result shows that if the source and target domain are related by a linear transformation (a common assumption in domain adaptation), the problem of designing a classifier that minimizes a miss-classification loss over the joint source and target domains reduces to a convex optimization subject to a single (non-convex) equality constraint. This second-order equality constraint can be recast as a rank-1 optimization problem, where the rank constraint can be efficiently handled through a reweighted nuclear norm surrogate. These results are illustrated with a practical application: fault detection in additive manufacturing (industrial 3D printing). The proposed method is able to exploit simulation data (source domain) to substantially outperform classifiers tuned using only data from a single domain. 
    more » « less
  5. null (Ed.)
    Domain adaptation aims to correct the classifiers when faced with distribution shift between source (training) and target (test) domains. State-of-the-art domain adaptation methods make use of deep networks to extract domain-invariant representations. However, existing methods assume that all the instances in the source domain are correctly labeled; while in reality, it is unsurprising that we may obtain a source domain with noisy labels. In this paper, we are the first to comprehensively investigate how label noise could adversely affect existing domain adaptation methods in various scenarios. Further, we theoretically prove that there exists a method that can essentially reduce the side-effect of noisy source labels in domain adaptation. Specifically, focusing on the generalized target shift scenario, where both label distribution π‘ƒπ‘Œ and the class-conditional distribution 𝑃𝑋|π‘Œ can change, we discover that the denoising Conditional Invariant Component (DCIC) framework can provably ensures (1) extracting invariant representations given examples with noisy labels in the source domain and unlabeled examples in the target domain and (2) estimating the label distribution in the target domain with no bias. Experimental results on both synthetic and real-world data verify the effectiveness of the proposed method. 
    more » « less