Machine learning algorithms typically assume that training and test samples come from the same distribution, i.e., that test data are in-distribution. In open-world scenarios, however, streaming big data can be out-of-distribution (OOD), rendering these algorithms ineffective. Prior solutions to the OOD challenge seek to identify invariant features across different training domains, on the assumption that these invariant features will also work reasonably well in the unlabeled target domain. By contrast, this work is interested in domain-specific features, which include both invariant features and features unique to the target domain. We propose a simple yet effective approach that relies on correlations in general, regardless of whether the features are invariant. Our approach uses the most confidently predicted samples identified by an OOD base model (the teacher) to train a new model (the student) that effectively adapts to the target domain. Empirical evaluations on benchmark datasets show that performance improves over the state of the art by ∼10-20%.
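The core recipe is teacher-student self-training on the most confident target predictions. Below is a minimal sketch of the selection step, assuming a PyTorch classifier and an unlabeled target tensor; `keep_frac` and all names are illustrative choices, not values from the paper.

```python
import torch
import torch.nn.functional as F

def select_confident_pseudolabels(teacher, target_x, keep_frac=0.2):
    """Keep the fraction of unlabeled target samples the OOD teacher
    predicts most confidently, paired with its pseudo-labels.
    NOTE: keep_frac is an assumed hyperparameter for illustration."""
    teacher.eval()
    with torch.no_grad():
        probs = F.softmax(teacher(target_x), dim=1)
    conf, pseudo = probs.max(dim=1)        # per-sample confidence and pseudo-label
    k = max(1, int(keep_frac * len(conf)))
    idx = torch.topk(conf, k).indices      # most confidently predicted samples
    return target_x[idx], pseudo[idx]
```

A student model trained on these (input, pseudo-label) pairs with an ordinary supervised loss is then free to pick up target-specific correlations, invariant or not.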
This content will become publicly available on May 25, 2026
Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms
Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated considerable empirical success, blindly applying these algorithms can often lead to worse performance on new datasets. To address this, it is crucial to clarify the assumptions under which a DA algorithm has good target performance. In this work, we focus on the assumption of the presence of conditionally invariant components (CICs), which are relevant for prediction and remain conditionally invariant across the source and target data. We demonstrate that CICs, which can be estimated through conditional invariant penalty (CIP), play three prominent roles in providing target risk guarantees in DA. First, we propose a new algorithm based on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings such as covariate shift and label shift. Second, we show that CICs help identify large discrepancies between source and target risks of other DA algorithms. Finally, we demonstrate that incorporating CICs into the domain invariant projection (DIP) algorithm can address its failure scenario caused by label-flipping features. We support our new algorithms and theoretical findings via numerical experiments on synthetic data, MNIST, CelebA, Camelyon17, and DomainNet datasets.
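To make the CIC idea concrete, here is a hedged sketch of one conditional-invariance penalty that a CIP-style objective could minimize: for each label value, it penalizes the spread of per-domain feature means, pushing the learned representation of X given Y toward domain invariance. The mean-matching discrepancy is an illustrative choice, not necessarily the paper's exact penalty.

```python
import torch

def conditional_invariance_penalty(features, labels, domains,
                                   num_classes, num_domains):
    """Sketch of a CIP-style penalty: for each class y, penalize how far
    the per-domain means of phi(X) | Y = y spread around their average.
    The mean-matching form is an assumption for illustration."""
    penalty = features.new_zeros(())
    for y in range(num_classes):
        means = []
        for d in range(num_domains):
            mask = (labels == y) & (domains == d)
            if mask.any():
                means.append(features[mask].mean(dim=0))
        if len(means) > 1:
            stacked = torch.stack(means)
            penalty = penalty + ((stacked - stacked.mean(dim=0)) ** 2).sum()
    return penalty
```

In IW-CIP, a term of this kind would be combined with importance weights that correct for label shift between the source and target data.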
- PAR ID: 10635744
- Publisher / Repository: MIT Press
- Date Published:
- Journal Name: Journal of Machine Learning Research
- ISSN: 1533-7928
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Adversarial learning has demonstrated good performance in the unsupervised domain adaptation setting by learning domain-invariant representations. However, recent work has shown limitations of this approach when label distributions differ between the source and target domains. In this paper, we propose a new assumption, generalized label shift (GLS), to improve robustness against mismatched label distributions. GLS states that, conditioned on the label, there exists a representation of the input that is invariant between the source and target domains. Under GLS, we provide theoretical guarantees on the transfer performance of any classifier. We also devise necessary and sufficient conditions for GLS to hold, by using an estimation of the relative class weights between domains and an appropriate reweighting of samples. Our weight estimation method can be applied straightforwardly and generically to existing domain adaptation (DA) algorithms that learn domain-invariant representations, with small computational overhead. In particular, we modify three DA algorithms, JAN, DANN, and CDAN, and evaluate their performance on standard and artificial DA tasks. Our algorithms outperform the base versions, with vast improvements for large label distribution mismatches. Our code is available at https://tinyurl.com/y585xt6j.
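A minimal sketch of the class-weight estimation step, in the spirit of black-box shift estimation: solve C w ≈ μ_t, where C[i, j] = P_source(prediction = i, label = j) and μ_t is the classifier's prediction distribution on the target. The least-squares solver and all names are assumptions for illustration; the paper's estimator operates on learned representations under GLS.

```python
import numpy as np

def estimate_class_weights(source_joint_conf, target_pred_dist):
    """Solve C w = mu_t for the relative class weights w between domains.
    C[i, j] = P_source(pred = i, y = j); mu_t = target prediction histogram.
    Least squares plus clipping is an illustrative choice of solver."""
    w, *_ = np.linalg.lstsq(source_joint_conf, target_pred_dist, rcond=None)
    return np.clip(w, 0.0, None)  # class weights must be non-negative

# Toy example with 3 classes: a near-diagonal source confusion matrix and a
# target prediction histogram shifted toward class 1.
C = np.array([[0.30, 0.02, 0.01],
              [0.03, 0.25, 0.02],
              [0.02, 0.03, 0.32]])
mu_t = np.array([0.25, 0.45, 0.30])
print(estimate_class_weights(C, mu_t))
```

The estimated weights would then reweight source samples inside a representation-learning DA algorithm such as JAN, DANN, or CDAN.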
We develop fast distribution-free conformal prediction algorithms for obtaining multivalid coverage on exchangeable data in the batch setting. Multivalid coverage guarantees are stronger than marginal coverage guarantees in two ways: (1) they hold conditionally on group membership, i.e., the target coverage level holds conditionally on membership in each of an arbitrary (potentially intersecting) collection of groups defined over regions of the feature space; and (2) they hold conditionally on the value of the threshold used to produce the prediction set on a given example. In fact, multivalid coverage guarantees hold even when conditioning on group membership and threshold value simultaneously. We give two algorithms: both take as input an arbitrary non-conformity score and an arbitrary collection of possibly intersecting groups, and can then equip arbitrary black-box predictors with prediction sets. Our first algorithm is a direct extension of quantile regression, needs to solve only a single convex minimization problem, and produces an estimator with group-conditional guarantees for each group in the collection. Our second algorithm is iterative and gives the full guarantees of multivalid conformal prediction: prediction sets that are valid conditionally both on group membership and on the non-conformity threshold. We evaluate the performance of both algorithms in an extensive set of experiments.
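For intuition only, the sketch below calibrates a separate conformal threshold per group, which yields group-conditional coverage when the groups are disjoint; the paper's two algorithms go further, handling intersecting groups and threshold-conditional validity simultaneously. Function and variable names are assumptions.

```python
import numpy as np

def group_conditional_thresholds(scores, groups, alpha=0.1):
    """Per-group split-conformal calibration: for each group, take the
    ceil((1 - alpha) * (n_g + 1))-th smallest non-conformity score as that
    group's threshold. Valid only for disjoint groups; a simplification of
    the full multivalid guarantee."""
    thresholds = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        k = int(np.ceil((1 - alpha) * (len(s) + 1))) - 1  # 0-indexed rank
        thresholds[g] = s[min(k, len(s) - 1)]
    return thresholds
```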
Parallel Markov Chain Monte Carlo (pMCMC) algorithms generate clouds of proposals at each step to efficiently resolve a target probability distribution $\mu$. We build a rigorous foundational framework for pMCMC algorithms that situates these methods within a unified 'extended phase space' measure-theoretic formalism. Drawing on our recent work that provides a comprehensive theory for reversible single-proposal methods, we herein derive general criteria for multiproposal acceptance mechanisms that yield ergodic chains on general state spaces. Our formulation encompasses a variety of methodologies, including proposal cloud resampling and Hamiltonian methods, while providing a basis for the derivation of novel algorithms. In particular, we obtain a top-down picture for a class of methods arising from 'conditionally independent' proposal structures. As an immediate application of this formalism, we identify several new algorithms, including a multiproposal version of the popular preconditioned Crank–Nicolson (pCN) sampler suitable for high- and infinite-dimensional target measures that are absolutely continuous with respect to a Gaussian base measure. To supplement these theoretical results, we carry out a selection of numerical case studies that evaluate the efficacy of the novel algorithms. First, noting that the true potential of pMCMC algorithms arises from their natural parallelizability and the ease with which they map to modern high-performance computing architectures, we provide a limited parallelization study using TensorFlow and a graphics processing unit to scale pMCMC algorithms that leverage as many as 100k proposals at each step. Second, we use our multiproposal pCN algorithm (mpCN) to resolve a selection of problems in Bayesian statistical inversion for partial differential equations motivated by fluid measurement. These examples provide preliminary evidence of the efficacy of mpCN for high-dimensional target distributions featuring complex geometries and multimodal structures.
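As a hedged sketch, one mpCN step for a target absolutely continuous with respect to a standard Gaussian prior might look as follows: draw a cloud of pCN proposals from the current state and resample among the current state and proposals with weights given by the likelihood potential Φ. The Barker-style selection rule here is an illustrative choice; the paper derives the acceptance mechanisms that actually guarantee ergodicity.

```python
import numpy as np

def mpcn_step(x, potential, n_props=10, beta=0.3, rng=np.random):
    """One sketched multiproposal pCN step. `potential` is the negative
    log-likelihood Phi; the prior is a standard Gaussian, which the pCN
    proposal x' = sqrt(1 - beta^2) * x + beta * xi leaves invariant, so
    the selection weights involve only Phi, not the prior density."""
    d = x.shape[0]
    cloud = [x] + [np.sqrt(1.0 - beta**2) * x + beta * rng.standard_normal(d)
                   for _ in range(n_props)]
    log_w = np.array([-potential(p) for p in cloud])
    w = np.exp(log_w - log_w.max())          # stabilized softmax weights
    w /= w.sum()
    return cloud[rng.choice(len(cloud), p=w)]
```

The cloud construction is embarrassingly parallel, which is the property the paper's GPU scaling study exploits.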
Recent years have witnessed a growing body of research on autonomous activity recognition models for use in deploying mobile systems in new settings, such as when a wearable system is adopted by a new user. Current research, however, lacks comprehensive frameworks for transfer learning. Specifically, it lacks the ability to deal with partially available data in new settings. To address these limitations, we propose OptiMapper, a novel uninformed cross-subject transfer learning framework for activity recognition. OptiMapper is a combinatorial optimization framework that extracts abstract knowledge across subjects and utilizes this knowledge to develop a personalized and accurate activity recognition model for new subjects. To this end, we propose a novel community-detection-based clustering of unlabeled data that uses the target user's data to construct a network of unannotated sensor observations. The clusters of these target observations are then mapped onto the source clusters using a complete bipartite graph model. In the next step, the mapped labels are conditionally fused with the prediction of a base learner to create a personalized and labeled training dataset for the target user. We present two instantiations of OptiMapper. The first, applicable to transfer learning across domains with identical activity labels, performs a one-to-one bipartite mapping between the clusters of the source and target users. The second performs an optimal many-to-one mapping between the source clusters and those of the target, which allows us to find an optimal mapping even when the target dataset does not contain sufficient instances of all activity classes. We show that this type of cross-domain mapping can be formulated as a transportation problem and solved optimally. We evaluate our transfer learning techniques on several activity recognition datasets. Our results show that the proposed community detection approach can achieve, on average, 69% utilization of the datasets for clustering, with an overall clustering accuracy of 87.5%. Our results also suggest that the proposed transfer learning algorithms can achieve up to 22.5% improvement in activity recognition accuracy compared to state-of-the-art techniques. The experimental results also demonstrate high and sustained performance even in the presence of partial data.
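The first instantiation's one-to-one cluster mapping can be sketched with an off-the-shelf assignment solver; the `cost` matrix interface is an assumption for illustration, and the many-to-one variant would instead be posed as the transportation linear program the abstract mentions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def map_clusters_one_to_one(cost):
    """Optimal one-to-one bipartite mapping between target and source
    clusters via the Hungarian algorithm. cost[i, j] is an assumed
    dissimilarity between target cluster i and source cluster j."""
    rows, cols = linear_sum_assignment(cost)
    return dict(zip(rows.tolist(), cols.tolist()))  # target -> source cluster

# Toy example with 3 target and 3 source clusters.
cost = np.array([[0.1, 0.9, 0.8],
                 [0.7, 0.2, 0.6],
                 [0.5, 0.8, 0.1]])
print(map_clusters_one_to_one(cost))  # {0: 0, 1: 1, 2: 2}
```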