NSF PAR Search | NSF Public Access Repository


Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms

Wu, Keru; Chen, Yuansi; Ha, Wooseok; Yu, Bin (May 2025, Journal of Machine Learning Research)

Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated considerable empirical success, blindly applying these algorithms can often lead to worse performance on new datasets. To address this, it is crucial to clarify the assumptions under which a DA algorithm has good target performance. In this work, we focus on the assumption of the presence of conditionally invariant components (CICs), which are relevant for prediction and remain conditionally invariant across the source and target data. We demonstrate that CICs, which can be estimated through conditional invariant penalty (CIP), play three prominent roles in providing target risk guarantees in DA. First, we propose a new algorithm based on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings such as covariate shift and label shift. Second, we show that CICs help identify large discrepancies between source and target risks of other DA algorithms. Finally, we demonstrate that incorporating CICs into the domain invariant projection (DIP) algorithm can address its failure scenario caused by label-flipping features. We support our new algorithms and theoretical findings via numerical experiments on synthetic data, MNIST, CelebA, Camelyon17, and DomainNet datasets.
Free, publicly-accessible full text available May 25, 2026
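The importance-weighting idea named in the IW-CIP abstract above can be illustrated with a small sketch. This is not the authors' algorithm: it only shows how per-example source losses might be reweighted under label shift, and it assumes the target label probabilities are already known (in practice they would have to be estimated). The function names and toy numbers are hypothetical.

```python
from collections import Counter

def label_shift_weights(source_labels, target_label_probs):
    """Return w(y) = p_target(y) / p_source(y), with p_source estimated from counts.

    Assumption: target_label_probs is given; estimating it is out of scope here.
    """
    counts = Counter(source_labels)
    n = len(source_labels)
    return {y: target_label_probs.get(y, 0.0) / (c / n) for y, c in counts.items()}

def importance_weighted_risk(losses, labels, weights):
    """Average of per-example source losses, each scaled by w(y_i)."""
    return sum(weights[y] * loss for loss, y in zip(losses, labels)) / len(losses)

# Toy source sample: class 0 is over-represented relative to the target.
source_labels = [0, 0, 0, 1]
losses = [0.0, 0.0, 1.0, 1.0]      # hypothetical 0-1 losses of some classifier
target_probs = {0: 0.5, 1: 0.5}    # assumed known target label distribution

weights = label_shift_weights(source_labels, target_probs)
risk = importance_weighted_risk(losses, source_labels, weights)
unweighted = sum(losses) / len(losses)
```

In this toy case the unweighted source risk is 0.5, while the weighted estimate is 2/3, which matches the target risk computed directly from the per-class error rates (1/3 for class 0, 1 for class 1) under the assumed balanced target labels.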
Fast Interpretable Greedy-Tree Sums

https://doi.org/10.1073/pnas.2310151122

Tan, Yan Shuo; Singh, Chandan; Nasseri, Keyan; Agarwal, Abhineet; Duncan, James; Ronen, Omer; Epland, Matthew; Kornblith, Aaron; Yu, Bin (February 2025, Proceedings of the National Academy of Sciences)

Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the Classification and Regression Trees (CART) algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS adapts to additive structure while remaining highly interpretable. Experiments on real-world datasets show FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding decision-making. Specifically, we introduce a variant of FIGS known as Group Probability-Weighted Tree Sums (G-FIGS) that accounts for heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. Theoretically, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that tree-sum models leverage disentanglement to generalize more efficiently than single tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS performs competitively with random forests and XGBoost on real-world datasets.
Free, publicly-accessible full text available February 18, 2026
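To make the phrase "combining logical rules with addition" in the FIGS abstract above concrete, the following is a minimal sketch of how a tree-sum model produces a prediction. It covers prediction only, not the greedy fitting procedure, and the dictionary-based tree layout, function names, and thresholds are hypothetical choices for illustration rather than the authors' implementation.

```python
def predict_tree(node, x):
    """Walk one decision tree (nested dicts) down to a leaf and return its value."""
    while "value" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]

def predict_tree_sum(trees, x):
    """A tree-sum prediction is the sum of the individual tree outputs."""
    return sum(predict_tree(tree, x) for tree in trees)

# Two one-split trees, each acting on a different feature (hypothetical values).
tree_a = {"feature": 0, "threshold": 0.5,
          "left": {"value": 0.0}, "right": {"value": 1.0}}
tree_b = {"feature": 1, "threshold": 0.5,
          "left": {"value": 0.0}, "right": {"value": 2.0}}
trees = [tree_a, tree_b]

prediction = predict_tree_sum(trees, [0.9, 0.1])  # tree_a gives 1.0, tree_b gives 0.0
```

Together the two trees represent the additive function 1[x0 > 0.5] + 2[x1 > 0.5] with two splits; a single tree expressing the same function would need three splits and four leaves, which is the kind of additive structure the abstract refers to.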
The Computational Curse of Big Data for Bayesian Additive Regression Trees: A Hitting Time Analysis

Tan, Yan Shuo; Ronen, Omer; Saarinen, Theo; Yu, Bin (June 2024, arXiv)

Full Text Available
Interpreting and Improving Deep-Learning Models with Reality Checks

https://doi.org/10.1007/978-3-031-04083-2_12

Singh, Chandan; Ha, Wooseok; Yu, Bin (April 2022, Lecture Notes in Computer Science)

Recent deep-learning models have achieved impressive predictive performance by learning complex functions of many variables, often at the cost of interpretability. This chapter covers recent work aiming to interpret models by attributing importance to features and feature groups for a single prediction. Importantly, the proposed attributions assign importance to interactions between features, in addition to features in isolation. These attributions are shown to yield insights across real-world domains, including bio-imaging, cosmology imaging, and natural-language processing. We then show how these attributions can be used to directly improve the generalization of a neural network or to distill it into a simple model. Throughout the chapter, we emphasize the use of reality checks to scrutinize the proposed interpretation techniques. (Code for all methods in this chapter is available at github.com/csinva and github.com/Yu-Group, implemented in PyTorch [54].)
Full Text Available
Adaptive wavelet distillation from neural networks through interpretations

Ha, Wooseok; Singh, Chandan; Lanusse, Francois; Upadhyayula, Srigokul; Yu, Bin (December 2021, Advances in Neural Information Processing Systems)

Recent deep-learning models have achieved impressive prediction performance, but often sacrifice interpretability and computational efficiency. Interpretability is crucial in many disciplines, such as science and medicine, where models must be carefully vetted or where interpretation is the goal itself. Moreover, interpretable models are concise and often yield computational efficiency. Here, we propose adaptive wavelet distillation (AWD), a method that aims to distill information from a trained neural network into a wavelet transform. Specifically, AWD penalizes feature attributions of a neural network in the wavelet domain to learn an effective multi-resolution wavelet transform. The resulting model is highly predictive, concise, computationally efficient, and has properties (such as a multi-scale structure) that make it easy to interpret. In close collaboration with domain experts, we showcase how AWD addresses challenges in two real-world settings: cosmological parameter inference and molecular-partner prediction. In both cases, AWD yields a scientifically interpretable and concise model that gives predictive performance better than state-of-the-art neural networks. Moreover, AWD identifies predictive features that are scientifically meaningful in the context of the respective domains. All code and models are released in a full-fledged package available on GitHub.
Full Text Available
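For readers unfamiliar with the "multi-resolution wavelet transform" mentioned in the AWD abstract above, here is a minimal sketch of a multi-level decomposition using the fixed Haar wavelet. AWD as described learns its wavelet filters from a trained network; this example swaps in a fixed, textbook filter purely to show the multi-scale structure, and all function names and the toy signal are hypothetical.

```python
def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length signal."""
    scale = 2 ** 0.5
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / scale for a, b in pairs]   # coarse averages
    detail = [(a - b) / scale for a, b in pairs]   # local differences
    return approx, detail

def haar_decompose(signal, levels):
    """Repeatedly split the coarse part, collecting detail coefficients per scale."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

signal = [4.0, 4.0, 2.0, 0.0]          # toy input; length must be divisible by 2**levels
approx, details = haar_decompose(signal, levels=2)

# The transform is orthonormal, so the sum of squares is preserved.
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for v in approx) + sum(v * v for d in details for v in d)
```

Each entry of `details` holds the coefficients at one scale (finest first), which is the multi-scale structure the abstract points to as an aid to interpretation.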
