Search for: All records

Award ID contains: 2015341

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated considerable empirical success, blindly applying these algorithms can often lead to worse performance on new datasets. To address this, it is crucial to clarify the assumptions under which a DA algorithm has good target performance. In this work, we focus on the assumption of the presence of conditionally invariant components (CICs), which are relevant for prediction and remain conditionally invariant across the source and target data. We demonstrate that CICs, which can be estimated through conditional invariant penalty (CIP), play three prominent roles in providing target risk guarantees in DA. First, we propose a new algorithm based on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings such as covariate shift and label shift. Second, we show that CICs help identify large discrepancies between source and target risks of other DA algorithms. Finally, we demonstrate that incorporating CICs into the domain invariant projection (DIP) algorithm can address its failure scenario caused by label-flipping features. We support our new algorithms and theoretical findings via numerical experiments on synthetic data, MNIST, CelebA, Camelyon17, and DomainNet datasets.
    Free, publicly-accessible full text available May 25, 2026
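    As a rough illustration of the conditional invariance idea behind CIP, the sketch below penalizes differences in class-conditional feature means across labeled domains. The actual CIP penalty matches conditional distributions (and IW-CIP additionally applies importance weighting), so this mean-matching version is only a simplified stand-in; the feature extractor `phi` referenced in the final comment is hypothetical.

```python
import torch

def cip_penalty(features, labels, domains, num_classes):
    """Toy conditional-invariance penalty: for each class y, penalize the squared
    distance between class-conditional feature means of every pair of domains.
    (A mean-matching stand-in for the distributional matching used by CIP.)"""
    penalty = features.new_zeros(())
    domain_ids = domains.unique()
    for y in range(num_classes):
        means = []
        for d in domain_ids:
            mask = (labels == y) & (domains == d)
            if mask.any():
                means.append(features[mask].mean(dim=0))
        # compare every pair of domains that contain class y
        for i in range(len(means)):
            for j in range(i + 1, len(means)):
                penalty = penalty + ((means[i] - means[j]) ** 2).sum()
    return penalty

# Sketch of a training loss using the penalty (phi and lam are illustrative):
#   loss = cross_entropy(classifier(phi(x)), labels) + lam * cip_penalty(phi(x), labels, domains, K)
```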
  2. We develop Latent Exploration Score (LES) to mitigate over-exploration in Latent Space Optimization (LSO), a popular method for solving black-box discrete optimization problems. LSO utilizes continuous optimization within the latent space of a Variational Autoencoder (VAE) and is known to be susceptible to over-exploration, which manifests in unrealistic solutions that reduce its practicality. LES leverages the trained decoder's approximation of the data distribution, and can be employed with any VAE decoder, including pretrained ones, without additional training, architectural changes, or access to the training data. Our evaluation across five LSO benchmark tasks and twenty-two VAE models demonstrates that LES always enhances the quality of the solutions while maintaining high objective values, leading to improvements over existing solutions in most cases. We believe that LES' ability to identify out-of-distribution areas, together with its differentiability and computational tractability, will open new avenues for LSO.
    Free, publicly-accessible full text available May 1, 2026
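    To show the general shape of penalized latent-space optimization that a differentiable score like LES enables, here is a minimal sketch. The callables `objective` and `exploration_score` are placeholders supplied by the caller, not the paper's definitions, and the penalty weight `lam` is an assumed hyperparameter.

```python
import torch

def optimize_latent(z0, objective, exploration_score, lam=1.0, steps=200, lr=0.05):
    """Penalized latent-space optimization (sketch). `objective` maps a latent code
    to a differentiable surrogate of the black-box score; `exploration_score` stands
    in for a decoder-based score (such as LES) that is low in out-of-distribution
    regions of the latent space."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # maximize objective while discouraging out-of-distribution latents
        loss = -objective(z) - lam * exploration_score(z)
        loss.backward()
        opt.step()
    return z.detach()
```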
  3. Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the Classification and Regression Trees (CART) algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS adapts to additive structure while remaining highly interpretable. Experiments on real-world datasets show FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding decision-making. Specifically, we introduce a variant of FIGS known as Group Probability-Weighted Tree Sums (G-FIGS) that accounts for heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. Theoretically, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that tree-sum models leverage disentanglement to generalize more efficiently than single tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS performs competitively with random forests and XGBoost on real-world datasets. 
    Free, publicly-accessible full text available February 18, 2026
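    The additive tree-sum form f(x) = tree_1(x) + ... + tree_K(x) that FIGS fits can be illustrated with the toy residual-fitting sketch below. Note that this is not the FIGS algorithm itself, which grows all trees jointly one best split at a time; the sketch only demonstrates how a sum of shallow trees captures additive structure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_tree_sum(X, y, n_trees=3, max_depth=2):
    """Toy tree-sum: fit each shallow tree to the residual left by the previous ones."""
    trees, residual = [], y.astype(float).copy()
    for _ in range(n_trees):
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(t)
        residual -= t.predict(X)
    return trees

def predict_tree_sum(trees, X):
    # prediction is the sum of the individual trees' outputs
    return sum(t.predict(X) for t in trees)

# Example on an additive target: y = 1[x0 > 0] + 1[x1 > 0]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(float) + (X[:, 1] > 0).astype(float)
trees = fit_tree_sum(X, y)
print(np.abs(predict_tree_sum(trees, X) - y).mean())
```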
  4. In this paper, we study the role of initialization in Low Rank Adaptation (LoRA) as originally introduced in Hu et al. [19]. Essentially, to start finetuning from the pretrained model, one can either initialize B to zero and A to random (the default initialization in the PEFT package), or vice versa. In both cases, the product BA is equal to zero at initialization, so finetuning starts from the pretrained model. These two initialization schemes are seemingly similar: they should in principle yield the same performance and share the same optimal learning rate. We demonstrate that this intuition is incorrect and that the first scheme (initializing B to zero and A to random) on average yields better performance than the other. Our theoretical analysis shows that the reason behind this might be that the first initialization allows the use of larger learning rates (without causing output instability) than the second, resulting in more efficient learning under the first scheme. We validate our results with extensive experiments on LLMs.
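    A minimal sketch of the two initialization schemes for a single LoRA layer, assuming the standard update W + (alpha/r)·BA; the class and argument names here are illustrative, not the PEFT implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W plus a low-rank update (alpha/r) * B @ A."""
    def __init__(self, d_in, d_out, r=8, alpha=16, init_scheme="A_random"):
        super().__init__()
        # stand-in for a pretrained, frozen weight matrix
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        self.A = nn.Parameter(torch.empty(r, d_in))
        self.B = nn.Parameter(torch.empty(d_out, r))
        self.scaling = alpha / r
        if init_scheme == "A_random":   # scheme 1: A random, B zero (PEFT default)
            nn.init.kaiming_uniform_(self.A, a=5 ** 0.5)
            nn.init.zeros_(self.B)
        else:                           # scheme 2: B random, A zero
            nn.init.zeros_(self.A)
            nn.init.kaiming_uniform_(self.B, a=5 ** 0.5)
        # Either way, B @ A == 0 at init, so the layer starts as the pretrained one.

    def forward(self, x):
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T
```

    Although both schemes leave the pretrained function unchanged at initialization, the abstract's claim is that they behave differently during training, with scheme 1 tolerating larger learning rates.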