skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 1740855

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Domainadaptation(DA)isastatisticallearningproblemthatariseswhenthedistribution ofthesourcedatausedtotrainamodeldi↵ersfromthatofthetargetdatausedtoevaluate themodel. WhilemanyDAalgorithmshavedemonstratedconsiderableempiricalsuccess, blindly applying these algorithms can often lead to worse performance on new datasets. Toaddressthis, itiscrucialtoclarifytheassumptionsunderwhichaDAalgorithmhas good target performance. In this work, we focus on the assumption of the presence of conditionally invariant components (CICs), which are relevant for prediction and remain conditionally invariant across the source and target data. We demonstrate that CICs, whichcanbeestimatedthroughconditionalinvariantpenalty(CIP),playthreeprominent rolesinprovidingtargetriskguaranteesinDA.First,weproposeanewalgorithmbased on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target riskguaranteesbeyondsimplesettingssuchascovariateshiftandlabelshift. Second,we showthatCICshelpidentifylargediscrepanciesbetweensourceandtargetrisksofother DAalgorithms. Finally,wedemonstratethatincorporatingCICsintothedomaininvariant projection(DIP)algorithmcanaddressitsfailurescenariocausedbylabel-flippingfeatures. We support our new algorithms and theoretical findings via numerical experiments on syntheticdata,MNIST,CelebA,Camelyon17,andDomainNetdatasets. 
    more » « less
    Free, publicly-accessible full text available May 25, 2026
  2. Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the Classification and Regression Trees (CART) algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS adapts to additive structure while remaining highly interpretable. Experiments on real-world datasets show FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding decision-making. Specifically, we introduce a variant of FIGS known as Group Probability-Weighted Tree Sums (G-FIGS) that accounts for heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. Theoretically, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that tree-sum models leverage disentanglement to generalize more efficiently than single tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS performs competitively with random forests and XGBoost on real-world datasets. 
    more » « less
    Free, publicly-accessible full text available February 18, 2026
  3. Recent deep-learning models have achieved impressive predictive performance by learning complex functions of many variables, often at the cost of interpretability. This chapter covers recent work aiming to interpret models by attributing importance to features and feature groups for a single prediction. Importantly, the proposed attributions assign importance to interactions between features, in addition to features in isolation. These attributions are shown to yield insights across real-world domains, including bio-imaging, cosmology image and natural-language processing. We then show how these attributions can be used to directly improve the generalization of a neural network or to distill it into a simple model. Throughout the chapter, we emphasize the use of reality checks to scrutinize the proposed interpretation techniques. (Code for all methods in this chapter is available at github.com/csinva and github.com/Yu-Group, implemented in PyTorch [54]). 
    more » « less
  4. Recent deep-learning models have achieved impressive prediction performance, but often sacrifice interpretability and computational efficiency. Interpretability is crucial in many disciplines, such as science and medicine, where models must be carefully vetted or where interpretation is the goal itself. Moreover, interpretable models are concise and often yield computational efficiency. Here, we propose adaptive wavelet distillation (AWD), a method which aims to distill information from a trained neural network into a wavelet transform. Specifically, AWD penalizes feature attributions of a neural network in the wavelet domain to learn an effective multi-resolution wavelet transform. The resulting model is highly predictive, concise, computationally efficient, and has properties (such as a multi-scale structure) which make it easy to interpret. In close collaboration with domain experts, we showcase how AWD addresses challenges in two real-world settings: cosmological parameter inference and molecular-partner prediction. In both cases, AWD yields a scientifically interpretable and concise model which gives predictive performance better than state-of-the-art neural networks. Moreover, AWD identifies predictive features that are scientifically meaningful in the context of respective domains. All code and models are released in a full-fledged package available on Github. 
    more » « less