skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.
Attention:The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, June 11 until 2:00 AM ET on Friday, June 12 due to maintenance. We apologize for the inconvenience.


Search for: All records

Creators/Authors contains: "Makar, Maggie"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The performance of deep neural networks often deteriorates in out-of-distribution settings due to relying on easy-to-learn but unreliable spurious associations known as shortcuts. Recent work attempting to mitigate shortcut learning relies on \textit{a priori} knowledge of the shortcuts and invariance penalties, which are difficult to enforce in practice. To address these limitations, we study two causally-motivated methods that efficiently learn models that are invariant to shortcuts by leveraging privileged mediation information. We first adapt concept bottleneck models (CBMs) to incorporate mediators -- intermediate variables that lie on the causal path between input features and target labels -- resulting in a straightforward extension we call Mediator Bottleneck Models (MBMs). One drawback of this method is that it requires two potentially large models at inference time. To address this issue, we propose Teaching Invariance using Privileged Mediation Information (TIPMI), a novel approach which distills knowledge from a counterfactually invariant teacher trained using privileged mediation information to a student predictor that uses non-privileged, easy-to-collect features. We analyze the theoretical properties of both estimators, showing that they promote invariance to an unknown shortcut and can result in better finite-sample efficiency compared to commonly used regularization schemes. We empirically validate our theoretical findings by showing that TIPMI and MBM outperform several state-of-the-art methods on one language and two vision datasets. 
    more » « less
  2. To assess strategies for enhancing the generalizability of healthcare artificial intelligence models, we analyzed the impact of preprocessing approaches applied to medical free text, compared single- versus multiple-institution data models, and evaluated data divergence metrics. From 1,607,393 procedures across 44 U.S. institutions, deep neural network models were created to classify anesthesiology Current Procedural Terminology codes from medical free text. Three levels of text preprocessing were analyzed from minimal to automated (cSpell) with comprehensive physician review. Kullback–Leibler Divergence and k-medoid clustering were used to predict single- vs multiple-institutional model performances. Single-institution models showed a mean accuracy of 92.5% [2.8% SD] and 0.923 [0.029] F1 on internal data but generalized poorly on external data (− 22.4% [7.0%]; − 0.223 [0.081]). Free text preprocessing minimally altered performance (+ 0.51% [2.23]; + 0.004 [0.020]). An all-institution model performed worse on internal data (-4.88% [2.43%]; − 0.045 [0.020]), but improved generalizability to external data (+ 17.1% [8.7%]; + 0.182 [0.073]). Compared to vocabulary overlap and Jaccard similarity, Kullback–Leibler Divergence correlated with model performance (R2of 0.41 vs 0.16 vs 0.08, respectively) and was successful clustering institutions and identifying outlier data. Overall, pre-processing medical free text showed limited utility improving generalization of machine learning models, single institution models performed best but generalized poorly, while combined data models improved generalization but never achieved performance of single-institutional models. Kullback–Leibler Divergence provided valuable insight as a reliable heuristic to evaluate generalizability. These results have important implications in developing broad use artificial intelligence healthcare applications, providing valuable insight into their development and evaluations. 
    more » « less
  3. The performance of deep neural networks often deteriorates in out-of-distribution settings due to relying on easy-to-learn but unreliable spurious associations known as shortcuts. Recent work attempting to mitigate shortcut learning relies on a priori knowledge of what the shortcut is and requires a strict overlap assumption with respect to the shortcut and the labels. In this paper, we present a causally-motivated teacher-student framework that encourages invariance to all shortcuts by leveraging privileged mediation information. The Teaching Invariance using Privileged Mediation Information (TIPMI) framework distills knowledge from a counterfactually invariant teacher trained using privileged mediation information to a student predictor that uses non-privileged features. We analyze the theoretical properties of our proposed estimator, showing that TIPMI promotes invariance to multiple unknown shortcuts and has better finite-sample efficiency. We empirically verify our theoretical findings by showing that TIPMI outperforms several state-of-the-art methods on two vision datasets and one language dataset. 
    more » « less
  4. Abstract ProblemDespite the rapidly expanding role of artificial intelligence (AI) and machine learning (ML) in health care, a significant knowledge gap remains among clinicians in their ability to evaluate and use AI and ML tools. ApproachThe Data-Augmented, Technology-Assisted Medical Decision Making (DATA-MD) curriculum was developed to teach fundamental AI and ML concepts to clinical trainees. The curriculum contains 4 modules: (1) Introduction to AI/ML in Healthcare, (2) Epidemiology and Biostatistics in AI/ML, (3) Use of AI/ML to Augment Diagnostic Decisions, and (4) Ethical and Legal Considerations of AI/ML in Healthcare. The curriculum was piloted in May and June 2023 among University of Michigan internal medicine residents and delivered to 2 cohorts of 11 and 12 residents. All learners completed presession and postsession assessments on AI and ML knowledge before and after the curriculum and a retrospective pre-post survey to evaluate familiarity with AI and ML concepts, comfort with AI and ML literature appraisal, and attitudes toward AI and ML use. OutcomesTwenty of 23 learners (87%) completed the presession knowledge assessment before participating in the DATA-MD sessions, and all 23 learners completed the postsession knowledge assessment and retrospective pre-post survey. Median knowledge scores significantly improved for modules 1 to 3 (module 1: 2.5 to 3.0,P= .008; module 2: 1.0 to 2.0,P= .049; module 3: 2.0 to 3.0,P< .001) but not module 4 (2.0 before and after testing;P= .80). Learners reported increased confidence in their abilities to appraise the AI and ML literature and use AI and ML tools in future practice. Next StepsThe DATA-MD curriculum pilot demonstrates that a standardized AI and ML curriculum can improve trainees’ knowledge and attitudes about clinical AI and ML. Next steps include expansion to learners from different medical specialties, health professions, and academic institutions. 
    more » « less
  5. Large language models (LLMs) demonstrate surprising capabilities, but we do not understand how they are implemented. One hypothesis suggests that these capabilities are primarily executed by small subnetworks within the LLM, known as circuits. Identifying these circuits is particularly useful in the context of building models that are robust to shortcut learning and distribution shifts. Identifying these shortcut encoding circuits allows us to "turn them off" by replacing their outputs with random values or zeros. Many papers have claimed to identify meaningful circuits in existing language models. In this paper, we focus on evaluating candidate circuits. Specifically, we formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothesis tests to evaluate how well circuits satisfy them. The criteria focus on the extent to which the LLM's behavior is preserved, the degree of localization of this behavior, and whether the circuit is minimal. We apply these tests to six circuits described in the research literature. We find that synthetic circuits -- circuits that are hard-coded in the model -- align with the idealized properties. Circuits discovered in Transformer models satisfy the criteria to varying degrees. To facilitate future empirical studies of circuits, we created the circuitry package, a wrapper around the TransformerLens library, which abstracts away lower-level manipulations of hooks and activations. The software is available at https://github.com/blei-lab/circuitry. 
    more » « less
  6. Current causal inference approaches for estimating conditional average treatment effects (CATEs) often prioritize accuracy. However, in resource constrained settings, decision makers may only need a ranking of individuals based on their estimated CATE. In these scenarios, exact CATE estimation may be an unnecessarily challenging task, particularly when the underlying function is difficult to learn. In this work, we study the relationship between CATE estimation and optimizing for CATE ranking, demonstrating that optimizing for ranking may be more appropriate than optimizing for accuracy in certain settings. Guided by our analysis, we propose an approach to directly optimize for rankings of individuals to inform treatment assignment that aims to maximize benefit. Our tree-based approach maximizes the expected benefit of the treatment assignment using a novel splitting criteria. In an empirical case-study across synthetic datasets, our approach leads to better treatment assignments compared to CATE estimation methods as measured by expected total benefit. By providing a practical and efficient approach to learning a CATE ranking, this work offers an important step towards bridging the gap between CATE estimation techniques and their downstream applications. 
    more » « less