Note: Clicking a Digital Object Identifier (DOI) number will take you to an external site maintained by the publisher. Some full-text articles may not yet be available without charge during the embargo (administrative interval). Some links on this page may take you to non-federal websites, whose policies may differ from this site's.
-
Free, publicly-accessible full text available December 31, 2025
-
Large language models (LLMs) demonstrate surprising capabilities, but we do not understand how they are implemented. One hypothesis suggests that these capabilities are primarily executed by small subnetworks within the LLM, known as circuits. Identifying these circuits is particularly useful in the context of building models that are robust to shortcut learning and distribution shifts: identifying shortcut-encoding circuits allows us to "turn them off" by replacing their outputs with random values or zeros. Many papers have claimed to identify meaningful circuits in existing language models. In this paper, we focus on evaluating candidate circuits. Specifically, we formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothesis tests to evaluate how well circuits satisfy them. The criteria focus on the extent to which the LLM's behavior is preserved, the degree of localization of this behavior, and whether the circuit is minimal. We apply these tests to six circuits described in the research literature. We find that synthetic circuits -- circuits that are hard-coded in the model -- align with the idealized properties. Circuits discovered in Transformer models satisfy the criteria to varying degrees. To facilitate future empirical studies of circuits, we created the circuitry package, a wrapper around the TransformerLens library, which abstracts away lower-level manipulations of hooks and activations. The software is available at https://github.com/blei-lab/circuitry.
Free, publicly-accessible full text available December 9, 2025
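The ablation idea mentioned in the abstract (replacing a component's output with zeros) can be illustrated with the TransformerLens hook interface that the circuitry package wraps. This is a minimal sketch, not the circuitry package's own API; the model, layer, and head choices are illustrative.

```python
# Sketch: zero-ablate one attention head's output and compare model behavior,
# using standard TransformerLens hooks (the layer/head indices are arbitrary).
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The capital of France is")

LAYER, HEAD = 9, 6  # hypothetical component to "turn off"

def zero_ablate_head(z, hook):
    # z has shape [batch, pos, n_heads, d_head]; zero out one head's output
    z[:, :, HEAD, :] = 0.0
    return z

clean_logits = model(tokens)
ablated_logits = model.run_with_hooks(
    tokens,
    fwd_hooks=[(f"blocks.{LAYER}.attn.hook_z", zero_ablate_head)],
)
# A large change suggests the head matters for this behavior; a small change
# is evidence the behavior is localized elsewhere.
print((clean_logits - ablated_logits).abs().max())
```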
-
Free, publicly-accessible full text available July 16, 2025
-
Free, publicly-accessible full text available May 2, 2025
-
Current causal inference approaches for estimating conditional average treatment effects (CATEs) often prioritize accuracy. However, in resource-constrained settings, decision makers may only need a ranking of individuals based on their estimated CATE. In these scenarios, exact CATE estimation may be an unnecessarily challenging task, particularly when the underlying function is difficult to learn. In this work, we study the relationship between CATE estimation and optimizing for CATE ranking, demonstrating that optimizing for ranking may be more appropriate than optimizing for accuracy in certain settings. Guided by our analysis, we propose an approach to directly optimize for rankings of individuals to inform treatment assignment that aims to maximize benefit. Our tree-based approach maximizes the expected benefit of the treatment assignment using a novel splitting criterion. In an empirical case study across synthetic datasets, our approach leads to better treatment assignments compared to CATE estimation methods, as measured by expected total benefit. By providing a practical and efficient approach to learning a CATE ranking, this work offers an important step towards bridging the gap between CATE estimation techniques and their downstream applications.
Free, publicly-accessible full text available May 2, 2025
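The evaluation idea behind "expected total benefit" can be made concrete with a small sketch. This is an illustrative construction, not the paper's method or splitting criterion: under a treatment budget, individuals are ranked by an estimated score, and the benefit of the assignment is the sum of true CATEs among those treated.

```python
# Sketch: expected total benefit of treating the top-k individuals by a ranking,
# on simulated data where the true CATE is known (names and data are illustrative).
import numpy as np

def expected_total_benefit(true_cate, scores, budget):
    """Treat the `budget` individuals with the highest scores; return the
    sum of their true CATEs, the quantity a good ranking should maximize."""
    treated = np.argsort(-scores)[:budget]
    return true_cate[treated].sum()

rng = np.random.default_rng(0)
true_cate = rng.normal(size=1000)                               # ground-truth effects
noisy_estimate = true_cate + rng.normal(scale=2.0, size=1000)   # an imperfect CATE estimator
print(expected_total_benefit(true_cate, noisy_estimate, budget=100))
print(expected_total_benefit(true_cate, true_cate, budget=100))  # oracle ranking for comparison
```

A ranking can achieve high expected total benefit even when its underlying CATE estimates are inaccurate in absolute terms, which is the motivation for optimizing the ranking directly.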
-
Free, publicly-accessible full text available May 2, 2025
-
Individuals such as medical interns who work in high-stress environments often face mental health challenges, including depression and anxiety. These challenges are exacerbated by limited access to traditional mental health services due to demanding work schedules. In this context, mobile health interventions such as push notifications targeting behavioral modification to improve mental health outcomes could deliver much-needed support. In this work, we study the effectiveness of these interventions on subgroups by studying their conditional average causal effect. We design a two-step approach for estimating the conditional average causal effect of interventions and identifying specific subgroups of the population who respond positively or negatively to the interventions. The first step of our approach follows existing causal effect estimation approaches, while the second step involves a novel tree-based approach to identify subgroups who respond to the treatment. The novelty in the second step stems from a pruning approach that deploys hypothesis testing to identify subgroups experiencing a statistically significant positive or negative causal effect. Using a semi-simulated dataset, we show that our approach retrieves affected subpopulations with higher precision than alternatives while maintaining the same recall and accuracy. Using a real dataset with randomized push interventions among the medical intern population at a large hospital, we show how our approach can be used to identify subgroups who might benefit the most from interventions.
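A hedged sketch of the two-step idea, not the paper's exact procedure: estimate per-individual effects with a simple T-learner, fit a tree on those estimates to define candidate subgroups, and keep only leaves whose mean effect is statistically significant. The learners, tree size, and the naive one-sample t-test are simplifying assumptions made for illustration.

```python
# Sketch: (1) estimate individual effects, (2) define subgroups via a tree and
# prune those without a statistically significant effect.
import numpy as np
from scipy import stats
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

def significant_subgroups(X, t, y, alpha=0.05, max_leaf_nodes=8):
    # Step 1: T-learner estimate of the conditional average causal effect.
    mu1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
    mu0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])
    tau_hat = mu1.predict(X) - mu0.predict(X)

    # Step 2: a tree over the effect estimates defines candidate subgroups (leaves);
    # keep leaves whose mean estimated effect differs significantly from zero.
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes).fit(X, tau_hat)
    leaves = tree.apply(X)
    kept = []
    for leaf in np.unique(leaves):
        effects = tau_hat[leaves == leaf]
        _, p = stats.ttest_1samp(effects, popmean=0.0)  # simplification: ignores estimation noise
        if p < alpha:
            kept.append((leaf, effects.mean(), p))
    return tree, kept

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
t = rng.integers(0, 2, 2000)
y = X[:, 0] + t * (X[:, 1] > 0) + rng.normal(scale=0.5, size=2000)  # effect only when X[:, 1] > 0
_, subgroups = significant_subgroups(X, t, y)
print(subgroups)
```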
-
Robustness to distribution shift and fairness have independently emerged as two important desiderata required of modern machine learning models. While these two desiderata seem related, the connection between them is often unclear in practice. Here, we discuss these connections through a causal lens, focusing on anti-causal prediction tasks, where the input to a classifier (e.g., an image) is assumed to be generated as a function of the target label and the protected attribute. By taking this perspective, we draw explicit connections between a common fairness criterion, separation, and a common notion of robustness, risk invariance. These connections provide new motivation for applying the separation criterion in anti-causal settings and inform old discussions regarding fairness-performance tradeoffs. In addition, our findings suggest that robustness-motivated approaches can be used to enforce separation, and that they often work better in practice than methods designed to directly enforce separation. Using a medical dataset, we empirically validate our findings on the task of detecting pneumonia from X-rays, in a setting where differences in prevalence across sex groups motivate a fairness mitigation. Our findings highlight the importance of considering causal structure when choosing and enforcing fairness criteria.
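The separation criterion discussed here can be checked empirically. The following is a minimal illustrative sketch (my own construction, not the paper's code): separation requires predictions to be independent of the protected attribute given the true label, so the true-positive-rate and false-positive-rate gaps across groups measure how far it is violated.

```python
# Sketch: measure separation violations as TPR and FPR gaps between two groups.
import numpy as np

def separation_gaps(y_true, y_pred, group):
    """Return (TPR gap, FPR gap) between two groups coded 0/1."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rates = []
        for g in (0, 1):
            mask = (y_true == label) & (group == g)
            rates.append(y_pred[mask].mean())
        gaps.append(abs(rates[0] - rates[1]))
    return tuple(gaps)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
a = rng.integers(0, 2, 5000)
# A classifier whose errors depend on the group violates separation:
yhat = np.where(a == 1, y, (rng.random(5000) < 0.8) * y)
print(separation_gaps(y, yhat, a))  # nonzero TPR gap, near-zero FPR gap
```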
-
Differential measurement error, which occurs when the error in the measured outcome is correlated with the treatment, renders the causal effect unidentifiable from observational data. In this work, we study conditional differential measurement error, where a subgroup of the population is known to be prone to differential measurement error. Under an assumption about the direction (but not magnitude) of the measurement error, we derive sharp bounds on the conditional average treatment effect and present an approach to estimate them. We empirically validate our approach on semi-synthetic data, showing that it gives more credible and informative bounds than other approaches. In addition, we apply our approach to real data, showing its utility in guiding decisions about dietary modification interventions to improve nutritional intake.
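To illustrate how a directional error assumption yields bounds, here is a simple sketch. It is my own construction and not the paper's sharp bounds: it assumes a randomized treatment, an outcome bounded in [0, 1], and that within the error-prone subgroup the treated outcome can only be over-reported, so the observed treated mean upper-bounds the true one and the outcome floor gives the other side.

```python
# Sketch: one-sided bounds on a subgroup effect when treated outcomes may be
# inflated but control outcomes are measured accurately (illustrative only).
import numpy as np

def cate_bounds_one_sided(y_obs, t, y_min=0.0):
    """Bounds on the subgroup effect when treated outcomes may be over-reported."""
    treated_mean_obs = y_obs[t == 1].mean()   # >= true treated mean
    control_mean = y_obs[t == 0].mean()       # measured without error
    upper = treated_mean_obs - control_mean
    lower = y_min - control_mean              # true treated mean is at least y_min
    return lower, upper

rng = np.random.default_rng(0)
t = rng.integers(0, 2, 2000)
y_true = rng.random(2000) * 0.5 + 0.2 * t          # true effect is 0.2
y_obs = y_true + t * rng.random(2000) * 0.3        # over-reporting only when treated
print(cate_bounds_one_sided(y_obs, t))             # interval should contain 0.2
```

Adding assumptions about the error's magnitude, or about which covariates determine the error-prone subgroup, would tighten the interval, which is the role the paper's direction-only assumption plays in its sharper bounds.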