NSF PAR Search | NSF Public Access Repository

Teaching invariance using privileged mediation information

Zapzalka, Dylan; Makar, Maggie (December 2024, Causal representation learning workshop at NeurIPS)

The performance of deep neural networks often deteriorates in out-of-distribution settings due to relying on easy-to-learn but unreliable spurious associations known as shortcuts. Recent work attempting to mitigate shortcut learning relies on a priori knowledge of what the shortcut is and requires a strict overlap assumption with respect to the shortcut and the labels. In this paper, we present a causally-motivated teacher-student framework that encourages invariance to all shortcuts by leveraging privileged mediation information. The Teaching Invariance using Privileged Mediation Information (TIPMI) framework distills knowledge from a counterfactually invariant teacher trained using privileged mediation information to a student predictor that uses non-privileged features. We analyze the theoretical properties of our proposed estimator, showing that TIPMI promotes invariance to multiple unknown shortcuts and has better finite-sample efficiency. We empirically verify our theoretical findings by showing that TIPMI outperforms several state-of-the-art methods on two vision datasets and one language dataset.
Free, publicly-accessible full text available December 11, 2025
Hypothesis testing the circuit hypothesis in LLMs

Shi, Claudia; Beltran, Nicolas V; Nazaret, Achille; Zheng, Carolina; Alonso, Adria G; Jesson, Andrew; Makar, Maggie; Blei, David (December 2024, Advances in neural information processing systems)

Large language models (LLMs) demonstrate surprising capabilities, but we do not understand how they are implemented. One hypothesis suggests that these capabilities are primarily executed by small subnetworks within the LLM, known as circuits. Identifying these circuits is particularly useful in the context of building models that are robust to shortcut learning and distribution shifts. Identifying these shortcut-encoding circuits allows us to "turn them off" by replacing their outputs with random values or zeros. Many papers have claimed to identify meaningful circuits in existing language models. In this paper, we focus on evaluating candidate circuits. Specifically, we formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothesis tests to evaluate how well circuits satisfy them. The criteria focus on the extent to which the LLM's behavior is preserved, the degree of localization of this behavior, and whether the circuit is minimal. We apply these tests to six circuits described in the research literature. We find that synthetic circuits -- circuits that are hard-coded in the model -- align with the idealized properties. Circuits discovered in Transformer models satisfy the criteria to varying degrees. To facilitate future empirical studies of circuits, we created the circuitry package, a wrapper around the TransformerLens library, which abstracts away lower-level manipulations of hooks and activations. The software is available at https://github.com/blei-lab/circuitry.
Free, publicly-accessible full text available December 9, 2025
Partial identification of the maximum mean discrepancy with mismeasured data

Nafshi, Ron; Makar, Maggie (July 2024, Proceedings of Machine Learning Research)

Full Text Available
