This content will become publicly available on November 12, 2026

Title: Towards Formalizing Spuriousness of Biased Datasets Using Partial Information Decomposition
Spuriousness arises when there is an association between two or more variables in a dataset that are not causally related. In this work, we propose an explainability framework to preemptively disentangle the nature of such spurious associations in a dataset before model training. We leverage a body of work in information theory called Partial Information Decomposition (PID) to decompose the total information about the target into four nonnegative quantities, namely unique information (in the core and spurious features, respectively), redundant information, and synergistic information. Our framework helps anticipate when the core or spurious feature is indispensable, when either suffices, and when both are jointly needed for an optimal classifier trained on the dataset. Next, we leverage this decomposition to propose a novel measure of the spuriousness of a dataset. We arrive at this measure systematically by examining several candidate measures and demonstrating what they capture and miss through intuitive canonical examples and counterexamples. Our framework, Spurious Disentangler, consists of segmentation, dimensionality reduction, and estimation modules, with capabilities to handle high-dimensional image data efficiently. Finally, we perform an empirical evaluation to demonstrate the trends of unique, redundant, and synergistic information, as well as our proposed spuriousness measure, across six benchmark datasets under various experimental settings. We observe an agreement between our preemptive measure of dataset spuriousness and post-training model generalization metrics such as worst-group accuracy, further supporting our proposition. The code is available at https://github.com/Barproda/spuriousness-disentangler.
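For intuition, the following is a minimal, self-contained sketch of the kind of decomposition the abstract describes, applied to a toy discrete joint distribution over a core feature X1, a spurious feature X2, and a target Y. It uses the classical Williams-Beer redundancy for concreteness; the paper's Spurious Disentangler instead estimates these quantities for high-dimensional image data, so the function names, the choice of redundancy measure, and the toy example below are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch (not the paper's Spurious Disentangler):
# Partial Information Decomposition of I(Y; X1, X2) for a small discrete
# joint distribution, using the Williams-Beer I_min redundancy.
import numpy as np

def mutual_info(pxy):
    """I(X; Y) in bits for a joint distribution pxy[x, y]."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

def specific_info(pxy, y):
    """Specific information I(Y=y; X) = sum_x p(x|y) log p(y|x)/p(y)."""
    py = pxy.sum(axis=0)[y]
    px = pxy.sum(axis=1)
    total = 0.0
    for x in range(pxy.shape[0]):
        if pxy[x, y] > 0:
            p_x_given_y = pxy[x, y] / py
            p_y_given_x = pxy[x, y] / px[x]
            total += p_x_given_y * np.log2(p_y_given_x / py)
    return total

def pid(p):
    """p[x1, x2, y] -> (unique1, unique2, redundancy, synergy) in bits."""
    p1y = p.sum(axis=1)                      # joint of (X1, Y)
    p2y = p.sum(axis=0)                      # joint of (X2, Y)
    p12_y = p.reshape(-1, p.shape[2])        # joint of ((X1, X2), Y)
    i1, i2, i12 = mutual_info(p1y), mutual_info(p2y), mutual_info(p12_y)
    py = p.sum(axis=(0, 1))
    # Williams-Beer redundancy: expected minimum specific information.
    red = sum(py[y] * min(specific_info(p1y, y), specific_info(p2y, y))
              for y in range(len(py)) if py[y] > 0)
    uni1, uni2 = i1 - red, i2 - red
    syn = i12 - uni1 - uni2 - red
    return uni1, uni2, red, syn

# Canonical example: Y = X1 (the core feature fully determines the target)
# and the spurious feature X2 happens to copy X1, so the one bit of target
# information is entirely redundant between core and spurious features.
p = np.zeros((2, 2, 2))
p[0, 0, 0] = p[1, 1, 1] = 0.5
print(pid(p))   # -> approximately (0.0, 0.0, 1.0, 0.0)
```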
Award ID(s):
2340006
PAR ID:
10655319
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Sharma, Amit
Publisher / Repository:
Transactions on Machine Learning Research (TMLR)
Date Published:
Journal Name:
Transactions on Machine Learning Research
ISSN:
2835-8856
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Knowledge distillation enables deploying complex machine learning models in resource-constrained environments by training a smaller student model to emulate the internal representations of a complex teacher model. However, the teacher's representations can also encode nuisance or additional information that is not relevant to the downstream task. Distilling such irrelevant information can actually impede the performance of a capacity-limited student model. This observation motivates our primary question: what are the information-theoretic limits of knowledge distillation? To this end, we leverage Partial Information Decomposition to quantify and explain the knowledge already transferred and the knowledge left to distill for a downstream task. We theoretically demonstrate that the task-relevant transferred knowledge is succinctly captured by the redundant information about the task shared between the teacher and student. We propose a novel multi-level optimization that incorporates this redundant information as a regularizer, leading to our framework of Redundant Information Distillation (RID). RID leads to more resilient and effective distillation under nuisance teachers, as it quantifies task-relevant knowledge rather than simply aligning student and teacher representations.
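As a very rough sketch of the general recipe (and explicitly not the paper's multi-level RID optimization), one can think of the student objective as a standard distillation loss plus a regularizer standing in for a task-relevant redundancy term; the `redundancy_proxy` cosine-similarity surrogate and the weights below are assumptions for illustration only.

```python
# Hedged sketch of the general idea only, not the RID algorithm from the
# abstract above: a standard distillation loss plus a placeholder regularizer
# in place of the redundant-information term that RID estimates via PID.
import torch
import torch.nn.functional as F

def distillation_step(student_logits, teacher_logits, labels,
                      student_feats, teacher_feats, T=4.0, alpha=0.5, beta=0.1):
    # Supervised task loss on the student.
    task_loss = F.cross_entropy(student_logits, labels)
    # Classic soft-label distillation term.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Placeholder "redundancy" proxy: encourages agreement between student and
    # teacher representations. This crude surrogate does NOT isolate the
    # task-relevant redundant information that the paper's regularizer targets.
    redundancy_proxy = -F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()
    return task_loss + alpha * kd_loss + beta * redundancy_proxy

# Toy usage with random tensors just to exercise the function.
B, C, D = 8, 10, 32
loss = distillation_step(torch.randn(B, C), torch.randn(B, C),
                         torch.randint(0, C, (B,)),
                         torch.randn(B, D), torch.randn(B, D))
print(float(loss))
```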
  2. This work presents an information-theoretic perspective on group fairness trade-offs in federated learning (FL) with respect to sensitive attributes such as gender and race. Existing works often focus on either global fairness (the overall disparity of the model across all clients) or local fairness (the disparity of the model at each client), without always considering their trade-offs. There is a lack of understanding of the interplay between global and local fairness in FL, particularly under data heterogeneity, and of if and when one implies the other. To address this gap, we leverage a body of work in information theory called partial information decomposition (PID), which identifies three sources of unfairness in FL, namely Unique Disparity, Redundant Disparity, and Masked Disparity. We demonstrate how these three disparities contribute to global and local fairness using canonical examples. This decomposition helps us derive fundamental limits on the trade-off between global and local fairness, highlighting where they agree or disagree. We introduce the Accuracy and Global-Local Fairness Optimality Problem (AGLFOP), a convex optimization problem that defines the theoretical limits of accuracy and fairness trade-offs, identifying the best possible performance any FL strategy can attain given a dataset and client distribution. We also present experimental results on synthetic datasets and the ADULT dataset to support our theoretical findings.
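To make the global-versus-local distinction concrete, here is a minimal synthetic sketch (not the paper's PID analysis or the AGLFOP program) that measures a statistical-parity gap both per client and over the pooled clients; the heterogeneity pattern and all numbers are invented for illustration.

```python
# Minimal sketch: statistical-parity gaps measured globally (pooled over
# clients) and locally (per client), illustrating that the two notions can
# diverge under heterogeneous client data. All data below is synthetic.
import numpy as np

def parity_gap(y_pred, sensitive):
    """|P(Yhat = 1 | A = 0) - P(Yhat = 1 | A = 1)|."""
    return abs(y_pred[sensitive == 0].mean() - y_pred[sensitive == 1].mean())

rng = np.random.default_rng(0)
clients = []
for k in range(5):
    a = rng.integers(0, 2, size=200)              # sensitive attribute per sample
    # Heterogeneous clients: the positive-prediction rate per group varies by client.
    y_hat = rng.binomial(1, 0.3 + 0.1 * k * a)    # binary model predictions
    clients.append((y_hat, a))

local_gaps = [parity_gap(y, a) for y, a in clients]
y_all = np.concatenate([y for y, _ in clients])
a_all = np.concatenate([a for _, a in clients])
print("local gaps :", np.round(local_gaps, 3))
print("global gap :", round(parity_gap(y_all, a_all), 3))
```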
  3. Measures of functional connectivity have played a central role in advancing our understanding of how information is transmitted and processed within the brain. Traditionally, these studies have focused on identifying redundant functional connectivity, which involves determining when activity is similar across different sites or neurons. However, recent research has highlighted the importance of also identifying synergistic connectivity, that is, connectivity that gives rise to information not contained in either site or neuron alone. Here, we measured redundant and synergistic functional connectivity between neurons in the mouse primary auditory cortex during a sound discrimination task. Specifically, we measured directed functional connectivity between neurons simultaneously recorded with calcium imaging, using Granger causality as the functional connectivity measure. We then used Partial Information Decomposition to quantify the amount of redundant and synergistic information about the presented sound carried by functionally connected or functionally unconnected pairs of neurons. We found that functionally connected pairs carry proportionally more redundant information and proportionally less synergistic information about sound than unconnected pairs, suggesting that their functional connectivity is primarily redundant. Further, synergy and redundancy coexisted whether mice made correct or incorrect perceptual discriminations. However, redundancy was much higher (both in absolute terms and in proportion to the total information available in neuron pairs) in correct behavioural choices compared to incorrect ones, whereas synergy was higher in absolute terms but lower in relative terms in correct than in incorrect behavioural choices. Moreover, the proportion of redundancy reliably predicted perceptual discriminations, with the proportion of synergy adding no extra predictive power. These results suggest a crucial contribution of redundancy to correct perceptual discriminations, possibly due to the advantage it offers for information propagation, and also suggest a role of synergy in enhancing information levels during correct discriminations.
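As a small illustration of one ingredient of this pipeline, the sketch below runs a pairwise Granger-causality test between two synthetic "calcium traces" using statsmodels; the PID step over decoded sound information is not reproduced here, and the data, coupling, and lag choices are assumptions.

```python
# Hedged sketch of one ingredient only: pairwise Granger causality between
# two neurons' activity traces. The traces are synthetic; the study pairs
# this with PID over sound information, which is not reproduced here.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)
T = 500
neuron_a = rng.standard_normal(T)
# Make neuron B partly driven by neuron A at lag 1, plus noise.
neuron_b = 0.6 * np.roll(neuron_a, 1) + 0.4 * rng.standard_normal(T)

# statsmodels tests whether the SECOND column Granger-causes the FIRST.
data = np.column_stack([neuron_b, neuron_a])     # test: does A -> B?
results = grangercausalitytests(data, maxlag=3)
p_value = results[1][0]["ssr_ftest"][1]          # p-value of the F-test at lag 1
print(f"A Granger-causes B (lag 1): p = {p_value:.3g}")
```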
  4. In a complex ecohydrologic system, vegetation and soil variables combine to dictate heat fluxes, and these fluxes may vary depending on the extent to which drivers are linearly or nonlinearly interrelated. From a modeling and causality perspective, uncertainty, sensitivity, and performance measures all relate to how information from different sources "flows" through a model to produce a target, or output. We address how model structure, broadly defined as a mapping from inputs to an output, combines with source dependencies to produce a range of information flow pathways from sources to a target. We apply information decomposition, which partitions reductions in uncertainty into synergistic, redundant, and unique information types, to a range of model cases. Toy models show that model structure and source dependencies both restrict the types of interactions that can arise between sources and targets. Regressions based on weather data illustrate how different model structures vary in their sensitivity to source dependencies, thus affecting predictive and functional performance. Finally, we compare the Surface Flux Equilibrium theory, a land‐surface model, and neural networks in estimating the Bowen ratio and find that models trade off information types particularly when sources have the highest and lowest dependencies. Overall, this study extends an information theory‐based model evaluation framework to incorporate the influence of source dependency on information pathways. This could be applied to explore behavioral ranges for both machine learning and process‐based models, and guide model development by highlighting model deficiencies based on information flow pathways that would not be apparent based on existing measures.
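As a minimal sketch of this kind of source-target analysis, the snippet below discretizes two correlated synthetic "weather" sources and a derived target, then computes the interaction information, whose sign gives a coarse summary of whether synergy or redundancy dominates; a full PID, as used in the study, would separate all four terms. The variable names and the linear toy model are assumptions.

```python
# Minimal sketch of a preprocessing and summary step implied by this kind of
# analysis: discretize continuous sources and target, then compute the
# interaction information I(S1;S2;T) = I(S1,S2;T) - I(S1;T) - I(S2;T).
import numpy as np

def discretize(x, bins=8):
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(x, edges)

def mutual_info(x, y, base=2):
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1)
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    m = joint > 0
    return float((joint[m] * np.log(joint[m] / (px @ py)[m])).sum() / np.log(base))

rng = np.random.default_rng(0)
s1 = rng.standard_normal(5000)                       # e.g. net radiation (toy)
s2 = 0.7 * s1 + 0.3 * rng.standard_normal(5000)      # correlated second source
target = s1 + s2 + 0.5 * rng.standard_normal(5000)   # e.g. a heat flux (toy)

d1, d2, dt = discretize(s1), discretize(s2), discretize(target)
d12 = d1 * (d2.max() + 1) + d2                       # joint source state
ii = mutual_info(d12, dt) - mutual_info(d1, dt) - mutual_info(d2, dt)
print("interaction information (bits):", round(ii, 3))  # negative => redundancy dominates
```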
  5. Recent literature focuses on utilizing entity information in sentence-level relation extraction (RE), but this risks leaking superficial and spurious clues about relations. As a result, RE still suffers from unintended entity bias, i.e., the spurious correlation between entity mentions (names) and relations. Entity bias can mislead RE models into extracting relations that do not exist in the text. To combat this issue, some previous work masks the entity mentions to prevent RE models from over-fitting to them. However, this strategy degrades RE performance because it loses the semantic information of entities. In this paper, we propose CoRE (Counterfactual Analysis based Relation Extraction), a debiasing method that guides RE models to focus on the main effects of the textual context without losing the entity information. We first construct a causal graph for RE, which models the dependencies between variables in RE models. Then, we conduct counterfactual analysis on this causal graph to distill and mitigate the entity bias, which captures the causal effects of specific entity mentions in each instance. Note that CoRE is model-agnostic and debiases existing RE systems during inference without changing their training processes. Extensive experimental results demonstrate that CoRE yields significant gains in both effectiveness and generalization for RE. The source code is provided at: https://github.com/vanoracai/CoRE.
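A rough sketch of the inference-time idea as described above (not the released implementation at the linked repository): contrast the model's prediction on the full sentence with a counterfactual prediction in which the textual context is masked and only the entity mentions remain, then subtract a scaled copy of the entity-only logits. The stand-in model, `mask_context`, and the scaling factor `lam` are hypothetical.

```python
# Rough sketch of the counterfactual debiasing idea, not the CoRE code.
import torch

def mask_context(token_ids, keep_spans, mask_id=0):
    """Replace every token outside the given (start, end) spans with mask_id."""
    masked = torch.full_like(token_ids, mask_id)
    for start, end in keep_spans:
        masked[start:end] = token_ids[start:end]
    return masked

@torch.no_grad()
def debiased_relation_logits(re_model, token_ids, entity_spans, lam=0.5):
    logits_full = re_model(token_ids)                      # factual prediction
    entity_only = mask_context(token_ids, entity_spans)    # counterfactual input
    logits_entity_only = re_model(entity_only)             # entity-bias estimate
    return logits_full - lam * logits_entity_only          # debiased logits

# Tiny stand-in "model" (bag of embeddings) so the sketch runs end to end.
emb = torch.nn.Embedding(100, 16)
head = torch.nn.Linear(16, 5)
dummy_model = lambda ids: head(emb(ids).mean(dim=0))

token_ids = torch.randint(1, 100, (12,))
print(debiased_relation_logits(dummy_model, token_ids, entity_spans=[(2, 4), (8, 10)]))
```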