This content will become publicly available on November 12, 2026

Title: Towards Formalizing Spuriousness of Biased Datasets Using Partial Information Decomposition
Spuriousness arises when there is an association between two or more variables in a dataset that are not causally related. In this work, we propose an explainability framework to preemptively disentangle the nature of such spurious associations in a dataset before model training. We leverage a body of work in information theory called Partial Information Decomposition (PID) to decompose the total information about the target into four nonnegative quantities, namely unique information (in the core and spurious features, respectively), redundant information, and synergistic information. Our framework helps anticipate when the core or spurious feature is indispensable, when either suffices, and when both are jointly needed for an optimal classifier trained on the dataset. Next, we leverage this decomposition to propose a novel measure of the spuriousness of a dataset. We arrive at this measure systematically by examining several candidate measures and demonstrating what they capture and miss through intuitive canonical examples and counterexamples. Our framework, Spurious Disentangler, consists of segmentation, dimensionality reduction, and estimation modules, with capabilities to handle high-dimensional image data efficiently. Finally, we perform an empirical evaluation to demonstrate the trends of unique, redundant, and synergistic information, as well as our proposed spuriousness measure, across six benchmark datasets under various experimental settings. We observe an agreement between our preemptive measure of dataset spuriousness and post-training model generalization metrics such as worst-group accuracy, further supporting our proposition. The code is available at https://github.com/Barproda/spuriousness-disentangler.
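For intuition, the following is a minimal, self-contained sketch of the kind of decomposition the abstract describes, applied to a toy discrete joint distribution over a core feature X1, a spurious feature X2, and a target Y. It uses the classical Williams-Beer redundancy for concreteness; the paper's Spurious Disentangler instead estimates these quantities for high-dimensional image data, so the function names, the choice of redundancy measure, and the toy example below are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch (not the paper's Spurious Disentangler):
# Partial Information Decomposition of I(Y; X1, X2) for a small discrete
# joint distribution, using the Williams-Beer I_min redundancy.
import numpy as np

def mutual_info(pxy):
    """I(X; Y) in bits for a joint distribution pxy[x, y]."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

def specific_info(pxy, y):
    """Specific information I(Y=y; X) = sum_x p(x|y) log p(y|x)/p(y)."""
    py = pxy.sum(axis=0)[y]
    px = pxy.sum(axis=1)
    total = 0.0
    for x in range(pxy.shape[0]):
        if pxy[x, y] > 0:
            p_x_given_y = pxy[x, y] / py
            p_y_given_x = pxy[x, y] / px[x]
            total += p_x_given_y * np.log2(p_y_given_x / py)
    return total

def pid(p):
    """p[x1, x2, y] -> (unique1, unique2, redundancy, synergy) in bits."""
    p1y = p.sum(axis=1)                      # joint of (X1, Y)
    p2y = p.sum(axis=0)                      # joint of (X2, Y)
    p12_y = p.reshape(-1, p.shape[2])        # joint of ((X1, X2), Y)
    i1, i2, i12 = mutual_info(p1y), mutual_info(p2y), mutual_info(p12_y)
    py = p.sum(axis=(0, 1))
    # Williams-Beer redundancy: expected minimum specific information.
    red = sum(py[y] * min(specific_info(p1y, y), specific_info(p2y, y))
              for y in range(len(py)) if py[y] > 0)
    uni1, uni2 = i1 - red, i2 - red
    syn = i12 - uni1 - uni2 - red
    return uni1, uni2, red, syn

# Canonical example: Y = X1 (the core feature fully determines the target)
# and the spurious feature X2 happens to copy X1, so the one bit of target
# information is entirely redundant between core and spurious features.
p = np.zeros((2, 2, 2))
p[0, 0, 0] = p[1, 1, 1] = 0.5
print(pid(p))   # -> approximately (0.0, 0.0, 1.0, 0.0)
```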
Award ID(s):
2340006
PAR ID:
10655319
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Sharma, Amit
Publisher / Repository:
Transactions on Machine Learning Research (TMLR)
Date Published:
Journal Name:
Transactions on Machine Learning Research
ISSN:
2835-8856
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Knowledge distillation enables deploying complex machine learning models in resource-constrained environments by training a smaller student model to emulate the internal representations of a complex teacher model. However, the teacher's representations can also encode nuisance or additional information that is not relevant to the downstream task. Distilling such irrelevant information can actually impede the performance of a capacity-limited student model. This observation motivates our primary question: what are the information-theoretic limits of knowledge distillation? To this end, we leverage Partial Information Decomposition to quantify and explain the knowledge already transferred and the knowledge left to distill for a downstream task. We theoretically demonstrate that the task-relevant transferred knowledge is succinctly captured by the redundant information about the task shared between the teacher and student. We propose a novel multi-level optimization that incorporates this redundant information as a regularizer, leading to our framework of Redundant Information Distillation (RID). RID leads to more resilient and effective distillation under nuisance teachers, as it quantifies task-relevant knowledge rather than simply aligning student and teacher representations.
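As a very rough sketch of the general recipe (and explicitly not the paper's multi-level RID optimization), one can think of the student objective as a standard distillation loss plus a regularizer standing in for a task-relevant redundancy term; the `redundancy_proxy` cosine-similarity surrogate and the weights below are assumptions for illustration only.

```python
# Hedged sketch of the general idea only, not the RID algorithm from the
# abstract above: a standard distillation loss plus a placeholder regularizer
# in place of the redundant-information term that RID estimates via PID.
import torch
import torch.nn.functional as F

def distillation_step(student_logits, teacher_logits, labels,
                      student_feats, teacher_feats, T=4.0, alpha=0.5, beta=0.1):
    # Supervised task loss on the student.
    task_loss = F.cross_entropy(student_logits, labels)
    # Classic soft-label distillation term.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Placeholder "redundancy" proxy: encourages agreement between student and
    # teacher representations. This crude surrogate does NOT isolate the
    # task-relevant redundant information that the paper's regularizer targets.
    redundancy_proxy = -F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()
    return task_loss + alpha * kd_loss + beta * redundancy_proxy

# Toy usage with random tensors just to exercise the function.
B, C, D = 8, 10, 32
loss = distillation_step(torch.randn(B, C), torch.randn(B, C),
                         torch.randint(0, C, (B,)),
                         torch.randn(B, D), torch.randn(B, D))
print(float(loss))
```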
  2. This work presents an information-theoretic perspective on group fairness trade-offs in federated learning (FL) with respect to sensitive attributes such as gender and race. Existing works often focus on either global fairness (the overall disparity of the model across all clients) or local fairness (the disparity of the model at each client), without always considering their trade-offs. There is a lack of understanding of the interplay between global and local fairness in FL, particularly under data heterogeneity, and of if and when one implies the other. To address this gap, we leverage a body of work in information theory called partial information decomposition (PID), which identifies three sources of unfairness in FL, namely Unique Disparity, Redundant Disparity, and Masked Disparity. We demonstrate how these three disparities contribute to global and local fairness using canonical examples. This decomposition helps us derive fundamental limits on the trade-off between global and local fairness, highlighting where they agree or disagree. We introduce the Accuracy and Global-Local Fairness Optimality Problem (AGLFOP), a convex optimization problem that defines the theoretical limits of accuracy and fairness trade-offs, identifying the best possible performance any FL strategy can attain given a dataset and client distribution. We also present experimental results on synthetic datasets and the ADULT dataset to support our theoretical findings.
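To make the global-versus-local distinction concrete, here is a minimal synthetic sketch (not the paper's PID analysis or the AGLFOP program) that measures a statistical-parity gap both per client and over the pooled clients; the heterogeneity pattern and all numbers are invented for illustration.

```python
# Minimal sketch: statistical-parity gaps measured globally (pooled over
# clients) and locally (per client), illustrating that the two notions can
# diverge under heterogeneous client data. All data below is synthetic.
import numpy as np

def parity_gap(y_pred, sensitive):
    """|P(Yhat = 1 | A = 0) - P(Yhat = 1 | A = 1)|."""
    return abs(y_pred[sensitive == 0].mean() - y_pred[sensitive == 1].mean())

rng = np.random.default_rng(0)
clients = []
for k in range(5):
    a = rng.integers(0, 2, size=200)              # sensitive attribute per sample
    # Heterogeneous clients: the positive-prediction rate per group varies by client.
    y_hat = rng.binomial(1, 0.3 + 0.1 * k * a)    # binary model predictions
    clients.append((y_hat, a))

local_gaps = [parity_gap(y, a) for y, a in clients]
y_all = np.concatenate([y for y, _ in clients])
a_all = np.concatenate([a for _, a in clients])
print("local gaps :", np.round(local_gaps, 3))
print("global gap :", round(parity_gap(y_all, a_all), 3))
```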
  3. Measures of functional connectivity have played a central role in advancing our understanding of how information is transmitted and processed within the brain. Traditionally, these studies have focused on identifying redundant functional connectivity, which involves determining when activity is similar across different sites or neurons. However, recent research has highlighted the importance of also identifying synergistic connectivity, that is, connectivity that gives rise to information not contained in either site or neuron alone. Here, we measured redundant and synergistic functional connectivity between neurons in the mouse primary auditory cortex during a sound discrimination task. Specifically, we measured directed functional connectivity between neurons simultaneously recorded with calcium imaging, using Granger causality as the functional connectivity measure. We then used Partial Information Decomposition to quantify the amount of redundant and synergistic information about the presented sound carried by functionally connected or functionally unconnected pairs of neurons. We found that functionally connected pairs carry proportionally more redundant information and proportionally less synergistic information about sound than unconnected pairs, suggesting that their functional connectivity is primarily redundant. Further, synergy and redundancy coexisted whether mice made correct or incorrect perceptual discriminations. However, redundancy was much higher (both in absolute terms and in proportion to the total information available in neuron pairs) in correct behavioural choices compared to incorrect ones, whereas synergy was higher in absolute terms but lower in relative terms in correct than in incorrect behavioural choices. Moreover, the proportion of redundancy reliably predicted perceptual discriminations, with the proportion of synergy adding no extra predictive power. These results suggest a crucial contribution of redundancy to correct perceptual discriminations, possibly due to the advantage it offers for information propagation, and also suggest a role of synergy in enhancing information levels during correct discriminations.
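As a small illustration of one ingredient of this pipeline, the sketch below runs a pairwise Granger-causality test between two synthetic "calcium traces" using statsmodels; the PID step over decoded sound information is not reproduced here, and the data, coupling, and lag choices are assumptions.

```python
# Hedged sketch of one ingredient only: pairwise Granger causality between
# two neurons' activity traces. The traces are synthetic; the study pairs
# this with PID over sound information, which is not reproduced here.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)
T = 500
neuron_a = rng.standard_normal(T)
# Make neuron B partly driven by neuron A at lag 1, plus noise.
neuron_b = 0.6 * np.roll(neuron_a, 1) + 0.4 * rng.standard_normal(T)

# statsmodels tests whether the SECOND column Granger-causes the FIRST.
data = np.column_stack([neuron_b, neuron_a])     # test: does A -> B?
results = grangercausalitytests(data, maxlag=3)
p_value = results[1][0]["ssr_ftest"][1]          # p-value of the F-test at lag 1
print(f"A Granger-causes B (lag 1): p = {p_value:.3g}")
```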
  4. In a complex ecohydrologic system, vegetation and soil variables combine to dictate heat fluxes, and these fluxes may vary depending on the extent to which drivers are linearly or nonlinearly interrelated. From a modeling and causality perspective, uncertainty, sensitivity, and performance measures all relate to how information from different sources "flows" through a model to produce a target, or output. We address how model structure, broadly defined as a mapping from inputs to an output, combines with source dependencies to produce a range of information flow pathways from sources to a target. We apply information decomposition, which partitions reductions in uncertainty into synergistic, redundant, and unique information types, to a range of model cases. Toy models show that model structure and source dependencies both restrict the types of interactions that can arise between sources and targets. Regressions based on weather data illustrate how different model structures vary in their sensitivity to source dependencies, thus affecting predictive and functional performance. Finally, we compare the Surface Flux Equilibrium theory, a land‐surface model, and neural networks in estimating the Bowen ratio and find that models trade off information types particularly when sources have the highest and lowest dependencies. Overall, this study extends an information theory‐based model evaluation framework to incorporate the influence of source dependency on information pathways. This could be applied to explore behavioral ranges for both machine learning and process‐based models, and guide model development by highlighting model deficiencies based on information flow pathways that would not be apparent based on existing measures.
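As a minimal sketch of this kind of source-target analysis, the snippet below discretizes two correlated synthetic "weather" sources and a derived target, then computes the interaction information, whose sign gives a coarse summary of whether synergy or redundancy dominates; a full PID, as used in the study, would separate all four terms. The variable names and the linear toy model are assumptions.

```python
# Minimal sketch of a preprocessing and summary step implied by this kind of
# analysis: discretize continuous sources and target, then compute the
# interaction information I(S1;S2;T) = I(S1,S2;T) - I(S1;T) - I(S2;T).
import numpy as np

def discretize(x, bins=8):
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(x, edges)

def mutual_info(x, y, base=2):
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1)
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    m = joint > 0
    return float((joint[m] * np.log(joint[m] / (px @ py)[m])).sum() / np.log(base))

rng = np.random.default_rng(0)
s1 = rng.standard_normal(5000)                       # e.g. net radiation (toy)
s2 = 0.7 * s1 + 0.3 * rng.standard_normal(5000)      # correlated second source
target = s1 + s2 + 0.5 * rng.standard_normal(5000)   # e.g. a heat flux (toy)

d1, d2, dt = discretize(s1), discretize(s2), discretize(target)
d12 = d1 * (d2.max() + 1) + d2                       # joint source state
ii = mutual_info(d12, dt) - mutual_info(d1, dt) - mutual_info(d2, dt)
print("interaction information (bits):", round(ii, 3))  # negative => redundancy dominates
```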
  5. Recent literature focuses on utilizing entity information in sentence-level relation extraction (RE), but this risks leaking superficial and spurious clues about relations. As a result, RE still suffers from unintended entity bias, i.e., the spurious correlation between entity mentions (names) and relations. Entity bias can mislead RE models into extracting relations that do not exist in the text. To combat this issue, some previous work masks the entity mentions to prevent RE models from over-fitting to them. However, this strategy degrades RE performance because it loses the semantic information of entities. In this paper, we propose CoRE (Counterfactual Analysis based Relation Extraction), a debiasing method that guides RE models to focus on the main effects of the textual context without losing the entity information. We first construct a causal graph for RE, which models the dependencies between variables in RE models. Then, we conduct counterfactual analysis on this causal graph to distill and mitigate the entity bias, which captures the causal effects of specific entity mentions in each instance. Note that CoRE is model-agnostic and debiases existing RE systems during inference without changing their training processes. Extensive experimental results demonstrate that CoRE yields significant gains in both effectiveness and generalization for RE. The source code is provided at: https://github.com/vanoracai/CoRE.
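A rough sketch of the inference-time idea as described above (not the released implementation at the linked repository): contrast the model's prediction on the full sentence with a counterfactual prediction in which the textual context is masked and only the entity mentions remain, then subtract a scaled copy of the entity-only logits. The stand-in model, `mask_context`, and the scaling factor `lam` are hypothetical.

```python
# Rough sketch of the counterfactual debiasing idea, not the CoRE code.
import torch

def mask_context(token_ids, keep_spans, mask_id=0):
    """Replace every token outside the given (start, end) spans with mask_id."""
    masked = torch.full_like(token_ids, mask_id)
    for start, end in keep_spans:
        masked[start:end] = token_ids[start:end]
    return masked

@torch.no_grad()
def debiased_relation_logits(re_model, token_ids, entity_spans, lam=0.5):
    logits_full = re_model(token_ids)                      # factual prediction
    entity_only = mask_context(token_ids, entity_spans)    # counterfactual input
    logits_entity_only = re_model(entity_only)             # entity-bias estimate
    return logits_full - lam * logits_entity_only          # debiased logits

# Tiny stand-in "model" (bag of embeddings) so the sketch runs end to end.
emb = torch.nn.Embedding(100, 16)
head = torch.nn.Linear(16, 5)
dummy_model = lambda ids: head(emb(ids).mean(dim=0))

token_ids = torch.randint(1, 100, (12,))
print(debiased_relation_logits(dummy_model, token_ids, entity_spans=[(2, 4), (8, 10)]))
```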