This content will become publicly available on December 14, 2025

Title: Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations
When trained on biased datasets, Deep Neural Networks (DNNs) often make predictions based on attributes derived from features spuriously correlated with the target labels. This is especially problematic if these irrelevant features are easier for the model to learn than the truly relevant ones. Many existing approaches, called debiasing methods, have been proposed to address this issue, but they often require predefined bias labels and entail significantly increased computational complexity by incorporating extra auxiliary models. Instead, we offer a perspective orthogonal to existing approaches, inspired by cognitive science, specifically Global Workspace Theory (GWT). Our method, Debiasing Global Workspace (DGW), is a novel debiasing framework that consists of specialized modules and a shared workspace, allowing for increased modularity and improved debiasing performance. Additionally, DGW enhances the transparency of decision-making processes by using attention masks to visualize which input features the model focuses on during training and inference. We begin by proposing an instantiation of GWT for the debiasing method. We then outline the implementation of each component within DGW. Finally, we validate our method across various biased datasets, demonstrating its effectiveness in mitigating biases and improving model performance.
Award ID(s):
2223839
PAR ID:
10629929
Publisher / Repository:
Proceedings of Machine Learning Research
Date Published:
Volume:
285
Page Range / eLocation ID:
85–99
Format(s):
Medium: X
Location:
Vancouver, Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection. Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English). Our comprehensive experiments establish that existing methods are limited in their ability to prevent biased behavior in current toxicity detectors. We then propose an automatic, dialect-aware data correction method, as a proof-of-concept. Despite the use of synthetic labels, this method reduces dialectal associations with toxicity. Overall, our findings show that debiasing a model trained on biased toxic language data is not as effective as simply relabeling the data to remove existing biases. 
  2. Earlier work has shown that reusing experience from prior motion planning problems can improve the efficiency of similar, future motion planning queries. However, for robots with many degrees of freedom, these methods exhibit poor generalization across different environments and often require large datasets that are impractical to gather. We present SPARK and FLAME, two experience-based frameworks for sampling-based planning applicable to complex manipulators in 3D environments. Both combine samplers associated with features from a workspace decomposition into a global biased sampling distribution. SPARK decomposes the environment based on exact geometry while FLAME is more general, and uses an octree-based decomposition obtained from sensor data. We demonstrate the effectiveness of SPARK and FLAME on a real and simulated Fetch robot tasked with challenging pick-and-place manipulation problems. Our approaches can be trained incrementally and significantly improve performance with only a handful of examples, generalizing better over diverse tasks and environments as compared to prior approaches.
  3. Many interpretable AI approaches have been proposed to provide plausible explanations for a model’s decision-making. However, configuring an explainable model that effectively communicates among computational modules has received less attention. A recently proposed shared global workspace theory showed that networks of distributed modules can benefit from sharing information with a bottle-necked memory because the communication constraints encourage specialization, compositionality, and synchronization among the modules. Inspired by this, we propose Concept-Centric Transformers, a simple yet effective configuration of the shared global workspace for interpretability, consisting of: i) an object-centric-based memory module for extracting semantic concepts from input features, ii) a cross-attention mechanism between the learned concept and input embeddings, and iii) standard classification and explanation losses to allow human analysts to directly assess an explanation for the model’s classification reasoning. We test our approach against other existing concept-based methods on classification tasks for various datasets, including CIFAR100, CUB-200-2011, and ImageNet, and we show that our model not only achieves better classification accuracy than all baselines across all problems but also generates more consistent concept-based explanations of classification output.
  4. Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent from the input features. In practice, however, label noise is often feature or instance-dependent, and therefore biased (i.e., some instances are more likely to be mislabeled than others). For instance, in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence of instance-dependent label noise. Our approach utilizes anchor points, a small subset of data for which we know the observed and ground truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.
  5. Natural language understanding (NLU) models often suffer from unintended dataset biases. Among bias mitigation methods, ensemble-based debiasing methods, especially product-of-experts (PoE), have stood out for their impressive empirical success. However, previous ensemble-based debiasing methods typically apply debiasing on top-level logits without directly addressing biased attention patterns. Attention serves as the main medium of feature interaction and aggregation in PLMs and plays a crucial role in providing robust prediction. In this paper, we propose REsidual Attention Debiasing (READ), an end-to-end debiasing method that mitigates unintended biases from attention. Experiments on three NLU benchmarks show that READ significantly improves the OOD performance of BERT-based models, including +12.9% accuracy on HANS, +11.0% accuracy on FEVER-Symmetric, and +2.7% F1 on PAWS. Detailed analyses demonstrate the crucial role of unbiased attention in robust NLU models and that READ effectively mitigates biases in attention.