Title: Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains. However, the enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. While past approaches, such as attention visualization, pivotal subnetwork extraction, and concept-based analyses, offer some insight, they often focus on either local or global explanations within a single dimension and can fall short of providing comprehensive clarity. In response, we propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs. Our framework, termed SparseCBM, innovatively integrates sparsity to elucidate three intertwined layers of interpretation: the input, subnetwork, and concept levels. In addition, the newly introduced dimension of interpretable inference-time intervention facilitates dynamic adjustments to the model during deployment. Through rigorous empirical evaluations on real-world datasets, we demonstrate that SparseCBM delivers a profound understanding of LLM behaviors, setting it apart in both interpreting and ameliorating model inaccuracies. Code is provided in the supplementary material.
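The abstract describes a concept-bottleneck design in which sparsity operates at the input, subnetwork, and concept levels, plus an interpretable inference-time intervention. The PyTorch sketch below illustrates only that general pattern under stated assumptions: a pooled LLM embedding feeds a concept layer, a magnitude-pruned linear head keeps the concept-to-label map sparse, and an intervention overwrites selected concept activations at inference time. All class, function, and argument names here are illustrative; this is not the authors' released SparseCBM code.

```python
# Minimal sketch of a concept-bottleneck classifier with a sparse head and a
# simple inference-time intervention. Illustrative only: names and shapes are
# assumptions, not the released SparseCBM implementation.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, enc_dim, n_concepts, n_classes):
        super().__init__()
        self.to_concepts = nn.Linear(enc_dim, n_concepts)  # concept predictor
        self.head = nn.Linear(n_concepts, n_classes)       # interpretable concept->label read-out

    def forward(self, enc, intervene=None):
        concepts = torch.sigmoid(self.to_concepts(enc))    # concept activations in [0, 1]
        if intervene:                                       # inference-time intervention:
            concepts = concepts.clone()                     # overwrite selected concept scores
            for idx, value in intervene.items():
                concepts[:, idx] = value
        return self.head(concepts), concepts

def prune_head(model, keep_ratio=0.2):
    """Magnitude-prune the concept->label weights so each prediction depends on a
    small, inspectable set of concepts (a stand-in for the paper's sparsity guidance)."""
    with torch.no_grad():
        w = model.head.weight
        k = max(1, int(keep_ratio * w.numel()))
        threshold = w.abs().flatten().kthvalue(w.numel() - k + 1).values
        w.mul_((w.abs() >= threshold).float())

# Usage: pretend the LLM backbone produced a batch of pooled embeddings.
model = ConceptBottleneck(enc_dim=768, n_concepts=32, n_classes=3)
prune_head(model, keep_ratio=0.2)
enc = torch.randn(4, 768)
logits, concepts = model(enc, intervene={5: 1.0})  # force concept 5 "on" at inference time
```

The point of keeping the read-out linear and sparse is that the concept-to-label map can be read off directly and edited at deployment, which is the role the abstract assigns to inference-time intervention.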
Award ID(s): 2229461
PAR ID: 10555572
Author(s) / Creator(s): ; ; ;
Publisher / Repository: AAAI
Date Published:
Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 38
Issue: 19
ISSN: 2159-5399
Page Range / eLocation ID: 21619 to 21627
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Recently, remarkable progress has been made with large language models (LLMs), demonstrating their unprecedented capability across a variety of natural language tasks. However, training a large general-purpose model from scratch is challenging for time series analysis, due to the large volumes and varieties of time series data, as well as the non-stationarity that leads to concept drift and impedes continuous model adaptation and re-training. Recent advances have shown that pre-trained LLMs can be exploited to capture complex dependencies in time series data and facilitate various applications. In this survey, we provide a systematic overview of existing methods that leverage LLMs for time series analysis. Specifically, we first state the challenges and motivations of applying language models in the context of time series, along with brief preliminaries of LLMs. Next, we summarize the general pipeline for LLM-based time series analysis, categorize existing methods into different groups (i.e., direct query, tokenization, prompt design, fine-tuning, and model integration), and highlight the key ideas within each group. We also discuss the applications of LLMs for both general and spatial-temporal time series data, tailored to specific domains. Finally, we thoroughly discuss future research opportunities to empower time series analysis with LLMs.
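Of the method groups listed in the survey abstract above, "direct query" is the simplest to illustrate: serialize the numeric series as text and prompt the LLM to continue it. The Python sketch below shows only that pattern; the prompt wording, digit precision, and the commented-out `query_llm` client call are assumptions, not anything prescribed by the survey.

```python
# Sketch of the "direct query" pattern: serialize a time series as text and ask an
# LLM for the next values. The LLM client call is left abstract (commented out);
# the serialization and prompt format are illustrative assumptions.
def serialize_series(values, digits=2):
    return ", ".join(f"{v:.{digits}f}" for v in values)

def build_forecast_prompt(values, horizon):
    return (
        "Here is a univariate time series sampled at equal intervals:\n"
        f"{serialize_series(values)}\n"
        f"Continue the series with the next {horizon} values, "
        "comma-separated, with no other text."
    )

def parse_forecast(text):
    return [float(tok) for tok in text.replace("\n", " ").split(",") if tok.strip()]

prompt = build_forecast_prompt([21.3, 21.8, 22.4, 23.1, 23.9], horizon=3)
# response_text = query_llm(prompt)   # hypothetical client call to any chat LLM
# forecast = parse_forecast(response_text)
```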
  2. Large language models (LLMs) offer new ways of empowering people to program robot applications, namely code generation via prompting. However, the code generated by LLMs is susceptible to errors. This work reports a preliminary exploration that empirically characterizes common errors produced by LLMs in robot programming. We categorize these errors into two phases: interpretation and execution. In this work, we focus on errors in execution and observe that they are caused by LLMs being “forgetful” of key information provided in user prompts. Based on this observation, we propose prompt engineering tactics designed to reduce errors in execution. We then demonstrate the effectiveness of these tactics with three language models: ChatGPT, Bard, and LLaMA-2. Finally, we discuss lessons learned from using LLMs in robot programming and call for the benchmarking of LLM-powered end-user development of robot applications.
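Abstract 2 attributes execution errors to the model being "forgetful" of information in the user prompt. One tactic in that spirit, sketched below, is to restate the key constraints immediately before the code request and ask the model to verify each one; the template and names are illustrative and do not reproduce the paper's exact tactics.

```python
# Illustrative prompt template that restates user constraints right before the code
# request, a plausible counter to "forgetful" execution errors. Not the paper's
# exact tactics; the API summary and task are made-up examples.
def build_robot_prompt(task, constraints, api_summary):
    restated = "\n".join(f"- {c}" for c in constraints)
    return (
        f"You are writing a robot program using this API:\n{api_summary}\n\n"
        f"Task: {task}\n\n"
        "Before writing code, note that ALL of these constraints must be satisfied:\n"
        f"{restated}\n\n"
        "Write the program, then list each constraint and state how the code meets it."
    )

prompt = build_robot_prompt(
    task="Deliver a cup from the kitchen to the living room table.",
    constraints=["Keep speed under 0.5 m/s near people",
                 "Announce arrival before releasing the cup"],
    api_summary="go_to(location), pick(object), place(surface), say(text), set_speed(mps)",
)
```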
  3. An interesting behavior in large language models (LLMs) is prompt sensitivity. When provided with different but semantically equivalent versions of the same prompt, models may produce very different distributions of answers. This suggests that the uncertainty reflected in a model's output distribution for one prompt may not reflect the model's uncertainty about the meaning of the prompt. We model prompt sensitivity as a type of generalization error, and show that sampling across the semantic concept space with paraphrasing perturbations improves uncertainty calibration without compromising accuracy. Additionally, we introduce a new metric for uncertainty decomposition in black-box LLMs that improves upon entropy-based decomposition by modeling semantic continuities in natural language generation. We show that this decomposition metric can be used to quantify how much LLM uncertainty is attributed to prompt sensitivity. Our work introduces a new way to improve uncertainty calibration in prompt-sensitive language models, and provides evidence that some LLMs fail to exhibit consistent general reasoning about the meanings of their inputs. 
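A rough way to see the quantity abstract 3 is after: sample answers under several paraphrases of the same prompt, then compare the entropy of the pooled answer distribution with the average within-paraphrase entropy; the gap is the share of uncertainty attributable to prompt sensitivity. The sketch below uses this plain entropy decomposition for illustration only; the paper's metric refines it by modeling semantic similarity between generations.

```python
# Toy decomposition of answer uncertainty into a prompt-sensitivity component:
# pooled entropy minus mean within-paraphrase entropy. Plain entropy version for
# illustration; the paper's metric additionally models semantic continuities.
from collections import Counter
import math

def entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c)

def prompt_sensitivity_share(answers_per_paraphrase):
    """Entropy of the pooled answers minus the mean within-paraphrase entropy;
    the difference is the portion of uncertainty tied to switching paraphrases."""
    pooled = Counter(ans for answers in answers_per_paraphrase for ans in answers)
    within = sum(entropy(Counter(answers)) for answers in answers_per_paraphrase)
    within /= len(answers_per_paraphrase)
    return entropy(pooled) - within

# Three paraphrases of one question, five sampled answers each.
samples = [["Paris"] * 5,
           ["Paris", "Paris", "Lyon", "Paris", "Paris"],
           ["Lyon"] * 5]
print(prompt_sensitivity_share(samples))  # positive: much of the spread is prompt-driven
```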
  4. Despite their extensive application in language understanding tasks, large language models (LLMs) still encounter challenges, including hallucinations (occasional fabrication of information) and alignment issues (a lack of association with human-curated world models, e.g., intuitive physics or common-sense knowledge). Moreover, the black-box nature of LLMs presents significant obstacles in training them effectively to achieve desired behaviors. In particular, modifying the concept embedding spaces of LLMs can be highly intractable. This process involves analyzing the implicit impact of such adjustments on the myriad parameters within LLMs and the resulting inductive biases. We propose a novel architecture that wraps powerful function approximation architectures within an outer, interpretable read-out layer. This read-out layer can be scrutinized to explicitly observe the effects of concept modeling during the training of the LLM. Our method stands in contrast with gradient-based implicit mechanisms, which depend solely on adjustments to the LLM parameters and thus evade scrutiny. By conducting extensive experiments across both generative and discriminative language modeling tasks, we evaluate the capabilities of our proposed architecture relative to state-of-the-art LLMs of similar sizes. Additionally, we offer a qualitative examination of the interpretable read-out layer and visualize the concepts it captures. The results demonstrate the potential of our approach for effectively controlling LLM hallucinations and enhancing the alignment with human expectations.
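The claim in abstract 4 is that an outer, interpretable read-out layer can be scrutinized directly, unlike gradient-based implicit mechanisms. The sketch below shows one simple form such scrutiny can take for a linear read-out over named concepts: report the highest-weight concepts for each output class. Shapes, names, and the inspection routine are assumptions for illustration, not the paper's architecture.

```python
# Inspecting a linear read-out layer: which named concepts carry the most weight
# for each output class? Toy shapes and concept names; illustrative only.
import torch
import torch.nn as nn

def top_concepts_per_class(readout, concept_names, k=3):
    """For each output class, return the k concepts with the largest |weight|."""
    report = {}
    for cls, row in enumerate(readout.weight.detach()):
        top = torch.topk(row.abs(), k).indices.tolist()
        report[cls] = [(concept_names[i], round(row[i].item(), 3)) for i in top]
    return report

# Toy read-out over five named concepts and two output classes.
readout = nn.Linear(5, 2)
names = ["negation", "sentiment", "numbers", "physics", "commonsense"]
print(top_concepts_per_class(readout, names))
```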
  5. Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths (≈20). We propose Spectral Explainer (SPEX), a model-agnostic interaction attribution algorithm that efficiently scales to large input lengths (≈1000). SPEX exploits underlying natural sparsity among interactions, common in real-world data, and applies a sparse Fourier transform using a channel decoding algorithm to efficiently identify important interactions. We perform experiments across three difficult long-context datasets that require LLMs to utilize interactions between inputs to complete the task. For large inputs, SPEX outperforms marginal attribution methods by up to 20% in terms of faithfully reconstructing LLM outputs. Further, SPEX successfully identifies key features and interactions that strongly influence model output. For one of our datasets, HotpotQA, SPEX provides interactions that align with human annotations. Finally, we use our model-agnostic approach to generate explanations to demonstrate abstract reasoning in closed-source LLMs (GPT-4o mini) and compositional reasoning in vision-language models.
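For context on what SPEX accelerates, the sketch below implements the naive baseline it is designed to replace: evaluate the model on random input masks and fit a sparse (lasso) regression over main effects and pairwise interaction terms. This brute-force approach only scales to small inputs, which is exactly the limitation abstract 5 describes; SPEX instead exploits interaction sparsity via a sparse Fourier transform with channel decoding, which is not reproduced here.

```python
# Naive interaction-attribution baseline: score the model on random input masks
# and fit a lasso over main effects plus pairwise interaction terms. Only feasible
# for small n_features; SPEX avoids this dense enumeration.
import itertools
import numpy as np
from sklearn.linear_model import Lasso

def pairwise_features(masks):
    """Append products of mask pairs so a linear model can express pairwise interactions."""
    pairs = list(itertools.combinations(range(masks.shape[1]), 2))
    inter = np.stack([masks[:, i] * masks[:, j] for i, j in pairs], axis=1)
    return np.hstack([masks, inter]), pairs

def fit_interactions(model_fn, n_features, n_samples=512, alpha=0.01):
    masks = (np.random.rand(n_samples, n_features) > 0.5).astype(float)
    outputs = np.array([model_fn(m) for m in masks])  # model score with features kept/dropped
    X, pairs = pairwise_features(masks)
    reg = Lasso(alpha=alpha).fit(X, outputs)
    main_effects = reg.coef_[:n_features]
    interactions = dict(zip(pairs, reg.coef_[n_features:]))
    return main_effects, interactions

# Toy scoring function in which features 0 and 2 interact strongly.
toy_model = lambda m: 2.0 * m[0] * m[2] + 0.5 * m[1]
main_effects, interactions = fit_interactions(toy_model, n_features=4)
```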