In recent years, predictive machine learning models have gained prominence across various scientific domains. However, their black-box nature necessitates establishing trust in them before accepting their predictions as accurate. One promising strategy involves employing explanation techniques that elucidate the rationale behind a model's predictions in a way that humans can understand. Yet assessing the degree of human interpretability of these explanations is a nontrivial challenge. In this work, we introduce interpretation entropy as a universal solution for evaluating the human interpretability of any linear model. Using this concept and drawing inspiration from classical thermodynamics, we present Thermodynamics-inspired Explainable Representations of AI and other black-box Paradigms, a method for generating optimally human-interpretable explanations in a model-agnostic manner. We demonstrate the wide-ranging applicability of this method by explaining predictions from various black-box model architectures across diverse domains, including molecular simulations, text, and image classification.
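As a rough, assumption-based illustration of the interpretation-entropy idea named in the abstract (the paper gives the formal definition), one natural formulation scores a linear explanation by the Shannon entropy of its normalized absolute coefficients: the more the weight concentrates on a few features, the lower the entropy and the easier the explanation is to read. The sketch below follows that formulation and is not the authors' implementation.

```python
import numpy as np

def interpretation_entropy(coefficients):
    """Shannon entropy of the normalized absolute weights of a linear model.

    Lower values indicate that the explanation is dominated by a few
    features, which is typically easier for a human to interpret.
    (Illustrative formulation inferred from the abstract, not the
    paper's reference implementation.)
    """
    w = np.abs(np.asarray(coefficients, dtype=float))
    if w.sum() == 0:
        return 0.0
    p = w / w.sum()              # fraction of total weight carried by each feature
    p = p[p > 0]                 # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# Two candidate linear explanations of the same prediction:
sparse_explanation = [0.9, 0.05, 0.05, 0.0]
diffuse_explanation = [0.25, 0.25, 0.25, 0.25]
print(interpretation_entropy(sparse_explanation))   # ~0.39 (more interpretable)
print(interpretation_entropy(diffuse_explanation))  # ~1.39 (less interpretable)
```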
- PAR ID: 10540887
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: Nature Communications
- Volume: 15
- Issue: 1
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The ability to determine whether a robot's grasp has a high chance of failing, before it actually does, can save significant time and avoid failures by planning for re-grasping or changing the strategy for that special case. Machine Learning (ML) offers one way to learn to predict grasp failure from historical data consisting of a robot's attempted grasps alongside labels of success or failure. Unfortunately, most powerful ML models are black-box models that do not explain the reasons behind their predictions. In this paper, we investigate how ML can be used to predict robot grasp failure and study the tradeoff between accuracy and interpretability by comparing interpretable (white-box) ML models that are inherently explainable with more accurate black-box ML models that are inherently opaque. Our results show that one does not necessarily have to compromise accuracy for interpretability if we use an explanation generation method, such as SHapley Additive exPlanations (SHAP), to add explainability to the accurate predictions made by black-box models. An explanation of a predicted fault can lead to an efficient choice of corrective action in the robot's design that can be taken to avoid future failures.
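As a minimal sketch of the post-hoc route described in this abstract, assuming placeholder grasp features and a generic scikit-learn classifier rather than the paper's data and models: SHAP's model-agnostic KernelExplainer can attribute a black-box failure prediction to individual input features.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Placeholder grasp data: rows are attempted grasps, columns are hand-crafted
# features (e.g. contact forces, joint angles); labels are success/failure.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.3, size=200) > 0).astype(int)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Model-agnostic SHAP: explain the predicted failure probability for one grasp.
background = shap.sample(X, 50)                     # background distribution
explainer = shap.KernelExplainer(lambda d: black_box.predict_proba(d)[:, 1],
                                 background)
shap_values = explainer.shap_values(X[:1])          # per-feature attributions
print(shap_values)
```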
- Graph neural networks (GNNs) are popular machine learning models for graphs with many applications across scientific domains. However, GNNs are considered black-box models, and it is challenging to understand how the model makes predictions. Game-theoretic Shapley value approaches are popular explanation methods in other domains but are not well studied for graphs. Some studies have proposed Shapley-value-based GNN explanations, yet they have several limitations: they consider limited samples to approximate Shapley values; some mainly focus on small and large coalition sizes; and they are an order of magnitude slower than other explanation methods, making them inapplicable to even moderate-size graphs. In this work, we propose GNNShap, which provides explanations for edges, since edges provide more natural and more fine-grained explanations for graphs. We overcome the limitations by sampling from all coalition sizes, parallelizing the sampling on GPUs, and speeding up model predictions by batching. GNNShap gives better fidelity scores and faster explanations than baselines on real-world datasets. The code is available at https://github.com/HipGraph/GNNShap.
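The repository linked above holds the authors' GPU implementation; the sketch below is only a framework-free illustration of the underlying idea, using a standard permutation-sampling Shapley estimator rather than GNNShap's stratified coalition sampling and batched GPU evaluation. The `predict_on_subset` callback and the toy objective are placeholders.

```python
import random

def shapley_edge_importance(edges, predict_on_subset, num_samples=200, seed=0):
    """Monte Carlo Shapley estimate of each edge's contribution.

    `predict_on_subset(edge_subset)` should return a scalar score of the
    model's prediction (e.g. the target node's class probability) when only
    the given edges are kept.  Simplified, CPU-only illustration.
    """
    rng = random.Random(seed)
    contributions = {e: 0.0 for e in edges}
    for _ in range(num_samples):
        order = edges[:]
        rng.shuffle(order)
        kept, prev_score = [], predict_on_subset([])
        for e in order:
            kept.append(e)
            score = predict_on_subset(kept)
            contributions[e] += score - prev_score   # marginal contribution
            prev_score = score
    return {e: c / num_samples for e, c in contributions.items()}

# Toy usage: the score is driven by whether two "important" edges are retained.
edges = [(0, 1), (1, 2), (2, 3)]
score = lambda kept: 0.7 * ((0, 1) in kept) + 0.3 * ((2, 3) in kept)
print(shapley_edge_importance(edges, score))
```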
- Machine learning models with explainable predictions are increasingly sought after, especially for real-world, mission-critical applications that require bias detection and risk mitigation. Inherent interpretability, where a model is designed from the ground up for interpretability, provides intuitive insights and transparent explanations of model prediction and performance. In this paper, we present COLABEL, an approach to build interpretable models with explanations rooted in the ground truth. We demonstrate COLABEL in a vehicle feature extraction application in the context of vehicle make-model recognition (VMMR). By construction, COLABEL performs VMMR with a composite of interpretable features such as vehicle color, type, and make, all based on interpretable annotations of the ground truth labels. First, COLABEL performs corroborative integration to join multiple datasets that each have a subset of the desired annotations of color, type, and make. Then, COLABEL uses decomposable branches to extract complementary features corresponding to the desired annotations. Finally, COLABEL fuses them together for the final prediction. During feature fusion, COLABEL harmonizes the complementary branches so that VMMR features are compatible with each other and can be projected to the same semantic space for classification. With inherent interpretability, COLABEL achieves performance superior to state-of-the-art black-box models, with accuracies of 0.98, 0.95, and 0.94 on CompCars, Cars196, and BoxCars116K, respectively. COLABEL provides intuitive explanations due to constructive interpretability, and subsequently achieves high accuracy and usability in mission-critical situations.
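As a hypothetical sketch of the decomposable-branch-plus-fusion idea, with made-up layer sizes and class counts rather than COLABEL's actual architecture: a shared backbone feeds separate color, type, and make heads, whose outputs are concatenated and projected to the final make-model prediction.

```python
import torch
import torch.nn as nn

class BranchedVMMR(nn.Module):
    """Toy multi-branch classifier: interpretable branches plus a fused prediction."""

    def __init__(self, n_colors=12, n_types=8, n_makes=40, n_models=200):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a real CNN backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.color_head = nn.Linear(32, n_colors)
        self.type_head = nn.Linear(32, n_types)
        self.make_head = nn.Linear(32, n_makes)
        # Fusion: combine branch outputs to predict the fine-grained model class.
        self.fusion = nn.Linear(n_colors + n_types + n_makes, n_models)

    def forward(self, x):
        feats = self.backbone(x)
        color = self.color_head(feats)
        vtype = self.type_head(feats)
        make = self.make_head(feats)
        model = self.fusion(torch.cat([color, vtype, make], dim=1))
        return {"color": color, "type": vtype, "make": make, "model": model}

net = BranchedVMMR()
out = net(torch.randn(2, 3, 64, 64))
print({k: v.shape for k, v in out.items()})
```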
- Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains. However, the enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. While past approaches, such as attention visualization, pivotal subnetwork extraction, and concept-based analyses, offer some insight, they often focus on either local or global explanations within a single dimension, occasionally falling short of providing comprehensive clarity. In response, we propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs. Our framework, termed SparseCBM, innovatively integrates sparsity to elucidate three intertwined layers of interpretation: the input, subnetwork, and concept levels. In addition, the newly introduced dimension of interpretable inference-time intervention facilitates dynamic adjustments to the model during deployment. Through rigorous empirical evaluations on real-world datasets, we demonstrate that SparseCBM delivers a profound understanding of LLM behaviors, setting it apart in both interpreting and ameliorating model inaccuracies. Code is provided in the supplementary material.
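As a loose, assumption-heavy illustration of the sparsity-at-the-concept-level idea (not SparseCBM's actual framework): a concept-bottleneck head can map concept activations to labels through a linear layer trained with an L1 penalty, so each prediction is readable from a handful of active concepts.

```python
import torch
import torch.nn as nn

# Toy concept-bottleneck head: concept scores -> labels, with an L1 penalty
# that drives most concept-to-label weights to zero (illustrative only).
n_concepts, n_labels = 20, 4
head = nn.Linear(n_concepts, n_labels)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)

concepts = torch.rand(128, n_concepts)             # placeholder concept scores
labels = (concepts[:, 0] > 0.5).long()             # placeholder labels

for _ in range(200):
    logits = head(concepts)
    task_loss = nn.functional.cross_entropy(logits, labels)
    sparsity = head.weight.abs().sum()             # L1 penalty on the mapping
    loss = task_loss + 1e-3 * sparsity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Each label is now explained by the few concepts with non-negligible weight.
print(head.weight.abs().topk(3, dim=1).indices)
```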
- In many machine learning applications, it is important to explain the predictions of a black-box classifier. For example, why does a deep neural network assign an image to a particular class? We cast interpretability of black-box classifiers as a combinatorial maximization problem and propose an efficient streaming algorithm to solve it subject to cardinality constraints. By extending ideas from Badanidiyuru et al. [2014], we provide a constant-factor approximation guarantee for our algorithm in the case of random stream order and a weakly submodular objective function. This is the first such theoretical guarantee for this general class of functions, and we also show that no such algorithm exists for a worst-case stream order. Our algorithm obtains similar explanations of Inception V3 predictions 10 times faster than the state-of-the-art LIME framework of Ribeiro et al.
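As a rough sketch of the streaming flavor of this approach, under the assumption of a single known threshold (threshold-based streaming algorithms in the style of Badanidiyuru et al. typically run several thresholds in parallel to cover the unknown optimum): candidate elements such as image super-pixels arrive one at a time and are kept only if their marginal gain to the explanation objective clears the threshold, until the cardinality budget is spent. The function names and toy objective are placeholders, not the authors' algorithm.

```python
def threshold_streaming_select(stream, marginal_gain, k, opt_guess):
    """Single-pass selection under a cardinality constraint.

    `marginal_gain(element, selected)` returns the increase in the (weakly
    submodular) explanation objective from adding `element` to `selected`.
    An element is kept if its gain clears opt_guess / (2 * k).
    Illustrative sketch only.
    """
    selected = []
    threshold = opt_guess / (2.0 * k)
    for element in stream:
        if len(selected) >= k:
            break
        if marginal_gain(element, selected) >= threshold:
            selected.append(element)
    return selected

# Toy usage: pick up to 3 "super-pixels" with the largest individual values.
values = {"sp1": 0.9, "sp2": 0.1, "sp3": 0.7, "sp4": 0.05, "sp5": 0.8}
gain = lambda e, s: values[e]                     # modular toy objective
print(threshold_streaming_select(values.keys(), gain, k=3, opt_guess=2.4))
```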