NSF PAR Search | NSF Public Access Repository

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Kiciman, Emre; Ness, Robert; Sharma, Amit; Tan, Chenhao (October 2024, Transactions on machine learning research)

The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We conduct a "behavioral" study of LLMs to benchmark their capability in generating causal arguments. Across a wide range of tasks, we find that LLMs can generate text corresponding to correct causal arguments with high probability, surpassing the best-performing existing methods. Algorithms based on GPT-3.5 and GPT-4 outperform existing algorithms on a pairwise causal discovery task (97%, 13 points gain), a counterfactual reasoning task (92%, 20 points gain), and event causality (86% accuracy in determining necessary and sufficient causes in vignettes). We perform robustness checks across tasks and show that the capabilities cannot be explained by dataset memorization alone, especially since LLMs generalize to novel datasets that were created after the training cutoff date. That said, LLMs exhibit unpredictable failure modes, and we discuss the kinds of errors that may be improved and the fundamental limits of LLM-based answers. Overall, by operating on the text metadata, LLMs bring capabilities so far understood to be restricted to humans, such as using collected knowledge to generate causal graphs or identifying background causal context from natural language. As a result, LLMs may be used by human domain experts to save effort in setting up a causal analysis, one of the biggest impediments to the widespread adoption of causal methods. Given that LLMs ignore the actual data, our results also point to a fruitful research direction of developing algorithms that combine LLMs with existing causal techniques. Code and datasets are available at https://github.com/py-why/pywhy-llm.
Full Text Available
Machine Explanations and Human Understanding

https://doi.org/10.1145/3593013.3593970

Chen, Chacha; Feng, Shi; Sharma, Amit; Tan, Chenhao (June 2023, ACM)

Solid-solution and precipitation softening effects in defect-free faceted Nickel-Iron nanoparticles

https://doi.org/10.1016/j.actamat.2022.118527

Sharma, Amit; Mendelsohn, Oz; Bisht, Anuj; Michler, Johann; Koju, Raj Kiran; Mishin, Yuri; Rabkin, Eugen (January 2023, Acta Materialia)

Full Text Available
Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End

https://doi.org/10.1145/3461702.3462597

Kommiya Mothilal, Ramaravind; Mahajan, Divyat; Tan, Chenhao; Sharma, Amit (May 2021, Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society)
Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's predictions. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results in terms of their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, in particular whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods are unable to provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency. As a result, we highlight the complementarity of these two approaches. Our evaluation on three benchmark datasets (Adult-Income, LendingClub, and German-Credit) confirms the complementarity. Feature attribution methods like LIME and SHAP and counterfactual explanation methods like Wachter et al. and DiCE often do not agree on feature importance rankings. In addition, by restricting the features that can be modified for generating counterfactual examples, we find that the top-k features from LIME or SHAP are often neither necessary nor sufficient explanations of a model's prediction. Finally, we present a case study of different explanation methods on a real-world hospital triage problem.
Full Text Available
Explaining machine learning classifiers through diverse counterfactual explanations

https://doi.org/10.1145/3351095.3372850

Mothilal, Ramaravind K.; Sharma, Amit; Tan, Chenhao (January 2020, FAT*)

Post-hoc explanations of machine learning models are crucial for people to understand and act on algorithmic predictions. An intriguing class of explanations is through counterfactuals, hypothetical examples that show people how to obtain a different prediction. We posit that effective counterfactual explanations should satisfy two properties: feasibility of the counterfactual actions given user context and constraints, and diversity among the counterfactuals presented. To this end, we propose a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes. To evaluate the actionability of counterfactuals, we provide metrics that enable comparison of counterfactual-based methods to other local explanation methods. We further address necessary tradeoffs and point to causal implications in optimizing for counterfactuals. Our experiments on four real-world datasets show that our framework can generate a set of counterfactuals that are diverse and well approximate local decision boundaries, outperforming prior approaches to generating diverse counterfactuals. We provide an implementation of the framework at https://github.com/microsoft/DiCE.
Full Text Available
Predicting history

https://doi.org/10.1038/s41562-019-0620-8

Risi, Joseph; Sharma, Amit; Shah, Rohan; Connelly, Matthew; Watts, Duncan J. (June 2019, Nature Human Behaviour)

Full Text Available
Revisiting Fluorescent Calixarenes: From Molecular Sensors to Smart Materials

https://doi.org/10.1021/acs.chemrev.8b00605

Kumar, Rajesh; Sharma, Amit; Singh, Hardev; Suating, Paolo; Kim, Hyeong Seok; Sunwoo, Kyoung; Shim, Inseob; Gibb, Bruce C.; Kim, Jong Seung (August 2019, Chemical Reviews)

Full Text Available
