Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

Mireshghallah, Niloofar; Kim, Hyunwoo; Zhou, Xuhui; Tsvetkov, Yulia; Sap, Maarten; Shokri, Reza; Choi, Yejin

Citation Details

Existing efforts on quantifying privacy implications for large language models (LLMs) solely focus on measuring leakage of training data. In this work, we shed light on the often-overlooked interactive settings where an LLM receives information from multiple sources and generates an output to be shared with other entities, creating the potential of exposing sensitive input data in inappropriate contexts. In these scenarios, humans nat- urally uphold privacy by choosing whether or not to disclose information depending on the context. We ask the question “Can LLMs demonstrate an equivalent discernment and reasoning capability when considering privacy in context?” We propose CONFAIDE, a benchmark grounded in the theory of contextual integrity and designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. CONFAIDE consists of four tiers, gradually increasing in complexity, with the final tier evaluating contextual privacy reasoning and theory of mind capabilities. Our experiments show that even commercial models such as GPT-4 and ChatGPT reveal private information in contexts that humans would not, 39% and 57% of the time, respectively, highlighting the urgent need for a new direction of privacy-preserving approaches as we demonstrate a larger underlying problem stemmed in the models’ lack of reasoning capabilities. more »

Award ID(s):: 2142739 2203097 2125201

PAR ID:: 10520218

Author(s) / Creator(s):: Mireshghallah, Niloofar; Kim, Hyunwoo; Zhou, Xuhui; Tsvetkov, Yulia; Sap, Maarten; Shokri, Reza; Choi, Yejin

Publisher / Repository:: International Conference on Learning Representations

Date Published:: 2024-05-15

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this