Language Model Inversion

Morris, John X; Zhao, Wenting; Chiu, Justin T; Shmatikov, Vitaly; Rush, Alexander M

Citation Details

Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompts given only the model's current distribution output. We consider a variety of model access scenarios, and show how even without predictions for every token in the vocabulary we can recover the probability vector through search. On Llama-2 7b, our inversion method reconstructs prompts with a BLEU of 59 and token-level F1 of 78 and recovers 27% of prompts exactly. Code for reproducing all experiments is available at this http URL.
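The abstract's claim that the probability vector can be recovered "through search" can be illustrated with a minimal sketch. It assumes one possible restricted-access scenario: an API that reveals only the most likely next token but accepts a per-token additive logit bias. That access model, the simulated API, and the names `make_argmax_api` and `recover_distribution` are illustrative assumptions, not details taken from this record.

```python
import math

def make_argmax_api(hidden_logits):
    """Simulate a restricted API (an assumption): it returns only the index of
    the top token, after adding a caller-supplied bias to selected logits."""
    def api(logit_bias):
        biased = [l + logit_bias.get(i, 0.0) for i, l in enumerate(hidden_logits)]
        return max(range(len(biased)), key=biased.__getitem__)
    return api

def recover_distribution(api, vocab_size, max_bias=50.0, tol=1e-6):
    """Estimate the full next-token distribution from argmax-only access.

    For each token i, binary-search the smallest bias that makes i the top
    token. That bias approximates logit[top] - logit[i]. Assumes every such
    gap is smaller than max_bias.
    """
    top = api({})
    gaps = [0.0] * vocab_size  # gaps[i] approximates logit[top] - logit[i]
    for i in range(vocab_size):
        if i == top:
            continue
        lo, hi = 0.0, max_bias
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if api({i: mid}) == i:
                hi = mid  # bias was enough; try a smaller one
            else:
                lo = mid  # bias was too small
        gaps[i] = (lo + hi) / 2
    # Softmax is invariant to a constant shift, so relative logits suffice.
    weights = [math.exp(-g) for g in gaps]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Reference softmax, used only to check the recovered values."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

hidden = [2.0, 0.5, -1.0, 3.2]  # made-up logits the caller never sees
recovered = recover_distribution(make_argmax_api(hidden), len(hidden))
true_probs = softmax(hidden)
```

In this sketch each token costs roughly 26 queries (halving an interval of width 50 down to 1e-6), so the total query count grows linearly with vocabulary size.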

Award ID(s): 1901030

PAR ID: 10560283

Author(s) / Creator(s): Morris, John X; Zhao, Wenting; Chiu, Justin T; Shmatikov, Vitaly; Rush, Alexander M

Publisher / Repository: ICLR

Date Published: 2024-05-01

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Proceeding:
The DOI is not currently available.
