Learning to Decode Collaboratively with Multiple Language Models

Shen, Shannon Zejiang; Lang, Hunter; Wang, Bailin; Kim, Yoon; Sontag, David

Citation Details

We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the marginal likelihood of a training set under our latent variable model, the base LLM automatically learns when to generate itself and when to call on one of the “assistant” language models to generate, all without direct supervision. Token-level collaboration during decoding allows for a fusion of each model’s expertise in a manner tailored to the specific task at hand. Our collaborative decoding is especially useful in cross-domain settings where a generalist base LLM learns to invoke domain expert models. On instruction-following, domain-specific QA, and reasoning tasks, we show that the performance of the joint system exceeds that of the individual models. Through qualitative analysis of the learned latent decisions, we show models trained with our method exhibit several interesting collaboration patterns, e.g., template-filling.
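The latent-variable idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the lookup-table "models", the `defer_prob` gate, the two-model setup, and the thresholded greedy decoding rule are all assumptions made for illustration. The sketch shows how a per-token deferral probability can be marginalized out to score a sequence, and how the same probability can route each decoding step to either the base or the assistant model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Dict[str, float]                  # token -> probability
Model = Callable[[Sequence[str]], Dist]  # prefix -> next-token distribution
Gate = Callable[[Sequence[str]], float]  # prefix -> P(defer to assistant)


def marginal_log_likelihood(tokens: Sequence[str], base: Model,
                            assistant: Model, defer_prob: Gate) -> float:
    """Sum over the latent 'who generates' choice at every position."""
    total = 0.0
    for t, token in enumerate(tokens):
        prefix = tokens[:t]
        p_defer = defer_prob(prefix)
        p_token = ((1.0 - p_defer) * base(prefix).get(token, 0.0)
                   + p_defer * assistant(prefix).get(token, 0.0))
        total += math.log(p_token)
    return total


def collaborative_decode(base: Model, assistant: Model, defer_prob: Gate,
                         max_tokens: int = 10, threshold: float = 0.5,
                         eos: str = "<eos>") -> Tuple[List[str], List[str]]:
    """Greedy decoding; defer to the assistant when the gate exceeds threshold.

    The threshold rule is an assumption of this sketch.
    """
    out: List[str] = []
    choices: List[str] = []
    for _ in range(max_tokens):
        use_assistant = defer_prob(out) > threshold
        dist = assistant(out) if use_assistant else base(out)
        token = max(dist, key=dist.get)
        out.append(token)
        choices.append("assistant" if use_assistant else "base")
        if token == eos:
            break
    return out, choices


# Hypothetical toy models: the base writes the template, the assistant the fact.
def toy_base(prefix: Sequence[str]) -> Dist:
    if len(prefix) == 0:
        return {"answer:": 0.9, "<eos>": 0.1}
    if len(prefix) == 1:
        return {"?": 0.9, "42": 0.1}
    return {"<eos>": 0.9, "42": 0.1}


def toy_assistant(prefix: Sequence[str]) -> Dist:
    return {"42": 0.8, "<eos>": 0.2}


def toy_gate(prefix: Sequence[str]) -> float:
    return 0.9 if len(prefix) == 1 else 0.1


tokens, who = collaborative_decode(toy_base, toy_assistant, toy_gate)
log_lik = marginal_log_likelihood(tokens, toy_base, toy_assistant, toy_gate)
```

In this toy setup the base model produces the surrounding template and hands the middle token to the assistant, which loosely mirrors the template-filling pattern mentioned in the abstract. In a trained system the gate and the models would be neural networks, and the gate would be learned by maximizing the marginal likelihood rather than hard-coded.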

Award ID(s): 2205320

PAR ID: 10535763

Author(s) / Creator(s): Shen, Shannon Zejiang; Lang, Hunter; Wang, Bailin; Kim, Yoon; Sontag, David

Publisher / Repository: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL)

Date Published: 2024-08-11

Volume: 1

Page Range / eLocation ID: 12974-12990

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript
Conference Paper: The DOI is not currently available.
