Coactive Learning for Large Language Models using Implicit User Feedback

Tucker, Aaron David; Brantley, Kianté; Cahall, Adam; Joachims, Thorsten

We propose coactive learning as a model and feedback mechanism for training large language models (LLMs). The key insight is that users provide implicit feedback whenever they edit the text y proposed by an LLM. While the edited text ȳ is typically not a gold-standard example for supervised training, coactive learning merely requires that the edited text ȳ is an improvement over the proposed text y. Note that such weak implicit preference feedback ȳ ≻ y is available in many application settings on a per-user basis, thus enabling the personalization of LLMs. In this paper, we develop the theoretical basis for coactive training of non-linear models, and we derive CoRLL as the first coactive learning algorithm for LLMs. Empirical results indicate that CoRLL is effective even for weak and noisy coactive preference feedback, making it a promising algorithm for training and personalization of LLMs from feedback that is naturally collected in many use cases.
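As a rough illustration of the feedback signal described in the abstract, the sketch below turns logged user edits into preference pairs ȳ ≻ y and scores them with a generic pairwise logistic loss. It is not the CoRLL algorithm, whose details are not given in this record: the names `CoactivePair`, `collect_pairs`, and `pairwise_logistic_loss` are hypothetical, and the loss is a standard Bradley-Terry-style stand-in chosen only to show how such weak preferences could drive training.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CoactivePair:
    """One implicit-feedback record: the user's edit is preferred over the proposal."""
    prompt: str
    proposed: str  # y, the text the LLM proposed
    edited: str    # y-bar, the text after the user's edit


def collect_pairs(interactions):
    """Build preference pairs from (prompt, proposed, edited) triples.

    Interactions where the user left the text unchanged carry no
    preference signal in this sketch, so they are skipped.
    """
    return [
        CoactivePair(prompt, proposed, edited)
        for prompt, proposed, edited in interactions
        if edited != proposed
    ]


def pairwise_logistic_loss(score_edited, score_proposed):
    """Generic preference loss (assumption, not the CoRLL objective).

    Computes log(1 + exp(-margin)) in a numerically stable way; the
    loss is small when the edited text outscores the proposed text.
    """
    margin = score_edited - score_proposed
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

In this sketch the scores would come from whatever model is being trained (for example, a log-probability of each text given the prompt); that choice is likewise an assumption made for illustration.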

Award ID(s): 2312865

PAR ID: 10557888

Author(s) / Creator(s): Tucker, Aaron David; Brantley, Kianté; Cahall, Adam; Joachims, Thorsten

Publisher / Repository: International Conference on Machine Learning (ICML)

Date Published: 2024-07-17

Edition / Version: PMLR

Volume: 235

ISSN: 2640-3498

Page Range / eLocation ID: 48809-48822

Format(s): Medium: X

Location: Vienna, Austria

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.