Pretrained language models often do not perform tasks in line with our preferences, e.g., they generate offensive text or factually incorrect summaries. Recent work approaches this issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. However, comparisons convey limited information about human preferences per evaluation. Here, we propose to learn from natural language feedback, which conveys more information per evaluation. We learn from language feedback on model outputs using a three-step learning algorithm. First, we condition the language model on the initial output and the feedback to generate many refinements. Second, we choose the refinement with the highest similarity to the feedback. Third, we finetune a language model to maximize the likelihood of the chosen refinement given the input. In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements, finding that only large language models (175B parameters) do so. Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization ability.
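A minimal sketch of this three-step loop in Python follows; the `model.generate`, `embed`, and `finetune` interfaces are hypothetical placeholders for illustration, not the authors' implementation.

```python
def ilf_step(model, embed, finetune, task_input, initial_output,
             feedback, n_refinements=64):
    # Step 1: condition the model on the initial output and the
    # feedback to sample many candidate refinements.
    prompt = (
        f"Input: {task_input}\n"
        f"Output: {initial_output}\n"
        f"Feedback: {feedback}\n"
        f"Refinement:"
    )
    refinements = [model.generate(prompt, temperature=1.0)
                   for _ in range(n_refinements)]

    # Step 2: keep the refinement most similar to the feedback
    # (here, cosine similarity over text embeddings).
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(y * y for y in b) ** 0.5
        return dot / (norm_a * norm_b)

    feedback_emb = embed(feedback)
    best = max(refinements, key=lambda r: cosine(embed(r), feedback_emb))

    # Step 3: finetune to maximize the likelihood of the chosen
    # refinement given the original task input.
    finetune(model, inputs=[task_input], targets=[best])
    return best
```

Repeating this loop over fresh inputs and feedback would implement the training-time procedure described above.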
Learning from Natural Language Feedback
The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the target distribution and demonstrate proofs of concept on text summarization and program synthesis tasks. For code generation, ILF improves a CodeGen-Mono 6.1B model's pass@1 rate from 22% to 36% on the MBPP benchmark, outperforming both fine-tuning on MBPP and fine-tuning on human-written repaired programs. For summarization, we show that ILF can be combined with learning from human preferences to improve a GPT-3 model's summarization performance to be comparable to human quality, outperforming fine-tuning on human-written summaries. Overall, our results suggest that ILF is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM's performance on a variety of tasks.
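The KL-divergence view mentioned above can be summarized informally as follows. This is a generic sketch of the argument, with $p^*$ (target distribution), $p_\theta$ (model), and $c$ (context) as notation introduced here rather than taken from the paper:

```latex
% Finetuning on refinements treated as approximate samples from a
% target distribution p*(x|c) minimizes a forward KL divergence:
\min_\theta \; D_{\mathrm{KL}}\!\big(p^*(\cdot \mid c)\,\big\|\,p_\theta(\cdot \mid c)\big)
  \;=\; \min_\theta \; -\,\mathbb{E}_{x \sim p^*(\cdot \mid c)}\!\big[\log p_\theta(x \mid c)\big] \;+\; \mathrm{const},
```

with feedback-guided refinement serving as the approximate sampler for $p^*$.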
- Award ID(s): 2046556
- PAR ID: 10544335
- Publisher / Repository: OpenReview
- Date Published:
- Journal Name: Transactions on Machine Learning Research
- ISSN: 2835-8856
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback. The best-known such algorithm, Reinforcement Learning with Human Feedback (RLHF), demonstrates impressive performance on the GPT series of models. However, the underlying reinforcement learning algorithm is complex and requires additional training for reward and value networks. In this paper, we consider an alternative approach: converting feedback into instructions by relabeling the original instruction, then training the model for better alignment in a supervised manner. Such an algorithm requires no additional parameters beyond the original language model and maximally reuses the pretraining pipeline. To achieve this, we formulate the instruction-alignment problem for language models as a goal-reaching problem in decision making. We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions. The resulting two-stage algorithm sheds light on a family of reward-free approaches that utilize hindsight-relabeled instructions based on feedback. We evaluate the performance of HIR extensively on 12 challenging BigBench reasoning tasks and show that HIR outperforms the baseline algorithms and is comparable to or even surpasses supervised fine-tuning. The implementation of HIR is available at https://github.com/tianjunz/HIR. (A minimal sketch of this relabeling loop appears after this list.)
-
Duh, Kevin; Gómez-Adorno, Helena; Bethard, Steven (Eds.) The advent of large language models (LLMs) has significantly advanced natural language processing tasks like text summarization. However, their large size and computational demands, coupled with privacy concerns in data transmission, limit their use in resource-constrained and privacy-centric settings. To overcome this, we introduce TriSum, a framework for distilling LLMs' text summarization abilities into a compact, local model. Initially, LLMs extract a set of aspect-triple rationales and summaries, which are refined using a dual-scoring method for quality. Next, a smaller local model is trained on these tasks, employing a curriculum learning strategy that evolves from simple to complex tasks. Our method enhances local model performance on various benchmarks (CNN/DailyMail, XSum, and ClinicalTrial), outperforming baselines by 4.5%, 8.5%, and 7.4%, respectively. It also improves interpretability by providing insights into the summarization rationale. (A rough sketch of this distillation pipeline appears after this list.)
-
Instruction-tuned large language models (LLMs), such as ChatGPT, have led to promising zero-shot performance in discriminative natural language understanding (NLU) tasks. This involves querying the LLM using a prompt containing the question and the candidate labels to choose from. The question-answering capabilities of ChatGPT arise from its pre-training on large amounts of human-written text, as well as its subsequent fine-tuning on human preferences, which motivates us to ask: does ChatGPT also inherit humans' cognitive biases? In this paper, we study the primacy effect of ChatGPT: the tendency to select labels at earlier positions as the answer. We have two main findings: i) ChatGPT's decisions are sensitive to the order of labels in the prompt; ii) ChatGPT is clearly more likely to select labels at earlier positions as the answer. We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions. We release the source code at https://github.com/wangywUST/PrimacyEffectGPT. (A sketch of an order-permutation probe appears after this list.)
-
Recent work has shown that prompting language models with code-like representations of natural language leads to performance improvements on structured reasoning tasks. However, such tasks comprise only a small subset of all natural language tasks. In our work, we seek to answer whether code-prompting is the preferred way of interacting with language models in general. We compare code and text prompts across three popular GPT models (davinci, code-davinci-002, and text-davinci-002) on a broader selection of tasks (e.g., QA, sentiment, summarization) and find that, with few exceptions, code prompts do not consistently outperform text prompts. Furthermore, we show that the style of code prompt has a large effect on performance for some (but not all) tasks, and that fine-tuning on text instructions leads to better relative performance of code prompts. (An illustrative text-versus-code prompt pair appears after this list.)
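For the HIR abstract above, here is a hypothetical sketch of the two-stage relabel-then-finetune loop; the function names and the relabeling rule are assumptions for illustration, not the released implementation (see https://github.com/tianjunz/HIR).

```python
def hir_round(model, tasks, relabel, finetune, n_samples=8):
    relabeled_data = []
    # Stage 1 (online sampling): generate outputs for each instruction.
    for instruction, query in tasks:
        for _ in range(n_samples):
            output = model.generate(f"{instruction}\n{query}")
            # Relabel in hindsight: pick the instruction that the sampled
            # output *did* accomplish, so every sample becomes a positive
            # example -- no reward or value networks needed.
            hindsight_instruction = relabel(instruction, query, output)
            relabeled_data.append((f"{hindsight_instruction}\n{query}", output))
    # Stage 2 (offline learning): standard supervised finetuning
    # on the hindsight-relabeled (prompt, output) pairs.
    finetune(model, relabeled_data)
```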
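For the TriSum abstract above, a rough sketch of the extract-filter-curriculum pipeline; every callable passed in (`llm_extract`, `quality_score`, `difficulty`, `train_step`) is a hypothetical placeholder rather than the paper's actual interface.

```python
def distill(llm_extract, quality_score, difficulty, train_step,
            local_model, documents, threshold=0.5):
    # Step 1: the large model extracts an aspect-triple rationale
    # and a summary for each document.
    examples = []
    for doc in documents:
        rationale, summary = llm_extract(doc)
        # Step 2: a dual-scoring filter keeps only high-quality pairs.
        if quality_score(doc, rationale, summary) >= threshold:
            examples.append((doc, rationale, summary))
    # Step 3: curriculum learning -- train the compact local model on
    # easier examples first, progressing to harder ones.
    for doc, rationale, summary in sorted(examples, key=difficulty):
        train_step(local_model, doc, rationale, summary)
```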
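For the primacy-effect abstract above, one way to probe order sensitivity is to permute the candidate labels and count which position the model picks; `ask` is an assumed wrapper around a chat-model API, not a real library call.

```python
import itertools
from collections import Counter

def primacy_probe(ask, question, labels):
    # Enumerating all permutations is fine for the small label sets
    # typical of discriminative NLU tasks.
    position_of_choice = Counter()
    for perm in itertools.permutations(labels):
        prompt = question + "\nOptions:\n" + "\n".join(
            f"{i + 1}. {label}" for i, label in enumerate(perm)
        )
        choice = ask(prompt)  # assumed to return one of the labels
        position_of_choice[perm.index(choice)] += 1
    # A strong skew toward position 0 indicates a primacy effect:
    # the model favors labels listed earlier regardless of content.
    return position_of_choice
```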
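For the code-versus-text prompting abstract above, an illustrative rendering of one task in both styles; the paper's exact prompt formats may differ.

```python
review = "The movie was a delightful surprise."

# Plain text prompt: the task is stated in natural language.
text_prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    f"Review: {review}\n"
    "Sentiment:"
)

# Code-like prompt: the same task posed as a program to complete.
code_prompt = (
    f'review = "{review}"\n'
    '# Classify the sentiment of `review` as "positive" or "negative".\n'
    "sentiment ="
)
```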