Pretrained language models often do not perform tasks in line with our preferences, e.g., they generate offensive text or factually incorrect summaries. Recent work addresses this issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. However, comparison feedback conveys limited information about human preferences per evaluation. Here, we propose learning from natural language feedback, which conveys more information per human evaluation.
We learn from language feedback on model
outputs using a three-step learning algorithm.
First, we condition the language model on the
initial output and feedback to generate many
refinements. Second, we choose the refinement
with the highest similarity to the feedback.
Third, we finetune a language model to
maximize the likelihood of the chosen refinement
given the input. In synthetic experiments,
we first evaluate whether language models accurately
incorporate feedback to produce refinements,
finding that only large language
models (175B parameters) do so. Using only
100 samples of human-written feedback, our
learning algorithm finetunes a GPT-3 model to
roughly human-level summarization ability.
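The three-step procedure described above can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: generate, embed, and finetune are hypothetical stand-ins for a sampling API, a text-embedding model, and a supervised finetuning routine, and cosine similarity over embeddings is one assumed way to measure "similarity to the feedback."

```python
# Minimal sketch of the three-step learning-from-feedback loop described above.
# `generate`, `embed`, and `finetune` are hypothetical stand-ins for a language
# model's sampling API, a text-embedding model, and a finetuning routine.

from typing import Callable, List, Tuple
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))


def learn_from_feedback(
    examples: List[Tuple[str, str, str]],                # (input, initial_output, feedback)
    generate: Callable[[str, int], List[str]],           # prompt, n -> n sampled refinements
    embed: Callable[[str], np.ndarray],                  # text -> embedding vector
    finetune: Callable[[List[Tuple[str, str]]], None],   # (input, target) pairs
    n_refinements: int = 8,
) -> None:
    finetune_pairs = []
    for task_input, initial_output, feedback in examples:
        # Step 1: condition the model on the input, its initial output, and the
        # human feedback, then sample many candidate refinements.
        prompt = (
            f"Input: {task_input}\n"
            f"Initial output: {initial_output}\n"
            f"Feedback: {feedback}\n"
            f"Refined output:"
        )
        refinements = generate(prompt, n_refinements)

        # Step 2: keep the refinement whose embedding is most similar to the feedback.
        feedback_vec = embed(feedback)
        best = max(refinements, key=lambda r: cosine(embed(r), feedback_vec))

        # Step 3: collect (input, best refinement) pairs for supervised finetuning,
        # maximizing the likelihood of the chosen refinement given the input.
        finetune_pairs.append((task_input, best))

    finetune(finetune_pairs)
```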
Post Hoc Explanations of Language Models Can Improve Language Models
- Award ID(s): 2238714
- PAR ID: 10535813
- Publisher / Repository: Advances in Neural Information Processing Systems
- Location: Advances in Neural Information Processing Systems (NeurIPS), 2023
- Sponsoring Org: National Science Foundation
More Like this
Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks. Moreover, recent research has shown that incorporating human-annotated rationales (e.g., Chain-of-Thought prompting) during in-context learning can significantly enhance the performance of these models, particularly on tasks that require reasoning capabilities. However, incorporating such rationales poses challenges in terms of scalability, as it requires a high degree of human involvement. In this work, we present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), which addresses the aforementioned challenges by automating the process of rationale generation. To this end, we leverage post hoc explanation methods that output attribution scores (explanations) capturing the influence of each input feature on model predictions. More specifically, we construct automated natural language rationales that embed insights from post hoc explanations to provide corrective signals to LLMs. Extensive experimentation with real-world datasets demonstrates that our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks, including those where prior approaches that rely on human-annotated rationales, such as Chain-of-Thought prompting, fall short. Our work makes one of the first attempts at highlighting the potential of post hoc explanations as valuable tools for enhancing the effectiveness of LLMs. Furthermore, we conduct additional empirical analyses and ablation studies to demonstrate the impact of each component of AMPLIFY, which in turn yields critical insights for refining in-context learning.
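To make the idea of explanation-derived rationales more concrete, the following is a hedged sketch of turning post hoc attribution scores into rationales attached to in-context demonstrations. The attribute callable and the prompt wording are hypothetical placeholders for illustration only, not AMPLIFY's actual interface or templates.

```python
# Illustrative sketch: turn post hoc attribution scores into a natural-language
# rationale prepended to each in-context example. `attribute` stands in for any
# post hoc explanation method (e.g., a gradient- or perturbation-based explainer);
# the prompt wording is a hypothetical template, not the paper's exact one.

from typing import Callable, List, Tuple


def build_rationale(
    text: str,
    label: str,
    attribute: Callable[[str], List[Tuple[str, float]]],  # token -> attribution score
    top_k: int = 3,
) -> str:
    # Rank input tokens by the magnitude of their attribution scores and keep
    # the top-k as the corrective signal embedded in the rationale.
    scores = attribute(text)
    top_tokens = [tok for tok, _ in sorted(scores, key=lambda s: abs(s[1]), reverse=True)[:top_k]]
    keywords = ", ".join(top_tokens)
    return (
        f"Input: {text}\n"
        f"Rationale: the words {keywords} are most important for the prediction.\n"
        f"Label: {label}"
    )


def build_prompt(
    demonstrations: List[Tuple[str, str]],                # (text, label) few-shot examples
    query: str,
    attribute: Callable[[str], List[Tuple[str, float]]],
) -> str:
    # Assemble a few-shot prompt whose demonstrations carry explanation-derived rationales.
    blocks = [build_rationale(t, y, attribute) for t, y in demonstrations]
    blocks.append(f"Input: {query}\nRationale:")
    return "\n\n".join(blocks)
```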