NSF PAR Search | NSF Public Access Repository

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

Welleck, Sean; Bertsch, Amanda; Finlayson, Matthew; Schoelkopf, Hailey; Xie, Alex; Neubig, Graham; Kulikov, Ilia; Harchaoui, Zaid (November 2024, https://doi.org/10.48550/arXiv.2406.16838)

One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.
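The distinction the abstract draws between token-level and meta-generation algorithms can be illustrated with a minimal sketch. It is not taken from the survey: `toy_logits` is a hypothetical stand-in for a language model, temperature sampling stands in for the token-level family, and best-of-n stands in for the meta-generation family. The function names and the five-token vocabulary are assumptions made for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw logits into a next-token distribution at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Token-level step: sample one token id from the next-token distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def toy_logits(prefix, vocab_size=5):
    """Hypothetical model (assumption): prefers token (last + 1) mod vocab_size."""
    last = prefix[-1] if prefix else 0
    return [2.0 if t == (last + 1) % vocab_size else 0.0 for t in range(vocab_size)]


def generate(length, temperature=1.0, rng=random):
    """Build a sequence by sampling a single token at a time."""
    seq = []
    for _ in range(length):
        seq.append(sample_token(toy_logits(seq), temperature, rng))
    return seq


def best_of_n(n, length, score, rng=random):
    """Meta-generation: draw n full sequences and keep the highest-scoring one."""
    candidates = [generate(length, rng=rng) for _ in range(n)]
    return max(candidates, key=score)
```

Here `sample_token` needs the model's logits, whereas `best_of_n` only calls `generate` as a black box and scores whole sequences, which is where external information such as a verifier or reward function would plug in.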
Free, publicly-accessible full text available November 20, 2025
Generating Sequences by Learning to Self-Correct

Welleck, Sean; Lu, Ximing; West, Peter; Brahman, Faeze; Shen, Tianxiao; Khashabi, Daniel; Choi, Yejin (July 2023, The Eleventh International Conference on Learning Representations)

Sequence generation applications require satisfying semantic constraints, such as ensuring that programs are correct, using certain keywords, or avoiding undesirable content. Language models, whether fine-tuned or prompted with few-shot demonstrations, frequently violate these constraints, and lack a mechanism to iteratively revise their outputs. Moreover, some powerful language models are of extreme scale or inaccessible, making it inefficient, if not infeasible, to update their parameters for task-specific adaptation. We present Self-Correction, an approach that decouples an imperfect base generator (an off-the-shelf language model or supervised sequence-to-sequence model) from a separate corrector that learns to iteratively correct imperfect generations. To train the corrector, we propose an online training procedure that can use either scalar or natural language feedback on intermediate imperfect generations. We show that Self-Correction improves upon the base generator in three diverse generation tasks - mathematical program synthesis, lexically-constrained generation, and toxicity control - even when the corrector is much smaller than the base generator.
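The generator/corrector decoupling described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the paper's method: in the paper the corrector is a learned model trained online, whereas `corrector` below is a hand-written rule, and `base_generator`, `feedback`, and `self_correct` are hypothetical names. The task shown is a toy version of lexically-constrained generation.

```python
def base_generator(prompt):
    """Hypothetical imperfect generator (assumption): ignores keyword constraints."""
    return f"{prompt} went outside."


def feedback(text, keywords):
    """Feedback on an intermediate generation: keywords still missing from the text."""
    return [k for k in keywords if k not in text]


def corrector(text, missing):
    """Hand-written stand-in for a learned corrector: fixes one violation per call."""
    return text.rstrip(".") + f" with the {missing[0]}."


def self_correct(prompt, keywords, max_steps=5):
    """Generate once, then revise iteratively until the constraints are satisfied."""
    text = base_generator(prompt)
    for _ in range(max_steps):
        missing = feedback(text, keywords)
        if not missing:
            break
        text = corrector(text, missing)
    return text
```

The base generator is never updated; only the corrector acts on its output, which mirrors the abstract's point that the base model may be too large or inaccessible to fine-tune.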
Full Text Available
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering

https://doi.org/10.18653/v1/2022.emnlp-main.611

Liu, Jiacheng; Hallinan, Skyler; Lu, Ximing; He, Pengfei; Welleck, Sean; Hajishirzi, Hannaneh; Choi, Yejin (January 2022, EMNLP)

Full Text Available
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

Pillutla, Krishna; Swayamdipta, Swabha; Zellers, Rowan; Thickstun, John; Welleck, Sean; Choi, Yejin; Harchaoui, Zaid (January 2022, Advances in Neural Information Processing Systems)

As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
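The idea of a divergence frontier over a quantized space can be sketched on toy histograms. This is a simplified illustration, not the official MAUVE implementation: it assumes the quantization step has already produced two discrete distributions `p` and `q` over the same clusters, traces the frontier with mixtures r = lam * p + (1 - lam) * q, and summarizes it by the area under the curve. The scaling constant `c=5.0`, the grid size, and all function names are assumptions.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence_frontier(p, q, c=5.0, num_points=99):
    """Points (exp(-c*KL(q||r)), exp(-c*KL(p||r))) for mixtures r of p and q."""
    points = [(0.0, 1.0)]  # limiting endpoint on the lam -> 1 side
    for i in range(num_points, 0, -1):  # lam runs from near 1 down to near 0
        lam = i / (num_points + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-c * kl(q, r)), math.exp(-c * kl(p, r))))
    points.append((1.0, 0.0))  # limiting endpoint on the lam -> 0 side
    return points


def area_under(points):
    """Trapezoidal area under the frontier; values near 1 mean p and q are close."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

In this sketch, identical histograms give an area close to 1, while histograms with disjoint support give an area close to 0.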
Full Text Available
