

This content will become publicly available on July 13, 2026

Title: On the Query Complexity of Verifier-Assisted Language Generation
Recently, a plethora of works have proposed inference-time algorithms (e.g., best-of-n) that incorporate verifiers to assist the generation process. Their quality-efficiency trade-offs have been empirically benchmarked on a variety of constrained generation tasks, but the algorithmic design landscape remains poorly understood. In this paper, we develop a mathematical framework for reasoning about constrained generation using a pre-trained language model generator oracle and a process verifier, which can decide whether a prefix can be extended to a string satisfying the constraints of choice. We show that even in very simple settings, access to a verifier can turn a problem that is intractable (information-theoretically or computationally) into a tractable one. In fact, we show that even simple algorithms, such as tokenwise rejection sampling, can enjoy significant benefits from access to a verifier. Empirically, we show that a natural modification of tokenwise rejection sampling, in which the sampler is allowed to "backtrack" (i.e., erase the final few generated tokens), has robust and substantive benefits over natural baselines (e.g., (blockwise) rejection sampling and nucleus sampling) in terms of computational efficiency, accuracy, and diversity.
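The abstract describes the backtracking sampler only at a high level, so the following is a minimal illustrative sketch rather than the paper's algorithm. It assumes a generator oracle `next_token_dist` (returning a next-token distribution for a prefix) and a process verifier `verifier` (returning whether a prefix can still be completed to satisfy the constraint); both names are hypothetical. After too many consecutive rejections, the sampler erases the last few tokens and retries.

```python
import random

def generate_with_backtracking(next_token_dist, verifier, max_len=64,
                               max_rejects=8, backtrack=2, eos="<eos>"):
    """Tokenwise rejection sampling with backtracking (illustrative sketch).

    next_token_dist(prefix) -> dict mapping candidate tokens to probabilities.
    verifier(prefix)        -> True iff the prefix can still be extended to a
                               string satisfying the constraint (process verifier).
    """
    prefix = []
    while len(prefix) < max_len:
        rejects = 0
        while True:
            dist = next_token_dist(prefix)
            tok = random.choices(list(dist), weights=list(dist.values()))[0]
            if verifier(prefix + [tok]):       # accept only verified extensions
                prefix.append(tok)
                break
            rejects += 1
            if rejects > max_rejects:          # stuck: erase the last few tokens
                del prefix[-backtrack:]        # and retry from the shorter prefix
                rejects = 0
        if prefix and prefix[-1] == eos:
            break
    return prefix
```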
Award ID(s):
2211907
PAR ID:
10643639
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
International Conference on Machine Learning (ICML), 2025
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems. 
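As a concrete instance of the token-level decoding algorithms this survey covers, here is a minimal nucleus (top-p) sampling sketch. It assumes only a dictionary of next-token probabilities and is a generic illustration, not code from the survey.

```python
import random

def nucleus_sample(probs, top_p=0.9):
    """Nucleus (top-p) sampling from a next-token distribution (sketch).

    probs: dict mapping token -> probability, assumed to sum to ~1.
    Keeps the smallest set of highest-probability tokens whose total mass
    reaches top_p, then samples from that set with renormalized weights.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    tokens, weights = zip(*kept)
    return random.choices(tokens, weights=weights)[0]
```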
  2. In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, this paper introduces the concept of a sampling corrector. Such algorithms use structure that the distribution is purported to have, in order to allow one to make “on-the-fly” corrections to samples drawn from probability distributions. These algorithms then act as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be utilized to expand the applicability of known distribution learning and property testing algorithms as well as to achieve improved algorithms for those tasks. As a first step, we show how to design sampling correctors using proper learning algorithms. We then focus on the question of whether algorithms for sampling correctors can be more efficient in terms of sample complexity than learning algorithms for the analogous families of distributions. When correcting monotonicity, we show that this is indeed the case when also granted query access to the cumulative distribution function. We also obtain sampling correctors for monotonicity even without this stronger type of access, provided that the distribution be originally very close to monotone (namely, at a distance $$O(1/\log^2 n)$$). In addition to that, we consider a restricted error model that aims at capturing “missing data” corruptions. In this model, we show that distributions that are close to monotone have sampling correctors that are significantly more efficient than achievable by the learning approach. We consider the question of whether an additional source of independent random bits is required by sampling correctors to implement the correction process. We show that for correcting close-to-uniform distributions and close-to-monotone distributions, no additional source of random bits is required, as the samples from the input source itself can be used to produce this randomness. 
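To make the "filter between the noisy data and the end user" pattern concrete, here is a minimal sketch of a sampling corrector for a source purported to be close to uniform. The construction (rejection sampling against an estimated envelope) and the names `make_uniform_corrector` and `sample` are our own illustration under that assumption, not the paper's corrector.

```python
import random
from collections import Counter

def make_uniform_corrector(sample, n, calibration=10_000):
    """Illustrative sampling-corrector filter (not the paper's construction).

    `sample()` draws from a noisy source purported to be close to uniform on
    {0, ..., n-1}.  A calibration batch estimates the source distribution; the
    filter then rejection-samples against that estimate so its output is
    (approximately) uniform.
    """
    counts = Counter(sample() for _ in range(calibration))
    p_hat = {x: (counts[x] + 1) / (calibration + n) for x in range(n)}  # add-one smoothing
    m = max((1.0 / n) / p_hat[x] for x in range(n))                     # rejection envelope

    def corrected_sample():
        while True:
            x = sample()
            if x not in p_hat:          # discard out-of-range samples
                continue
            # accept with probability (target mass) / (envelope * estimated mass)
            if random.random() < (1.0 / n) / (m * p_hat[x]):
                return x
    return corrected_sample
```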
  3. Virtual network services that span multiple data centers are important to support emerging data-intensive applications in fields such as bioinformatics and retail analytics. Successful virtual network service composition and maintenance requires flexible and scalable ‘constrained shortest path management’ both in the management plane, for virtual network embedding (VNE) and network function virtualization service chaining (NFV-SC), and in the data plane, for traffic engineering (TE). In this paper, we show analytically and empirically that leveraging constrained shortest paths within recent VNE, NFV-SC and TE algorithms can lead to network utilization gains (of up to 50%) and higher energy efficiency. The management of complex VNE, NFV-SC and TE algorithms can, however, be intractable for large-scale substrate networks due to the NP-hardness of the constrained shortest path problem. To address such scalability challenges, we propose a novel, exact constrained shortest path algorithm, viz. the ‘Neighborhoods Method’ (NM). Our NM uses novel search space reduction techniques and has a theoretical quadratic speed-up, making it practically faster (by an order of magnitude) than recent branch-and-bound exhaustive search solutions. Finally, we detail our NM-based SDN controller implementation in a real-world testbed to further validate practical NM benefits for virtual network services. 
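For context on the underlying problem, below is a baseline exact approach to one standard constrained shortest path variant (least-cost path under a delay budget), implemented as a label-setting search over (node, delay) states. This is a generic textbook-style baseline, not the Neighborhoods Method; the graph encoding is assumed.

```python
import heapq

def constrained_shortest_path(adj, src, dst, delay_budget):
    """Delay-constrained least-cost path via label-setting search (baseline sketch).

    adj: {u: [(v, cost, delay), ...]} with non-negative costs and delays.
    Returns the minimum cost of a src-dst path with total delay <= delay_budget,
    or None if no feasible path exists.
    """
    best = {}                          # (node, delay_used) -> cheapest cost found
    heap = [(0, src, 0)]               # labels ordered by cost
    while heap:
        cost, u, d = heapq.heappop(heap)
        if u == dst:
            return cost                # first pop of dst is cost-minimal
        if best.get((u, d), float("inf")) < cost:
            continue                   # stale label
        for v, c, dl in adj.get(u, []):
            nd = d + dl
            if nd > delay_budget:
                continue               # would violate the delay constraint
            nc = cost + c
            if nc < best.get((v, nd), float("inf")):
                best[(v, nd)] = nc
                heapq.heappush(heap, (nc, v, nd))
    return None
```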
  4. In the dynamic field of urban planning and the context of unprecedented natural events, such as hurricanes, the fast generation of accurate maps from satellite imagery is paramount. While several studies have utilized Generative Adversarial Networks (GANs) for map generation from satellite images, the present work introduces a new approach by integrating contrastive learning into the GAN framework for enhanced map synthesis. Our methodology distinctively employs positive sampling by aligning similar features (e.g., roads) in both satellite images and their corresponding map outputs, and contrasts this with negative samples for disparate elements. This approach effectively replaces the conventional cyclic process in GANs with a more streamlined, unidirectional procedure, leading to improvements in both the quality of the synthesized maps and computational efficiency. We show the effectiveness of our proposed model, offering an advancement in map generation for remote sensing applications. 
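To illustrate the positive/negative patch alignment the abstract describes, here is a minimal patch-wise contrastive (InfoNCE-style) loss in NumPy. The function name and formulation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def patch_infonce_loss(sat_feats, map_feats, temperature=0.07):
    """Patch-wise InfoNCE contrastive loss (illustrative sketch of the idea).

    sat_feats, map_feats: (num_patches, dim) arrays; row i of each array is the
    feature of the same spatial patch.  The matching map patch is the positive,
    every other map patch acts as a negative.
    """
    s = sat_feats / np.linalg.norm(sat_feats, axis=1, keepdims=True)
    m = map_feats / np.linalg.norm(map_feats, axis=1, keepdims=True)
    logits = (s @ m.T) / temperature                 # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy on the diagonal
```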
  5. This article considers a discrete optimization via simulation (DOvS) problem defined on a graph embedded in the high-dimensional integer grid. Several DOvS algorithms that model the responses at the solutions as a realization of a Gaussian Markov random field (GMRF) have been proposed exploiting its inferential power and computational benefits. However, the computational cost of inference increases exponentially in dimension. We propose the projected Gaussian Markov improvement algorithm (pGMIA), which projects the solution space onto a lower-dimensional space creating the region-layer graph to reduce the cost of inference. Each node on the region-layer graph can be mapped to a set of solutions projected to the node; these solutions form a lower-dimensional solution-layer graph. We define the response at each region-layer node to be the average of the responses within the corresponding solution-layer graph. From this relation, we derive the region-layer GMRF to model the region-layer responses. The pGMIA alternates between the two layers to make a sampling decision at each iteration. It first selects a region-layer node based on the lower-resolution inference provided by the region-layer GMRF, then makes a sampling decision among the solutions within the solution-layer graph of the node based on the higher-resolution inference from the solution-layer GMRF. To solve even higher-dimensional problems (e.g., 100 dimensions), we also propose the pGMIA+: a multi-layer extension of the pGMIA. We show that both pGMIA and pGMIA+ converge to the optimum almost surely asymptotically and empirically demonstrate their competitiveness against state-of-the-art high-dimensional Bayesian optimization algorithms. 
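The following skeleton shows only the two-layer alternation the abstract describes (region-layer decision, then solution-layer decision within the chosen region); the GMRF-based inference is abstracted behind the placeholder functions `region_score` and `solution_score`, which are hypothetical names, so this is a structural sketch of the loop rather than pGMIA itself.

```python
def two_layer_sampling_loop(simulate, solutions, project,
                            region_score, solution_score, budget=100):
    """Skeleton of a region/solution two-layer sampling loop (illustrative only).

    simulate(x)             -> noisy objective observation at solution x (minimized)
    project(x)              -> region-layer node that solution x maps to
    region_score(r, data)   -> acquisition score for region r    (placeholder)
    solution_score(x, data) -> acquisition score for solution x  (placeholder)
    Solutions are assumed hashable (e.g., tuples of integers).
    """
    regions = {}                                  # region node -> its solutions
    for x in solutions:
        regions.setdefault(project(x), []).append(x)

    data = []                                     # (solution, observation) pairs
    for _ in range(budget):
        # Region layer: low-resolution decision over the region-layer graph.
        r = max(regions, key=lambda reg: region_score(reg, data))
        # Solution layer: higher-resolution decision within the chosen region.
        x = max(regions[r], key=lambda sol: solution_score(sol, data))
        data.append((x, simulate(x)))

    # Report the sampled solution with the lowest average observation.
    averages = {}
    for x, y in data:
        averages.setdefault(x, []).append(y)
    return min(averages, key=lambda sol: sum(averages[sol]) / len(averages[sol]))
```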