NSF PAR Search | NSF Public Access Repository

Improving Pathfinding with Anchoring Tokens

Zhang, Huaqing; Liu, Bingbin; Kim, Juno; Risteski, Andrej (July 2025, ICML 2025 Workshop on Methods and Opportunities at Small Scale (MOSS))

Planning is a critical aspect of multi-step reasoning, yet it remains challenging for large language models (LLMs). In this work, we use pathfinding in graphs as a sandbox for understanding and improving the planning abilities of LLMs. Our results show that while conventional autoregressive training generalizes poorly, an anchoring strategy, whereby a model first predicts a small subset of intermediate nodes along the path, significantly improves pathfinding performance. We confirm these gains on two families of graphs with markedly different structures and provide preliminary heuristics for selecting effective anchor nodes, offering guidance for more realistic settings.
Free, publicly-accessible full text available July 13, 2026
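
To make the anchoring idea in this abstract concrete, here is a minimal sketch of how an "anchors first, then path" training target could be built for a graph. The abstract does not specify the target format or the anchor-selection rule, so everything here is an assumption for illustration: the function names (`shortest_path`, `pick_anchors`, `build_target`), the use of breadth-first search to obtain a reference path, and the evenly spaced choice of anchors are all hypothetical and not the authors' method.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def pick_anchors(path, num_anchors):
    """Pick up to num_anchors interior nodes, evenly spaced along the path.

    Even spacing is an illustrative assumption, not a heuristic from the paper.
    """
    interior = path[1:-1]
    if num_anchors <= 0 or not interior:
        return []
    k = min(num_anchors, len(interior))
    return [interior[(i + 1) * len(interior) // (k + 1)] for i in range(k)]


def build_target(adj, source, target, num_anchors):
    """Hypothetical training target: the anchors first, then the full path."""
    path = shortest_path(adj, source, target)
    if path is None:
        return None
    return {"anchors": pick_anchors(path, num_anchors), "path": path}


# Usage on a 5-node line graph 0 - 1 - 2 - 3 - 4.
line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
example = build_target(line_graph, 0, 4, num_anchors=1)
```

In this sketch a model would be trained to emit the `anchors` list before the `path` list; how the two are serialized into tokens is left open.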
On the Query Complexity of Verifier-Assisted Language Generation

Botta, Edoardo; Li, Yuchen; Mehta, Aashay; Ash, Jordan; Zhang, Cyril; Risteski, Andrej (July 2025, International Conference on Machine Learning (ICML), 2025)

Recently, a plethora of works have proposed inference-time algorithms (e.g., best-of-n) that incorporate verifiers to assist the generation process. Their quality-efficiency trade-offs have been empirically benchmarked on a variety of constrained generation tasks, but the algorithmic design landscape is still poorly understood. In this paper, we develop a mathematical framework for reasoning about constrained generation using a pre-trained language model generator oracle and a process verifier, which can decide whether a prefix can be extended to a string that satisfies the constraints of choice. We show that even in very simple settings, access to a verifier can turn an intractable problem (information-theoretically or computationally) into a tractable one. In fact, we show that even simple algorithms, like tokenwise rejection sampling, can enjoy significant benefits from access to a verifier. Empirically, we show that a natural modification of tokenwise rejection sampling, in which the sampler is allowed to "backtrack" (i.e., erase the final few generated tokens), has robust and substantive benefits over natural baselines (e.g., (blockwise) rejection sampling, nucleus sampling) in terms of computational efficiency, accuracy, and diversity.
Free, publicly-accessible full text available July 13, 2026
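
The tokenwise rejection sampling with backtracking described in this abstract can be illustrated with a toy sketch. The abstract gives only the high-level idea, so the details below are assumptions: the function name `sample_with_backtracking`, the parameters `max_tries`, `backtrack`, and `max_steps`, the rule that backtracking is triggered after `max_tries` consecutive rejections, and the balanced-parentheses constraint used as a stand-in process verifier. It is not the paper's exact algorithm.

```python
import random


def sample_with_backtracking(generator, verifier, length, rng,
                             max_tries=4, backtrack=2, max_steps=10_000):
    """Tokenwise rejection sampling with a simple backtracking rule.

    generator(prefix, rng) proposes the next token (stand-in for a language model).
    verifier(prefix) says whether the prefix can still be completed validly.
    After max_tries rejected proposals at one position (an assumed trigger),
    the last `backtrack` tokens are erased and sampling resumes from there.
    """
    prefix = []
    steps = 0
    while len(prefix) < length:
        for _ in range(max_tries):
            steps += 1
            if steps > max_steps:
                raise RuntimeError("step budget exhausted")
            token = generator(prefix, rng)
            if verifier(prefix + [token]):
                prefix.append(token)
                break
        else:
            # Every proposal was rejected: erase the final few tokens.
            del prefix[max(0, len(prefix) - backtrack):]
    return prefix


def make_paren_verifier(length):
    """Toy process verifier: can the prefix be extended to a balanced
    parenthesis string of exactly `length` tokens (length assumed even)?"""
    def verifier(prefix):
        depth = 0
        for tok in prefix:
            depth += 1 if tok == "(" else -1
            if depth < 0:
                return False
        return depth <= length - len(prefix)
    return verifier


def uniform_generator(prefix, rng):
    """Toy generator oracle: proposes '(' or ')' uniformly at random."""
    return rng.choice("()")


sample = sample_with_backtracking(
    uniform_generator, make_paren_verifier(8), 8, random.Random(0)
)
```

Because the toy verifier here is exact, backtracking only fires when the generator is unlucky several times in a row; the more interesting regime, where proposals are expensive or the verifier is imperfect, is outside the scope of this sketch.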
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression

Zhai, Runtian; Liu, Bingbin; Risteski, Andrej; Kolter, Zico; Ravikumar, Pradeep (May 2024, International Conference on Learning Representations (ICLR), 2024)

Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, suggesting that learning a linear probe atop such a representation can be connected to RKHS regression. Building on this insight, this work delves into a statistical analysis of augmentation-based pretraining. Starting from the isometry property, a geometric characterization of the target function given by the augmentation, we disentangle the effects of the model and the augmentation, and prove two generalization bounds that are free of model complexity. Our first bound works for an arbitrary encoder, where the prediction error is decomposed as the sum of an estimation error incurred by fitting a linear probe with RKHS regression, and an approximation error entailed by RKHS approximation. Our second bound specifically addresses the case where the encoder is near-optimal, that is, it approximates the top-d eigenspace of the RKHS induced by the augmentation. A key ingredient in our analysis is the augmentation complexity, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance.
Full Text Available
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines

Li, Yuchen; Kirchmayer, Alexandre; Mehta, Aashay; Qin, Yilong; Dadachev, Boris; Papineni, Kishore; Kumar, Sanjiv; Risteski, Andrej (July 2024, International Conference on Machine Learning (ICML), 2024)

Autoregressive language models are currently the dominant paradigm for text generation, but they have some fundamental limitations that cannot be remedied by scale, for example inherently sequential and unidirectional generation. While alternate classes of models have been explored, we have limited mathematical understanding of their fundamental power and limitations. In this paper we focus on Generative Masked Language Models (GMLMs), a non-autoregressive paradigm in which we train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model. These models empirically strike a promising speed-quality tradeoff, as each step can typically decode the entire sequence in parallel. We develop a mathematical framework for analyzing and improving such models which sheds light on questions of sample complexity and inference speed and quality. Empirically, we adapt the T5 model for iteratively refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality compared with autoregressive models. We run careful ablation experiments to give recommendations on key design choices, and make fine-grained observations on the common error modes in connection with our theory. Our mathematical analyses and empirical observations characterize both the potential and the limitations of this approach, and can be applied to future work on improving the understanding and performance of GMLMs.
Full Text Available
Neural Network Approximations of PDEs Beyond Linearity: A Representational Perspective

Marwah, Tanya; Lipton, Zachary C; Lu, Jianfeng; Risteski, Andrej (October 2023, Proceedings of the 40th International Conference on Machine Learning)

Full Text Available
Statistical Efficiency of Score Matching: The View from Isoperimetry

Koehler, Frederic; Heckett, Alexander; Risteski, Andrej (January 2023, International Conference on Learning Representations)

Full Text Available
Pitfalls of Gaussians as a noise distribution in NCE

Lee, Holden; Pabbaraju, Chirag; Sevekari, Anish Prasad; Risteski, Andrej (January 2023, International Conference on Learning Representations)

Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality. The main idea is to design a classification problem for distinguishing training data from samples from an easy-to-sample noise distribution q, in a manner that avoids having to calculate a partition function. It is well known that the choice of q can severely impact the computational and statistical efficiency of NCE. In practice, a common choice for q is a Gaussian which matches the mean and covariance of the data. In this paper, we show that such a choice can result in an exponentially bad (in the ambient dimension) conditioning of the Hessian of the loss, even for very simple data distributions. As a consequence, both the statistical and algorithmic complexity for such a choice of q will be problematic in practice, suggesting that more complex and tailored noise distributions are essential to the success of NCE.
Full Text Available
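
The classification view of NCE described in this abstract can be sketched in a few lines. This is a one-dimensional illustration under stated assumptions, not the paper's setup: the model is an unnormalized Gaussian whose log-normalizer `c` is treated as a free parameter, data and noise samples are weighted equally, and the helper names (`softplus`, `log_normal_pdf`, `make_log_model`, `nce_loss`) are hypothetical.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_normal_pdf(x, mean=0.0, std=1.0):
    """Log density of the Gaussian noise distribution q."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def make_log_model(mu, c):
    """Unnormalized log density -(x - mu)^2 / 2 + c; c stands in for the
    unknown log-partition term and is learned alongside mu."""
    return lambda x: -0.5 * (x - mu) ** 2 + c


def nce_loss(data, noise, log_model, log_noise):
    """Binary cross-entropy for telling data (label 1) from noise (label 0).

    The classifier logit is log_model(x) - log_noise(x), so no partition
    function is ever computed explicitly.
    """
    pos = sum(softplus(-(log_model(x) - log_noise(x))) for x in data) / len(data)
    neg = sum(softplus(log_model(x) - log_noise(x)) for x in noise) / len(noise)
    return pos + neg


# Usage: data and noise both drawn from a standard normal.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
loss = nce_loss(data, noise, make_log_model(0.0, -0.5 * math.log(2 * math.pi)), log_normal_pdf)
```

When the model equals the noise distribution, as in the usage lines, every logit is zero and the loss reduces to 2 log 2, the value for a classifier that cannot distinguish the two samples. The conditioning problem the abstract describes concerns the Hessian of this loss in high dimensions, which this one-dimensional sketch does not reproduce.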
Sampling Approximately Low-Rank Ising Models: MCMC meets Variational Methods

Koehler, Frederic; Lee, Holden; Risteski, Andrej (July 2022, Proceedings of Machine Learning Research)

Full Text Available
