-
Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built a connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, suggesting that learning a linear probe atop such a representation can be connected to RKHS regression. Building on this insight, this work delves into a statistical analysis of augmentation-based pretraining. Starting from the isometry property, a geometric characterization of the target function given by the augmentation, we disentangle the effects of the model and the augmentation, and prove two generalization bounds that are free of model complexity. Our first bound holds for an arbitrary encoder, where the prediction error is decomposed as the sum of an estimation error incurred by fitting a linear probe with RKHS regression and an approximation error entailed by RKHS approximation. Our second bound specifically addresses the case where the encoder is near-optimal, that is, it approximates the top-d eigenspace of the RKHS induced by the augmentation. A key ingredient in our analysis is the augmentation complexity, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance.
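To make the "linear probe atop a frozen representation" view concrete, here is a minimal sketch, not the paper's implementation: a stand-in encoder (a fixed random feature map, a placeholder) produces representations, and a downstream label is fit with a ridge-regularized linear probe, which is equivalent to kernel ridge regression with the kernel induced by the encoder. All names and data are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code): fit a ridge-regularized
# linear probe on top of a frozen encoder's representations.
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, d=16):
    """Placeholder for a pretrained encoder mapping inputs to a d-dim representation."""
    W = np.random.default_rng(1).normal(size=(x.shape[1], d))  # fixed weights
    return np.tanh(x @ W)

# Synthetic inputs and a synthetic downstream regression target.
X_train = rng.normal(size=(200, 8))
X_test = rng.normal(size=(50, 8))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=200)

# Linear probe = ridge regression on the frozen representations.
Phi_train, Phi_test = encoder(X_train), encoder(X_test)
lam = 1e-2  # ridge penalty
w = np.linalg.solve(Phi_train.T @ Phi_train + lam * np.eye(Phi_train.shape[1]),
                    Phi_train.T @ y_train)
y_pred = Phi_test @ w
print("probe weights:", w.shape, "predictions:", y_pred.shape)
```

In the abstract's terms, the error of such a probe splits into an estimation part (fitting `w` from finite samples) and an approximation part (how well the encoder's span captures the target); the sketch only exhibits the estimator, not the bounds.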
-
The vast majority of work in self-supervised learning has focused on assessing recovered features via a chosen set of downstream tasks. While there are several commonly used benchmark datasets, this lens of feature learning requires assumptions on the downstream tasks that are not inherent to the data distribution itself. In this paper, we present an alternative lens, one of parameter identifiability: assuming the data come from a parametric probabilistic model, we train a self-supervised learning predictor with a suitable parametric form and ask whether the parameters of the optimal predictor can be used to extract the parameters of the ground-truth generative model. Specifically, we focus on latent-variable models capturing sequential structure, namely Hidden Markov Models with both discrete and conditionally Gaussian observations. We take masked prediction as the self-supervised learning task and study the optimal masked predictor. We show that parameter identifiability is governed by the task difficulty, which is determined by the choice of data model and the number of tokens to predict. On the technical side, we uncover close connections with the uniqueness of tensor rank decompositions, a widely used tool for studying identifiability through the lens of the method of moments.
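As an illustration of the masked-prediction object the abstract studies (not the paper's code), the sketch below computes the Bayes-optimal masked predictor for a toy discrete HMM by brute-force marginalization over hidden paths; the HMM parameters, sequence length, and masked position are arbitrary placeholders. In the paper's framing, it is the parametric form of such an optimal predictor that one asks to reveal the generative parameters.

```python
# Illustrative sketch (assumed toy parameters): the optimal masked predictor
# P(x_mask | x_rest) for a tiny discrete HMM, via exact enumeration.
import itertools
import numpy as np

pi = np.array([0.6, 0.4])                 # initial hidden-state distribution
A = np.array([[0.7, 0.3], [0.2, 0.8]])    # hidden-state transition matrix
B = np.array([[0.9, 0.1], [0.3, 0.7]])    # emission matrix: P(obs | hidden)
T, n_obs = 3, B.shape[1]

def joint_prob(obs):
    """P(x_1..x_T) by summing over all hidden-state paths."""
    total = 0.0
    for z in itertools.product(range(len(pi)), repeat=T):
        p = pi[z[0]] * B[z[0], obs[0]]
        for t in range(1, T):
            p *= A[z[t - 1], z[t]] * B[z[t], obs[t]]
        total += p
    return total

def masked_predictor(obs, mask_pos):
    """Optimal predictor: normalize the joint over candidate values of the masked token."""
    probs = np.array([joint_prob(obs[:mask_pos] + (v,) + obs[mask_pos + 1:])
                      for v in range(n_obs)])
    return probs / probs.sum()

print(masked_predictor((0, 0, 1), mask_pos=1))  # posterior over the masked token
```

Brute force is used only to keep the example self-contained; forward-backward recursions would compute the same posterior efficiently.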
-
null (Ed.)Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data. It has recently emerged as one of the leading learning paradigms in the absence of labels across many different domains (e.g. brain imaging, text, images). However, theoretical understanding of many aspects of training, both statistical and algorithmic, remain fairly elusive. In this work, we study the setting of time series—more precisely, when we get data from a strong mixing continuous-time stochastic process. We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case. Moreover, we give sample complexity bounds for solving this task and quantitatively characterize what the value of the contrastive loss implies for distributional closeness of the learned kernel. As a byproduct, we illuminate the appropriate settings for the contrastive distribution, as well as other hyper-parameters in this setup.more » « less
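To illustrate how a contrastive classification task can be built from a single mixing trajectory, here is a minimal sketch under assumed placeholder choices (an Ornstein-Uhlenbeck process, a fixed lag, hand-chosen pair features, a logistic classifier); it is not the paper's exact construction or contrastive distribution. Lag-separated pairs are labeled positive, independently resampled pairs negative, and a classifier is trained to tell them apart.

```python
# Illustrative sketch (assumed choices, not the paper's construction): a
# contrastive task from one trajectory of a mixing diffusion.
import numpy as np

rng = np.random.default_rng(0)

# Simulate an Ornstein-Uhlenbeck process dX = -theta * X dt + sigma dW (Euler scheme).
theta, sigma, dt, n_steps = 1.0, 0.5, 0.01, 20000
x = np.zeros(n_steps)
for t in range(1, n_steps):
    x[t] = x[t - 1] - theta * x[t - 1] * dt + sigma * np.sqrt(dt) * rng.normal()

# Positives: (x_t, x_{t+lag}); negatives: (x_t, x_s) with s drawn independently.
lag, n_pairs = 10, 5000
t_idx = rng.integers(0, n_steps - lag, size=n_pairs)
s_idx = rng.integers(0, n_steps, size=n_pairs)
pairs = np.concatenate([np.stack([x[t_idx], x[t_idx + lag]], axis=1),
                        np.stack([x[t_idx], x[s_idx]], axis=1)])
labels = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])

# Logistic classifier on simple pair features, fit by gradient descent.
feats = np.stack([pairs[:, 0] * pairs[:, 1], (pairs[:, 0] - pairs[:, 1]) ** 2,
                  np.ones(len(pairs))], axis=1)
w = np.zeros(feats.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-feats @ w))
    w -= 0.1 * feats.T @ (p - labels) / len(labels)

acc = np.mean((feats @ w > 0) == labels.astype(bool))
print(f"contrastive classification accuracy: {acc:.3f}")
```

The classifier's scores only gesture at the transition-kernel estimation the abstract describes; the paper's guarantees concern a properly constructed task and loss, which this toy does not reproduce.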