NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Looped Transformers for Length Generalization

Fan, Ying; Du, Yilun; Ramchandran, Kannan; Lee, Kangwook (April 2025, The Thirteenth International Conference on Learning Representations)

Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the same length, they struggle with length generalization, i.e., handling inputs of unseen lengths. In this work, we demonstrate that looped Transformers with an adaptive number of steps significantly improve length generalization. We focus on tasks with a known iterative solution, involving multiple iterations of a RASP-L operation—a length-generalizable operation that can be expressed by a finite-sized Transformer. We train looped Transformers using our proposed learning algorithm and observe that they learn highly length-generalizable solutions for various tasks.
more » « less
Free, publicly-accessible full text available April 24, 2026
SPEX: Scaling Feature Interaction Explanations for LLMs

Kong, Justin_Singh; Butler, Landon; Agarwal; Erginbas, Yigit_Efe; Pedarsani, Ramtin; Yu, Bin; Ramchandran, Kannan (May 2025, ICML 2025)

Large language models (LLMs) have revolution- ized machine learning due to their ability to cap- ture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths (≈20). We propose Spectral Ex- plainer (SPEX), a model-agnostic interaction attri- bution algorithm that efficiently scales to large input lengths (≈1000). SPEX exploits underlying nat- ural sparsity among interactions—common in real- world data—and applies a sparse Fourier transform using a channel decoding algorithm to efficiently identify important interactions. We perform exper- iments across three difficult long-context datasets that require LLMs to utilize interactions between inputs to complete the task. For large inputs, SPEX outperforms marginal attribution methods by up to 20% in terms of faithfully reconstructing LLM out- puts. Further, SPEX successfully identifies key fea- tures and interactions that strongly influence model output. For one of our datasets, HotpotQA, SPEX provides interactions that align with human annota- tions. Finally, we use our model-agnostic approach to generate explanations to demonstrate abstract rea- soning in closed-source LLMs (GPT-4o mini) and compositional reasoning in vision-language models.
more » « less
Free, publicly-accessible full text available May 1, 2026
Statistical complexity and optimal algorithms for nonlinear ridge bandits

https://doi.org/10.1214/24-AOS2395

Rajaraman, Nived; Han, Yanjun; Jiao, Jiantao; Ramchandran, Kannan (December 2024, The Annals of Statistics)

Free, publicly-accessible full text available December 1, 2025
Competing Bandits in Non-Stationary Matching Markets

https://doi.org/10.1109/TIT.2024.3352228

Ghosh, Avishek; Sankararaman, Abishek; Ramchandran, Kannan; Javidi, Tara; Mazumdar, Arya (April 2024, IEEE Transactions on Information Theory)

Full Text Available
The Fair Value of Data Under Heterogeneous Privacy Constraints in Federated Learning

Kang, Justin S; Pedarsani, Ramtin; Ramchandran, Kannan (February 2024, Transactions on machine learning research)

Full Text Available
Online pricing for multi-user multi-item markets

Erginbas, Yigit E; Courtade, Thomas; Ramchandran, Kannan; Phade, Soham (December 2023, Advances in neural information processing systems)

Full Text Available
Learning a 1-layer conditional generative model in total variation

Jalal, Ajil; Kang, Justin; Uppal, Ananya; Ramchandran, Kannan; Price, Eric (December 2023, Neurips 2023)

Full Text Available
MRI Reconstruction with Side Information using Diffusion Models

https://doi.org/10.1109/IEEECONF59524.2023.10476807

Levac, Brett; Jalal, Ajil; Ramchandran, Kannan; Tamir, Jonathan I (October 2023, IEEE)

Full Text Available
Interactive Learning with Pricing for Optimal and Stable Allocations in Markets

Erginbas, Yigit Efe; Phade, Soham; Ramchandran, Kannan (April 2023, Proceedings of the International Workshop on Artificial Intelligence and Statistics)

Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback. As a principled way of incorporating market constraints and user incentives in the design, we consider our objectives to be two-fold: maximal social welfare with minimal instability. To maximize social welfare, our proposed framework enhances the quality of recommendations by exploring allocations that optimistically maximize the rewards. To minimize instability, a measure of users' incentives to deviate from recommended allocations, the algorithm prices the items based on a scheme derived from the Walrasian equilibria. Though it is known that these equilibria yield stable prices for markets with known user preferences, our approach accounts for the inherent uncertainty in the preferences and further ensures that the users accept their recommendations under offered prices. To the best of our knowledge, our approach is the first to integrate techniques from combinatorial bandits, optimal resource allocation, and collaborative filtering to obtain an algorithm that achieves sub-linear social welfare regret as well as sub-linear instability. Empirical studies on synthetic and real-world data also demonstrate the efficacy of our strategy compared to approaches that do not fully incorporate all these aspects.
more » « less
Full Text Available
Test Accuracy vs. Generalization Gap: Model Selection in NLP without Accessing Training or Testing Data

https://doi.org/10.1145/3580305.3599518

Yang, Yaoqing; Theisen, Ryan; Hodgkinson, Liam; Gonzalez, Joseph E; Ramchandran, Kannan; Martin, Charles H; Mahoney, Michael W (August 2023, ACM)

Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analysis on neural networks (NNs) to search for effective generalization metrics that can guide this type of model selection. Effective metrics are typically expected to correlate strongly with test performance. In this paper, we expand on prior analyses by examining generalization-metric-based model selection with the following objectives: (i) focusing on natural language processing (NLP) tasks, as prior work primarily concentrates on computer vision (CV) tasks; (ii) considering metrics that directly predict test error instead of the generalization gap; (iii) exploring metrics that do not need access to data to compute. From these objectives, we are able to provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics. Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics. Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks, exhibiting stronger correlations than other, more popular metrics. To further examine these metrics, we extend prior formulations relying on power law (PL) spectral distributions to exponential (EXP) and exponentially-truncated power law (E-TPL) families.
more » « less
Full Text Available

« Prev Next »

Search for: All records