Looped Transformers for Length Generalization

Fan, Ying; Du, Yilun; Ramchandran, Kannan; Lee, Kangwook

This content will become publicly available on April 24, 2026

Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the same length, they struggle with length generalization, i.e., handling inputs of unseen lengths. In this work, we demonstrate that looped Transformers with an adaptive number of steps significantly improve length generalization. We focus on tasks with a known iterative solution, involving multiple iterations of a RASP-L operation, that is, a length-generalizable operation that can be expressed by a finite-sized Transformer. We train looped Transformers using our proposed learning algorithm and observe that they learn highly length-generalizable solutions for various tasks.
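
The notion of a task with a known iterative solution can be illustrated with a toy, non-neural sketch of parity, one of the tasks named in the abstract. This is not the authors' model or training algorithm: `step` and `looped_parity` are hypothetical names, and plain Python stands in for a single fixed-size operation that is applied repeatedly, with the number of iterations chosen from the input length.

```python
from typing import List, Tuple

# Loop state: (index of the next bit to read, parity of the bits read so far).
State = Tuple[int, int]


def step(bits: List[int], state: State) -> State:
    """One fixed-size update; the same function is reused at every iteration."""
    pos, parity = state
    return pos + 1, parity ^ bits[pos]


def looped_parity(bits: List[int]) -> int:
    """Apply `step` once per input bit, so the loop count adapts to the length."""
    state: State = (0, 0)
    for _ in range(len(bits)):
        state = step(bits, state)
    return state[1]


if __name__ == "__main__":
    print(looped_parity([1, 0, 1, 1]))
```

Because `step` itself does not depend on the input length, handling a longer input in this sketch only requires running more iterations of the same update.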

Award ID(s): 2339978

PAR ID: 10596476

Author(s) / Creator(s): Fan, Ying; Du, Yilun; Ramchandran, Kannan; Lee, Kangwook

Publisher / Repository: The Thirteenth International Conference on Learning Representations

Date Published: 2025-04-24

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.
