How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks

Ramesh, Rahul; Khona, Mikail; Dick, Robert P.; Tanaka, Hidenori; Lubana, Ekdeep Singh.

Transformers trained on huge text corpora exhibit a remarkable set of capabilities. Given the inherently compositional nature of language, one can expect a model to learn to compose these capabilities, potentially yielding a combinatorial explosion in the operations it can perform on an input. Motivated by this, we ask: “How capable can a transformer become?” We train Transformer models on a data-generating process that involves compositions of a set of well-defined, monolithic capabilities and show that: (1) Transformers generalize to exponentially or even combinatorially many functions not seen in the training data; (2) composing functions by generating intermediate outputs is more effective for generalizing to unseen compositions than producing only the final output; (3) the training data has a significant impact on the model’s ability to compose functions; and (4) attention layers in the latter half of the model appear critical to compositionality.
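To make the abstract's terms concrete, here is a minimal toy sketch of a compositional data-generating process. Everything in it is an illustrative assumption rather than the authors' actual setup: each "capability" is taken to be a fixed random bijection over a small token vocabulary, and the names `make_bijection`, `compose_direct`, `compose_step_by_step`, `VOCAB_SIZE`, `NUM_FUNCTIONS`, and `COMPOSITION_LENGTH` are hypothetical. It contrasts emitting only the final output with emitting every intermediate output, and counts how many distinct function sequences exist for a fixed composition length.

```python
import itertools
import random

# Hypothetical toy parameters (assumptions, not values from the paper).
VOCAB_SIZE = 10
NUM_FUNCTIONS = 5
COMPOSITION_LENGTH = 3

rng = random.Random(0)


def make_bijection(rng, n):
    """Return a random permutation of range(n), used as a token-wise map."""
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


# Each hypothetical "capability" is one fixed bijection over the vocabulary.
FUNCTIONS = [make_bijection(rng, VOCAB_SIZE) for _ in range(NUM_FUNCTIONS)]


def apply(fn_id, seq):
    """Apply capability `fn_id` to every token of `seq`."""
    table = FUNCTIONS[fn_id]
    return [table[t] for t in seq]


def compose_direct(fn_ids, seq):
    """Direct format: apply the functions in order, return only the final output."""
    for f in fn_ids:
        seq = apply(f, seq)
    return seq


def compose_step_by_step(fn_ids, seq):
    """Step-by-step format: return every intermediate output, last one included."""
    trace = []
    for f in fn_ids:
        seq = apply(f, seq)
        trace.append(seq)
    return trace


# Distinct function sequences of length COMPOSITION_LENGTH:
# with repetition allowed there are NUM_FUNCTIONS ** COMPOSITION_LENGTH,
# which grows exponentially in the length; without repetition there are
# NUM_FUNCTIONS! / (NUM_FUNCTIONS - COMPOSITION_LENGTH)! ordered selections.
with_repeats = list(itertools.product(range(NUM_FUNCTIONS), repeat=COMPOSITION_LENGTH))
without_repeats = list(itertools.permutations(range(NUM_FUNCTIONS), COMPOSITION_LENGTH))

if __name__ == "__main__":
    x = [1, 2, 3, 4]
    ids = (0, 3, 1)
    print("direct:      ", compose_direct(ids, x))
    print("step-by-step:", compose_step_by_step(ids, x))
    print("sequences with / without repeats:", len(with_repeats), len(without_repeats))
```

In this toy, a model trained on only a subset of the function sequences could be evaluated on the held-out ones; the step-by-step trace ends in the same value as the direct output but exposes each intermediate result along the way.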

Award ID(s): 2008151

PAR ID: 10483595

Publisher / Repository: Proceedings of the NeurIPS Workshop on Symmetry and Geometry in Neural Representations

Date Published: 2023-11-29

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0 (Conference Paper)
DOI: Not currently available.