Autonomous maze navigation is appealing yet challenging in soft robotics for exploring a priori unknown, unstructured environments, as it typically requires a brain-like controller that integrates onboard power, sensing, and control for computational intelligence. Here, we report harnessing both geometric and materials intelligence in liquid crystal elastomer–based self-rolling robots to autonomously escape from complex multichannel mazes without such a brain-like controller. The soft robot, powered by environmental thermal energy, has an asymmetric geometry with hybrid twisted and helical shapes at its two ends. This geometric asymmetry enables built-in, active, and sustained self-turning, unlike symmetric counterparts with purely twisted or purely helical shapes, which exhibit only transient self-turning through untwisting. Combined with self-snapping for motion reflection, the robot follows distinctive curved zigzag paths that avoid the entrapment seen in its symmetric counterparts, allowing it to self-escape from a variety of challenging mazes, including mazes on granular terrain, mazes with narrow gaps, and even mazes whose layouts change in situ.
Linearly Structured World Representations in Maze-Solving Transformers
The emergence of seemingly similar representations across tasks and neural architectures suggests that convergent properties may underlie sophisticated behavior. One form of representation that seems particularly fundamental to reasoning in many artificial (and perhaps natural) networks is the formation of world models, which decompose observed task structures into re-usable perceptual primitives and task-relevant relations. In this work, we show that auto-regressive transformers tasked with solving mazes learn to linearly represent the structure of mazes, and that the formation of these representations coincides with a sharp increase in generalization performance. Furthermore, we find preliminary evidence for Adjacency Heads which may play a role in computing valid paths through mazes.
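One common way to test whether a property such as maze adjacency is *linearly* represented is to fit a linear probe on a model's internal activations. The sketch below is a hedged, self-contained illustration of that idea, not the paper's implementation: the "activations" are synthetic stand-ins for transformer residual-stream vectors, constructed so that adjacency lies along a linear direction, and a simple perceptron probe recovers it.

```python
# Hypothetical sketch of linear probing for maze adjacency.
# The "activations" here are synthetic; in the paper's setting they
# would come from the residual stream of a trained maze-solving model.
import random

random.seed(0)

DIM = 8  # toy residual-stream width

def make_example(adjacent):
    # Adjacency is encoded (by construction) as a linear direction
    # along axis 0, plus noise on every coordinate.
    x = [random.uniform(-1, 1) for _ in range(DIM)]
    y = 1.0 if adjacent else -1.0
    x[0] += 2.0 * y  # the linear "adjacency direction"
    return x, y

def train_probe(data, epochs=20, lr=0.1):
    # Classic perceptron: update weights only on mistakes.
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            if pred * y <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

data = [make_example(i % 2 == 0) for i in range(200)]
w, b = train_probe(data)
acc = sum(
    1 for x, y in data
    if (sum(wi * xi for wi, xi in zip(w, x)) + b) * y > 0
) / len(data)
print(f"probe accuracy: {acc:.2f}")
```

A high probe accuracy on a linearly separable signal is exactly what a "linearly structured representation" predicts; on real model activations the probe's weights would additionally indicate *where* in the residual stream the structure lives.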
- Award ID(s): 2110745
- PAR ID: 10636462
- Publisher / Repository: Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, PMLR
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Although engineers can control the internal geometry of materials down to the micro-scale, it is unclear what configuration is ideal for a given transport process. We explore the use of mazes as abstract representations of two-phase systems. Mazes can be easily generated using many different algorithms and then represented as graphs for analysis. Three dimensionless graph parameters (effective tortuous resistance, average tortuosity, and minimum-cut size) were derived and then correlated to the maze's effective transport property (e.g., permeability), average residence time, and robustness, respectively. It was shown that by tuning the settings of the maze algorithm, one can obtain desired maze performance. Finally, a composite maze was constructed and shown to mimic the geometry and permeability of a real commercial membrane. In principle, a surrogate maze geometry can be optimized/tuned for a given transport process and then used to guide the rational design of the engineered system it represents.
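The maze-as-graph viewpoint is easy to make concrete. The sketch below (an illustration, not the paper's code) generates a perfect maze with randomized depth-first search, treats cells as graph nodes and carved passages as edges, and computes a simple tortuosity measure: the shortest entry-to-exit path length divided by the straight-line (Manhattan) distance.

```python
# Illustrative sketch: a maze as a graph, with a tortuosity measure.
import random
from collections import deque

def generate_maze(w, h, seed=0):
    """Randomized-DFS 'perfect' maze on a w x h grid.
    Returns an adjacency dict: cell -> set of connected neighbors."""
    random.seed(seed)
    adj = {(x, y): set() for x in range(w) for y in range(h)}
    stack, seen = [(0, 0)], {(0, 0)}
    while stack:
        cx, cy = stack[-1]
        nbrs = [(cx + dx, cy + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        nbrs = [n for n in nbrs if n in adj and n not in seen]
        if nbrs:
            nxt = random.choice(nbrs)
            adj[(cx, cy)].add(nxt)   # carve passage (add graph edge)
            adj[nxt].add((cx, cy))
            seen.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()              # dead end: backtrack
    return adj

def path_length(adj, start, goal):
    """BFS shortest-path length in edges, or None if unreachable."""
    dist, q = {start: 0}, deque([start])
    while q:
        node = q.popleft()
        if node == goal:
            return dist[node]
        for n in adj[node]:
            if n not in dist:
                dist[n] = dist[node] + 1
                q.append(n)
    return None

adj = generate_maze(10, 10)
length = path_length(adj, (0, 0), (9, 9))
manhattan = 18  # |9-0| + |9-0|
tortuosity = length / manhattan
print(f"path length {length}, tortuosity {tortuosity:.2f}")
```

Changing the generation algorithm (or its settings, as the abstract notes) shifts the distribution of such parameters, which is what allows a maze family to be tuned toward a target transport property.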
Weitzenfeld, A (Ed.) In the last decade, studies have demonstrated that hippocampal place cells influence rats' navigational learning ability. Moreover, researchers have observed that place cell sequences associated with routes leading to a reward are reactivated during rest periods. This phenomenon is known as Hippocampal Replay, which is thought to aid navigational learning and memory consolidation. These findings in neuroscience have inspired new robot navigation models that emulate the learning process of mammals. This study presents a novel model that encodes path information using place cell connections formed during online navigation. Our model employs these connections to generate sequences of state-action pairs to train our actor-critic reinforcement learning model offline. Our results indicate that our method can accelerate the learning process of solving an open-world navigational task. Specifically, we demonstrate that our approach can learn optimal paths through open-field mazes with obstacles.
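The core replay idea can be sketched in a few lines. The toy below is hedged and deliberately minimal (a 5-state corridor and tabular TD(0) values, not the paper's place-cell or actor-critic model): transitions recorded during one online traversal are replayed offline, repeatedly, so value estimates propagate back from the rewarded state without further environment interaction.

```python
# Hedged toy of offline replay: stored transitions ("place cell
# connections") are swept repeatedly with TD(0) updates, mimicking
# hippocampal replay of a rewarded route. The corridor task and all
# constants are illustrative.
REWARD_STATE = 4
GAMMA = 0.9   # discount factor
ALPHA = 0.5   # learning rate

# (state, next_state) pairs recorded during one online traversal.
episode = [(0, 1), (1, 2), (2, 3), (3, 4)]

values = [0.0] * 5

def replay(values, episode, sweeps=50):
    # Offline phase: no environment interaction, only stored experience.
    for _ in range(sweeps):
        for s, s2 in episode:
            r = 1.0 if s2 == REWARD_STATE else 0.0
            target = r + GAMMA * values[s2]
            values[s] += ALPHA * (target - values[s])
    return values

values = replay(values, episode)
print([round(v, 2) for v in values])
```

After replay, value rises monotonically along the route toward the reward (roughly 0.73, 0.81, 0.9, 1.0 here), which is the gradient an actor-critic agent can then exploit to choose actions; the same mechanism explains why replay accelerates learning relative to purely online updates.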
Understanding neural representations will help open the black box of neural networks and advance our scientific understanding of modern AI systems. However, how complex, structured, and transferable representations emerge in modern neural networks has remained a mystery. Building on previous results, we propose the Canonical Representation Hypothesis (CRH), which posits a set of six alignment relations to universally govern the formation of representations in most hidden layers of a neural network. Under the CRH, the latent representations (R), weights (W), and neuron gradients (G) become mutually aligned during training. This alignment implies that neural networks naturally learn compact representations, where neurons and weights are invariant to task-irrelevant transformations. We then show that the breaking of CRH leads to the emergence of reciprocal power-law relations between R, W, and G, which we refer to as the Polynomial Alignment Hypothesis (PAH). We present a minimal-assumption theory demonstrating that the balance between gradient noise and regularization is crucial for the emergence of the canonical representations. The CRH and PAH lead to the exciting possibility of unifying key deep learning phenomena, including neural collapse and the neural feature ansatz, in a single framework.
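To make "mutual alignment of R, W, and G" concrete, one simple (and purely illustrative) metric is the cosine similarity between the Gram matrices of two sets of vectors: it equals 1 when one set is a rescaling of the other and drops below 1 as the sets decorrelate. This is a toy stand-in, not the paper's formal alignment relations.

```python
# Toy alignment metric: cosine similarity of Gram matrices.
# All matrices here are made-up stand-ins for representations (R),
# weights (W), and an unrelated direction set (V).
import math

def gram(M):
    """Flattened M^T M for a list of row vectors M."""
    d = len(M[0])
    return [sum(row[i] * row[j] for row in M)
            for i in range(d) for j in range(d)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

R = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # stand-in representations
W = [[2 * x for x in row] for row in R]     # rescaled copy: fully aligned
V = [[1.0, 0.0], [0.0, 1.0]]                # unrelated directions

a_rw = cosine(gram(R), gram(W))  # 1.0: perfect alignment
a_rv = cosine(gram(R), gram(V))  # below 1: misaligned
print(round(a_rw, 3), round(a_rv, 3))
```

Tracking a quantity like this for (R, W), (W, G), and (R, G) pairs over training is one way the hypothesized convergence toward mutual alignment could be observed empirically.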
Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain this effect? We adapt general tools from computational learning theory to fit the specific characteristics of text datasets and present a method to evaluate the compatibility between representations and tasks. Even though many tasks can be easily solved with simple bag-of-words (BOW) representations, BOW does poorly on hard natural language inference tasks. For one such task we find that BOW cannot distinguish between real and randomized labelings, while pre-trained MLM representations show 72x greater distinction between real and random labelings than BOW. This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task, enabling comparisons between representations without requiring empirical evaluations that may be sensitive to initializations and hyperparameters. The method provides a fresh perspective on the patterns in a dataset and the alignment of those patterns with specific labels.
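The real-versus-randomized-labeling comparison has a simple intuition: a representation is compatible with a task if a weak classifier separates the true labels far better than shuffled ones. The toy below illustrates that gap with a nearest-centroid classifier on made-up 2-D features; it is a hedged sketch of the idea, not the paper's learning-theoretic method.

```python
# Toy illustration: compatibility as the gap between real and
# shuffled-label accuracy. Features and classifier are illustrative.
import random

random.seed(1)

def nearest_centroid_accuracy(X, y):
    """Fit one centroid per class, then score training accuracy."""
    cents = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(c) / len(pts) for c in zip(*pts)]
    correct = 0
    for x, lab in zip(X, y):
        pred = min(cents, key=lambda L: sum(
            (a - b) ** 2 for a, b in zip(x, cents[L])))
        correct += pred == lab
    return correct / len(X)

# A "good representation": two well-separated clusters.
X = ([[random.gauss(0, 0.3), random.gauss(0, 0.3)] for _ in range(50)]
     + [[random.gauss(3, 0.3), random.gauss(3, 0.3)] for _ in range(50)])
real = [0] * 50 + [1] * 50
shuffled = real[:]
random.shuffle(shuffled)  # destroy the label-feature relationship

acc_real = nearest_centroid_accuracy(X, real)
acc_rand = nearest_centroid_accuracy(X, shuffled)
print(f"real: {acc_real:.2f}  shuffled: {acc_rand:.2f}")
```

A BOW representation on a hard inference task behaves like the shuffled case (no usable distinction), while MLM embeddings behave like the real case; the size of the gap is what the proposed measure calibrates and quantifies.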