Task Descriptors Help Transformers Learn Linear Models In-Context

Huang, Ruomin; Ge, Rong

Citation Details

This content will become publicly available on April 25, 2026

Task Descriptors Help Transformers Learn Linear Models In-Context

Large language models (LLMs) exhibit strong in-context learning (ICL) ability, which allows the model to make predictions on new examples based on the given prompt. Recently, a line of research (Von Oswald et al., 2023; Aky¨urek et al., 2023; Ahn et al., 2023; Mahankali et al., 2023; Zhang et al., 2024) considered ICL for a simple linear regression setting and showed that the forward pass of Transformers is simulating some variants of gradient descent (GD) algorithms on the in-context examples. In practice, the input prompt usually contains a task descriptor in addition to in-context examples. We investigate how the task description helps ICL in the linear regression setting. Consider a simple setting where the task descriptor describes the mean of input in linear regression. Our results show that gradient flow converges to a global minimum for a linear Transformer. At the global minimum, the Transformer learns to use the task descriptor effectively to improve its performance. Empirically, we verify our results by showing that the weights converge to the predicted global minimum and Transformers indeed perform better with task descriptors. more »

Award ID(s):: 2031849

PAR ID:: 10627708

Author(s) / Creator(s):: Huang, Ruomin; Ge, Rong

Publisher / Repository:: ICLR 2025

Date Published:: 2025-04-25

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
This content will become publicly available on April 25, 2026
Conference Paper:
The DOI is not currently available.

More Like this