Search for: All records

Award ID contains: 2413243


  1. In-context learning (ICL) has proven effective across a wide range of applications: large language models (LLMs) learn to complete tasks from the examples in the prompt without updating their parameters. In this work, we conduct a comprehensive study of ICL from a statistical perspective. First, we show that perfectly pretrained LLMs perform Bayesian Model Averaging (BMA) for ICL under a dynamic model of the examples in the prompt. Building on the BMA analysis, we then derive an average-error bound for ICL with perfectly pretrained LLMs. Second, we demonstrate how the attention structure enables the implementation of BMA: with sufficiently many examples in the prompt, attention is proven to perform BMA under the Gaussian linear ICL model, which also motivates an explicit construction of the hidden concepts from the values of the attention heads (see the sketch after this entry). Finally, we analyze the pretraining behavior of LLMs. The pretraining error is decomposed into a generalization error and an approximation error; the generalization error is upper bounded via the PAC-Bayes framework. The ICL average error of the pretrained LLMs is then shown to be the sum of an O(T^{-1}) term and the pretraining error. In addition, we analyze the ICL performance of pretrained LLMs with misspecified examples.
    Free, publicly-accessible full text available May 6, 2026
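
A minimal numerical sketch (an illustration under assumptions, not the paper's construction) of the BMA view of ICL described in entry 1: a latent concept w is drawn from a Gaussian prior, the prompt examples follow a Gaussian linear model y_i = <w, x_i> + noise, and the Bayes-optimal prediction for a query averages the linear model over the posterior of w, i.e. the posterior-mean (ridge) predictor. All names and parameter values below are illustrative.

```python
# Toy sketch of Bayesian Model Averaging (BMA) under a Gaussian linear ICL model.
# Assumption: prompt examples (x_i, y_i) are generated by a latent concept w with
# y_i = <w, x_i> + Gaussian noise; the BMA prediction averages over the posterior of w.
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 40                   # concept dimension, number of in-context examples
tau2, sigma2 = 1.0, 0.1        # prior variance of w, observation-noise variance

# Latent concept and in-context examples forming the "prompt"
w_true = rng.normal(scale=np.sqrt(tau2), size=d)
X = rng.normal(size=(T, d))
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=T)
x_query = rng.normal(size=d)

# Posterior over the hidden concept w given the prompt (Gaussian conjugacy):
#   Sigma = (X^T X / sigma2 + I / tau2)^{-1},  mu = Sigma X^T y / sigma2
Sigma = np.linalg.inv(X.T @ X / sigma2 + np.eye(d) / tau2)
mu = Sigma @ X.T @ y / sigma2

# BMA prediction for the query: E[<w, x_query> | prompt] = <mu, x_query>
print("BMA prediction:", x_query @ mu, " noiseless target:", x_query @ w_true)
```

As the number of in-context examples T grows, the posterior concentrates on the latent concept and the BMA prediction approaches the noiseless target.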
  2. In this work, we theoretically investigate why large language model (LLM)-empowered agents can solve decision-making problems in the physical world. We consider a hierarchical reinforcement learning (RL) model in which the LLM Planner handles high-level task planning and the Actor performs low-level execution. Within this model, the LLM Planner operates in a partially observable Markov decision process (POMDP), iteratively generating language-based subgoals through prompting. Assuming appropriate pretraining data, we prove that the pretrained LLM Planner effectively conducts Bayesian aggregated imitation learning (BAIL) via in-context learning. We also demonstrate the need for exploration beyond the subgoals produced by BAIL, showing that naively executing these subgoals incurs linear regret. To address this, we propose an ε-greedy exploration strategy for BAIL, which we prove achieves sublinear regret when the pretraining error is small (see the sketch after this entry). Finally, we extend our theoretical framework to the case where the LLM Planner serves as a world model for inferring the environment's transition dynamics, and to multi-agent settings in which it coordinates multiple Actors.
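
A toy sketch (names, subgoals, and environment are hypothetical) of the ε-greedy exploration strategy described in entry 2, layered on top of a mock BAIL-style planner: with probability ε a uniformly random subgoal is proposed (exploration), and otherwise the planner's own subgoal is used (exploitation).

```python
# Toy sketch of epsilon-greedy exploration over subgoals proposed by a BAIL-style planner.
# bail_subgoal is a mock stand-in for the pretrained LLM Planner; SUBGOALS are hypothetical.
import random

SUBGOALS = ["pick_up_key", "open_door", "go_to_goal"]

def bail_subgoal(history):
    """Mock BAIL proposal: favor the first subgoal not yet marked done in the history."""
    achieved = set(history)
    for g in SUBGOALS:
        if f"{g}:done" not in achieved:
            return g
    return SUBGOALS[-1]

def epsilon_greedy_subgoal(history, epsilon=0.1, rng=random):
    """Explore a random subgoal with probability epsilon, otherwise exploit the BAIL proposal."""
    if rng.random() < epsilon:
        return rng.choice(SUBGOALS)
    return bail_subgoal(history)

# Tiny rollout in which the (mocked) Actor always completes the proposed subgoal.
history = []
for step in range(5):
    g = epsilon_greedy_subgoal(history, epsilon=0.2)
    history.append(f"{g}:done")
print(history)
```

The exploration step is what distinguishes this strategy from naively executing BAIL subgoals, which the abstract shows incurs linear regret; with exploration, the regret is sublinear when the pretraining error is small.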