In-context learning (ICL) has proven effective across a wide range of applications: Large Language Models (LLMs) learn to complete tasks from the examples in the prompt, without any parameter tuning. In this work, we conduct a comprehensive study to understand ICL from a statistical perspective. First, we show that perfectly pretrained LLMs perform Bayesian Model Averaging (BMA) for ICL under a dynamic model of the examples in the prompt. The average error analysis of ICL for perfectly pretrained LLMs is then built on the analysis of BMA. Second, we demonstrate how the attention structure facilitates the implementation of BMA. With sufficiently many examples in the prompt, attention is proven to perform BMA under the Gaussian linear ICL model, which also motivates the explicit construction of the hidden concepts from the values of the attention heads. Finally, we analyze the pretraining behavior of LLMs. The pretraining error is decomposed into the generalization error and the approximation error. The generalization error is upper bounded via the PAC-Bayes framework. The ICL average error of the pretrained LLMs is then shown to be the sum of O(T^{-1}) and the pretraining error. In addition, we analyze the ICL performance of the pretrained LLMs with misspecified examples.
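The BMA claim above can be written out as a formula. This is only a sketch in notation assumed here, not taken from the abstract: $S_T$ denotes the $T$ examples in the prompt, $c_{T+1}$ the query input, $r_{T+1}$ the response to be predicted, and $z$ the hidden concept. It also assumes that the query input alone carries no information about $z$.

```latex
\[
P(r_{T+1} \mid S_T, c_{T+1})
  = \int P(r_{T+1} \mid z, S_T, c_{T+1}) \, P(z \mid S_T) \, \mathrm{d}z
\]
```

Read this way, each value of the hidden concept plays the role of one candidate model, and the prediction averages those models with weights given by the posterior $P(z \mid S_T)$.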
Black Box Variational Bayesian Model Averaging
For many decades now, Bayesian Model Averaging (BMA) has been a popular framework to systematically account for model uncertainty that arises in situations where multiple competing models are available to describe the same or a similar physical process. The implementation of this framework, however, comes with a multitude of practical challenges, including posterior approximation via Markov chain Monte Carlo and numerical integration. We present a Variational Bayesian Inference approach to BMA as a viable alternative to the standard solutions, one which avoids many of the aforementioned pitfalls. The proposed method is “black box” in the sense that it can be readily applied to many models with little to no model-specific derivation. We illustrate the utility of our variational approach on a suite of examples and discuss all the necessary implementation details. Fully documented Python code with all the examples is provided as well.
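To make the BMA framework concrete, here is a minimal sketch of model averaging in a toy setting. It is not the variational method of the paper: it assumes Bernoulli data and conjugate Beta priors, so the model evidence is available in closed form and neither MCMC nor variational approximation is needed. The function names (`log_evidence`, `bma_predict`) and the example priors are hypothetical, chosen for illustration only.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence(k, n, a, b):
    """Log marginal likelihood of one binary sequence with k successes
    in n trials, under a Beta(a, b) prior on the success probability."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def bma_predict(k, n, priors, model_probs):
    """Return posterior model weights and the BMA probability that the
    next trial is a success.

    priors      -- list of (a, b) Beta hyperparameters, one per model
    model_probs -- prior probability of each model (should sum to 1)
    """
    # Unnormalized log posterior model probabilities: log prior + log evidence.
    log_w = [log(p) + log_evidence(k, n, a, b)
             for (a, b), p in zip(priors, model_probs)]
    # Normalize after subtracting the maximum, for numerical stability.
    m = max(log_w)
    w = [exp(v - m) for v in log_w]
    total = sum(w)
    weights = [v / total for v in w]
    # Each model's own posterior predictive probability of a success.
    preds = [(a + k) / (a + b + n) for a, b in priors]
    # BMA prediction: average the per-model predictions by posterior weight.
    return weights, sum(wi * pi for wi, pi in zip(weights, preds))


# Two competing models (assumed for illustration): a flat prior and a
# prior concentrated near 0.5, compared on 9 successes in 10 trials.
priors = [(1.0, 1.0), (20.0, 20.0)]
weights, p_next = bma_predict(k=9, n=10, priors=priors, model_probs=[0.5, 0.5])
print(weights, p_next)
```

In this toy case the flat-prior model receives the larger posterior weight, because data this lopsided are poorly explained by a prior concentrated near 0.5. For models without closed-form evidence, the weights are what MCMC or a variational scheme would have to approximate.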
- PAR ID: 10338221
- Date Published:
- Journal Name: The American Statistician
- ISSN: 0003-1305
- Page Range / eLocation ID: 1 to 12
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
The modeling of coupled fluid transport and deformation in a porous medium is essential for predicting geomechanical processes such as CO2 sequestration and hydraulic fracturing. Current applications of interest, for instance those that involve fracturing or damage of the solid phase, require a nonlinear description of the large deformations that can occur. This paper presents a variational energy‐based continuum mechanics framework to model large‐deformation poroelasticity. The approach begins from the total free energy density, which is additively composed of the free energies of the components. A variational procedure then provides the balance of momentum, fluid transport balance, and pressure relations. A numerical approach based on finite elements is applied to analyze the behavior of saturated and unsaturated porous media using a nonlinear constitutive model for the solid skeleton. Examples studied include the Terzaghi and Mandel problems; a gas–liquid phase‐changing fluid; multiple immiscible gases; and unsaturated systems where we model injection of fluid into soil. The proposed variational approach can potentially have advantages for numerical methods as well as for combining with data‐driven models in a Bayesian framework.
In this paper, we use several examples to summarize recent advances related to the energetic variational approach (EnVarA), a general variational framework for building thermodynamically consistent models of complex fluids. Particular focus is placed on how to model systems involving chemo-mechanical couplings and non-isothermal effects.
The BUQEYE collaboration (Bayesian Uncertainty Quantification: Errors in Your effective field theory) presents a pedagogical introduction to projection-based, reduced-order emulators for applications in low-energy nuclear physics. The term emulator refers here to a fast surrogate model capable of reliably approximating high-fidelity models. As the general tools employed by these emulators are not yet well-known in the nuclear physics community, we discuss variational and Galerkin projection methods, emphasize the benefits of offline-online decompositions, and explore how these concepts lead to emulators for bound and scattering systems that enable fast and accurate calculations using many different model parameter sets. We also point to future extensions and applications of these emulators for nuclear physics, guided by the mature field of model (order) reduction. All examples discussed here and more are available as interactive, open-source Python code so that practitioners can readily adapt projection-based emulators for their own work.
To guide the selection of probabilistic solar power forecasting methods for day-ahead power grid operations, this paper compares the performance of four methods: Bayesian model averaging (BMA), analog ensemble (AnEn), ensemble learning method (ELM), and persistence ensemble (PerEn). A real-world hourly solar generation dataset from a rooftop solar plant is used to train and validate the methods under clear, partially cloudy, and overcast weather conditions. Comparisons are made on a one-year testing set using popular performance metrics for probabilistic forecasts. The ELM method is found to outperform the other methods, offering better reliability, higher resolution, and narrower prediction interval width under all weather conditions, with a slight compromise in accuracy. The BMA method performs well under overcast and partially cloudy weather conditions, although it is outperformed by the ELM method under clear conditions.