This content will become publicly available on January 31, 2026

Title: Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
When training deep neural networks, a model's generalization error is often observed to follow a power scaling law that depends on both the model size and the data size. Perhaps the best-known examples of such scaling laws are those for transformer-based large language models (LLMs), where networks with billions of parameters are trained on trillions of tokens of text. Yet, despite sustained and widespread interest, a rigorous understanding of why transformer scaling laws exist is still missing. To answer this question, we establish novel statistical estimation and mathematical approximation theories for transformers when the input data are concentrated on a low-dimensional manifold. Our theory predicts a power-law relationship between the generalization error and both the training data size and the network size for transformers, where the power depends on the intrinsic dimension d of the training data. Notably, the constructed model architecture is shallow, requiring only a depth logarithmic in d. By leveraging low-dimensional data structures under a manifold hypothesis, we are able to explain transformer scaling laws in a way that respects the data geometry. Moreover, we test our theory against empirical observations by training LLMs on natural language datasets. We find that the observed empirical scaling laws closely agree with our theoretical predictions. Taken together, these results rigorously show that the intrinsic dimension of the data is a crucial quantity affecting transformer scaling laws in both theory and practice.
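
As a rough, hedged illustration of the power-law form described above (not the paper's construction, data, or exponents), one can fit the standard scaling-law ansatz L(N) = a * N^(-alpha) + c to test losses measured at several training-set sizes; under the abstract's thesis, the fitted exponent alpha would be governed by the intrinsic dimension d of the data. The sizes and losses below are synthetic placeholders.

    # Minimal sketch: fit the scaling-law ansatz L(N) = a * N**(-alpha) + c to
    # (training-set size, test loss) pairs. The values are synthetic placeholders,
    # not measurements from the paper.
    import numpy as np
    from scipy.optimize import curve_fit

    def scaling_law(N, a, alpha, c):
        # Saturating power law: loss decays as a power of data size toward a floor c.
        return a * N ** (-alpha) + c

    N = np.array([1e6, 3e6, 1e7, 3e7, 1e8])          # hypothetical token counts
    loss = np.array([3.58, 3.20, 2.89, 2.68, 2.50])  # hypothetical test losses

    (a, alpha, c), _ = curve_fit(scaling_law, N, loss, p0=[10.0, 0.2, 2.0], maxfev=10000)
    print(f"fitted exponent alpha = {alpha:.3f}")  # the abstract ties this exponent to the intrinsic dimension d

Fitting in log space is often numerically more stable, but a plain least-squares fit suffices for this illustration.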
Award ID(s):
2145167
PAR ID:
10574171
Author(s) / Creator(s):
Publisher / Repository:
Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track
Date Published:
Format(s):
Medium: X
Location:
Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track
Sponsoring Org:
National Science Foundation
More Like this
  1. Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analyses on neural networks (NNs) to search for effective generalization metrics that can guide this type of model selection. Effective metrics are typically expected to correlate strongly with test performance. In this paper, we expand on prior analyses by examining generalization-metric-based model selection with the following objectives: (i) focusing on natural language processing (NLP) tasks, as prior work primarily concentrates on computer vision (CV) tasks; (ii) considering metrics that directly predict test error rather than the generalization gap; (iii) exploring metrics that can be computed without access to data. Pursuing these objectives, we provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics. Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size, and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics. Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks, exhibiting stronger correlations than other, more popular metrics. To further examine these metrics, we extend prior formulations relying on power law (PL) spectral distributions to the exponential (EXP) and exponentially truncated power law (E-TPL) families.
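
As a rough sketch of what a heavy-tail (HT) spectral metric can look like in practice, and not the exact PL/E-TPL/EXP fits studied in the entry above, one can estimate a power-law exponent for the tail of a layer's weight spectrum, for example with a simple Hill estimator applied to the eigenvalues of W^T W; smaller exponents are usually read as heavier tails. The function name and the random matrix below are illustrative placeholders.

    # Minimal sketch of a heavy-tail (HT) spectral metric: estimate a power-law
    # tail exponent for the eigenvalues of W^T W via a simple Hill estimator.
    # Illustration only; not the exact metric or E-TPL/EXP fits from the paper.
    import numpy as np

    def hill_tail_exponent(weight: np.ndarray, tail_fraction: float = 0.1) -> float:
        # Eigenvalues of W^T W are the squared singular values of W.
        eigs = np.linalg.svd(weight, compute_uv=False) ** 2
        eigs = np.sort(eigs)[::-1]                      # largest first
        k = max(2, int(tail_fraction * eigs.size))      # top eigenvalues treated as the "tail"
        tail = eigs[:k]
        x_min = tail[-1]
        # Hill estimate of the Pareto tail index; add 1 to express it as a PL density exponent.
        return 1.0 + k / np.sum(np.log(tail / x_min))

    # Hypothetical usage on one layer's weight matrix (random Gaussian stand-in):
    rng = np.random.default_rng(0)
    W = rng.standard_normal((768, 3072))
    print(f"estimated tail exponent: {hill_tail_exponent(W):.2f}")

In an actual analysis one would loop over the weight matrices of a trained Transformer rather than a random matrix; the Gaussian stand-in only keeps the snippet self-contained.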
  2. Learning operators between infinite-dimensional spaces is an important learning task arising in machine learning, imaging science, mathematical modeling and simulation, etc. This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks. Non-asymptotic upper bounds are derived for the generalization error of the empirical risk minimizer over a properly chosen network class. Under the assumption that the target operator exhibits a low-dimensional structure, our error bounds decay as the training sample size increases, with an attractive fast rate that depends on the intrinsic dimension. Our assumptions cover most scenarios in real applications, and our results give rise to fast rates by exploiting low-dimensional data structures in operator estimation. We also investigate the influence of network structures (e.g., network width, depth, and sparsity) on the generalization error of the neural network estimator and offer a general guideline for choosing network structures to quantitatively maximize learning efficiency.
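
For orientation only, and not the exact bound of the entry above: nonparametric rates for Lipschitz targets degrade with the dimension the estimator actually has to resolve, which is why a bound driven by the intrinsic dimension d rather than the ambient dimension D is so much faster. The classical Lipschitz-regression rate from n samples reads:

    \mathbb{E}\,\|\hat{f}_n - f\|_{L^2}^2 \;\lesssim\; n^{-\frac{2}{2+d}}
    \qquad\text{versus}\qquad
    n^{-\frac{2}{2+D}} \quad (d \ll D).

For example, with d = 10 and D = 1000 the exponents are 2/12 ≈ 0.17 versus 2/1002 ≈ 0.002, so the intrinsic-dimension rate improves with sample size dramatically faster.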
  3. Deep generative models have achieved great empirical success in distribution learning. Many existing experiments have demonstrated that deep generative networks can efficiently generate high-dimensional complex data from a low-dimensional, easy-to-sample distribution. However, this phenomenon cannot be justified by existing theories. The widely held manifold hypothesis speculates that real-world data sets, such as natural images and signals, exhibit low-dimensional geometric structures. In this paper, we take such low-dimensional data structures into consideration by assuming that data distributions are supported on a low-dimensional manifold. We prove approximation and estimation theories for deep generative networks estimating distributions on a low-dimensional manifold under the Wasserstein-1 loss. We show that the Wasserstein-1 loss converges to zero at a fast rate depending on the intrinsic dimension rather than the ambient data dimension. Our theory leverages the low-dimensional geometric structures in data sets and justifies the practical power of deep generative models. We require no smoothness assumptions on the data distribution, which is desirable in practice.
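
As a reference point for the rate described in the entry above (not the paper's theorem), the Wasserstein-1 distance between a distribution μ supported on a d-dimensional set and the empirical distribution of n samples is known to decay at a rate governed by d, not the ambient dimension D:

    W_1(\mu, \nu) \;=\; \sup_{\mathrm{Lip}(f)\le 1} \Big( \int f\, d\mu - \int f\, d\nu \Big),
    \qquad
    \mathbb{E}\, W_1(\hat{\mu}_n, \mu) \;\lesssim\; n^{-1/d} \quad (d > 2,\ \text{up to log factors}).

This contrast between n^{-1/d} and n^{-1/D} is the generic reason intrinsic-dimension rates are so much more favorable for distribution estimation.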
  4. A prevailing paradigm suggests that species richness increases with area in a decelerating way. This ubiquitous power-law scaling, the species–area relationship, has formed the foundation of many conservation strategies. In spatially complex ecosystems, however, area may not be the sole dimension along which biodiversity patterns scale, because the scale-invariant complexity of fractal ecosystem structure may drive ecological dynamics in space. Here, we use theory and analysis of extensive fish community data from two distinct geographic regions to show that riverine biodiversity follows a robust scaling law along the two orthogonal dimensions of ecosystem size and complexity (i.e., a dual scaling law). In river networks, the recurrent merging of various tributaries forms fractal branching systems, where the prevalence of branching (ecosystem complexity) represents a macroscale control of the ecosystem’s habitat heterogeneity. Meanwhile, ecosystem size dictates metacommunity size and total habitat diversity, two factors regulating biodiversity in nature. Our theory predicted that, regardless of simulated species’ traits, larger and more branched “complex” networks support greater species richness due to increased space and environmental heterogeneity. The relationships were linear on logarithmic axes, indicating power-law scaling with ecosystem size and complexity. In support of this theoretical prediction, the power laws have consistently emerged in riverine fish communities across the study regions (Hokkaido Island in Japan and the midwestern United States), despite these regions hosting different faunas with distinct evolutionary histories. The emergence of the dual scaling law may be a pervasive property of branching networks, with important implications for biodiversity conservation.
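
For concreteness, the classical species–area relationship underlying the entry above is a power law, linear on logarithmic axes; a dual scaling law in ecosystem size and branching complexity then corresponds to a two-factor power law of the same kind. The two-factor form below is only an illustrative reading of "dual scaling," with z_1 and z_2 standing in for whatever exponents the study estimates:

    S = c\,A^{z} \;\Longleftrightarrow\; \log S = \log c + z \log A,
    \qquad
    S \;\propto\; A^{z_1}\, K^{z_2}
    \quad (A:\ \text{ecosystem size},\ K:\ \text{branching complexity}).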