In this work, we propose to utilize discrete graph Ricci flow to alter network entropy through feedback control. Because such feedback input can “reverse” entropic changes, we adopt the moniker of Maxwell’s Demon to motivate our approach. In particular, it has recently been shown that Ricci curvature from geometry is intrinsically connected to Boltzmann entropy as well as to the functional robustness of networks, i.e., their ability to maintain functionality in the presence of random fluctuations. From this, discrete Ricci flow provides a natural avenue to “rewire” a network’s underlying geometry to improve throughput and resilience. In real-world settings, one may wish to impose nonlinear constraints among particular agents in order to understand the network’s dynamic evolution, so controlling the discrete Ricci flow may be necessary (e.g., we may seek to understand the entropic dynamics and curvature “flow” between two networks as opposed to solely curvature shrinkage). This can be formulated as a natural control problem, for which we employ feedback control of the discrete Ricci-based flow and show that, under a particular discretization, namely Ollivier-Ricci curvature, stability can be established via Lyapunov analysis. We conclude with preliminary results and remarks on potential applications that will be the subject of future work.
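As a point of reference for the discretization mentioned above, the following is a minimal, illustrative sketch (not the authors' implementation) of Ollivier-Ricci curvature on a single edge of an unweighted graph: kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y), with lazy random-walk measures mu and the Wasserstein-1 distance solved as a small linear program. The helper names and the choice of networkx/scipy are assumptions made for this sketch; the feedback-controlled flow and Lyapunov analysis from the abstract are not included.

```python
# Illustrative sketch (not the paper's code): Ollivier-Ricci curvature of one
# edge of an unweighted graph, kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y).
import networkx as nx
from scipy.optimize import linprog

def node_measure(G, v, alpha=0.0):
    """Lazy random-walk measure: mass alpha stays at v, the rest spreads uniformly."""
    nbrs = list(G.neighbors(v))
    mu = {v: alpha}
    for u in nbrs:
        mu[u] = mu.get(u, 0.0) + (1.0 - alpha) / len(nbrs)
    return mu

def ollivier_ricci_edge(G, x, y, alpha=0.0):
    mu_x, mu_y = node_measure(G, x, alpha), node_measure(G, y, alpha)
    sx, sy = list(mu_x), list(mu_y)
    d = dict(nx.all_pairs_shortest_path_length(G))   # shortest-path metric

    # Wasserstein-1 distance as a linear program over the transport plan pi[i, j].
    cost = [d[a][b] for a in sx for b in sy]
    A_eq, b_eq = [], []
    for i, a in enumerate(sx):                        # row marginals equal mu_x
        row = [0.0] * (len(sx) * len(sy))
        for j in range(len(sy)):
            row[i * len(sy) + j] = 1.0
        A_eq.append(row); b_eq.append(mu_x[a])
    for j, b in enumerate(sy):                        # column marginals equal mu_y
        row = [0.0] * (len(sx) * len(sy))
        for i in range(len(sx)):
            row[i * len(sy) + j] = 1.0
        A_eq.append(row); b_eq.append(mu_y[b])
    w1 = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs").fun
    return 1.0 - w1 / d[x][y]

if __name__ == "__main__":
    G = nx.karate_club_graph()
    for e in list(G.edges())[:5]:
        print(e, round(ollivier_ricci_edge(G, *e), 3))
```

The explicit linear program keeps the transport computation transparent; in practice one would batch the distance computation and reuse it across edges rather than recomputing all shortest paths per edge.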
On the Ricci curvature of attention maps and transformers training and robustness
            Transformer models have revolutionized machine learning, yet the underpinnings behind their success are only beginning to be understood. In this work, we analyze transformers through the geometry of attention maps, treating them as weighted graphs and focusing on Ricci curvature, a metric linked to spectral properties and system robustness. We prove that lower Ricci curvature, indicating lower system robustness, leads to faster convergence of gradient descent during training. We also show that a higher frequency of positive curvature values enhances robustness, revealing a trade-off between performance and robustness. Building on this, we propose a regularization method to adjust the curvature distribution and provide experimental results supporting our theoretical predictions while offering insights into ways to improve transformer training and robustness. The geometric perspective provided in our paper offers a versatile framework for both understanding and improving the behavior of transformers. 
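To make the “attention maps as weighted graphs” viewpoint concrete, here is a hedged sketch of one way a single-head attention matrix could be converted into a weighted graph with a per-edge curvature value. The symmetrization step, the threshold tau, and the use of the simplest combinatorial Forman curvature 4 - deg(u) - deg(v) are illustrative assumptions; they are not the curvature measure or the regularization method proposed in the paper.

```python
# Illustrative sketch (not the paper's method): turn one attention map into a
# weighted graph and attach a coarse combinatorial curvature value to each edge.
import numpy as np
import networkx as nx

def attention_to_graph(attn, tau=0.05):
    """attn: (n, n) row-stochastic attention matrix for a single head."""
    sym = 0.5 * (attn + attn.T)          # symmetrize so edges are undirected
    n = attn.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if sym[i, j] > tau:          # keep only non-negligible attention
                G.add_edge(i, j, weight=float(sym[i, j]))
    return G

def combinatorial_forman(G):
    """Simplest Forman-style curvature per edge: 4 - deg(u) - deg(v)."""
    return {(u, v): 4 - G.degree(u) - G.degree(v) for u, v in G.edges()}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 8))
    attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax rows
    G = attention_to_graph(attn)
    curv = combinatorial_forman(G)
    print("fraction of positively curved edges:",
          sum(c > 0 for c in curv.values()) / max(len(curv), 1))
```

On a real model, attn would come from a trained transformer layer rather than random logits, and the fraction of positively curved edges is one simple statistic of the curvature distribution the abstract refers to.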
- Award ID(s): 2031849
- PAR ID: 10627697
- Publisher / Repository: NeurIPS 2024 Workshop on Symmetry and Geometry in Neural Representations
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
We first provide a stochastic formula for the Carathéodory distance in terms of general Markovian couplings and prove a comparison result between the Carathéodory distance and the complete Kähler metric with a negative lower curvature bound using the Kendall–Cranston coupling. This probabilistic approach gives a version of the Schwarz lemma on complete noncompact Kähler manifolds with a further decomposition of the Ricci curvature into the orthogonal Ricci curvature and the holomorphic sectional curvature, which cannot be obtained by using Yau–Royden's Schwarz lemma. We also prove coupling estimates on quaternionic Kähler manifolds. As a by-product, we obtain an improved gradient estimate of positive harmonic functions on Kähler manifolds and quaternionic Kähler manifolds under lower curvature bounds.
- 
We study the “geometric Ricci curvature lower bound”, introduced previously by Junge, Li and LaRacuente, for a variety of examples including group von Neumann algebras, free orthogonal quantum groups $O_N^+$, $q$-deformed Gaussian algebras and quantum tori. In particular, we show that the Laplace operator on $O_N^+$ admits a factorization through the Laplace–Beltrami operator on the classical orthogonal group, which establishes the first connection between these two operators. Based on a non-negative curvature condition, we obtain the completely bounded version of the modified log-Sobolev inequalities for the corresponding quantum Markov semigroups on the examples mentioned above. We also prove that the “geometric Ricci curvature lower bound” is stable under tensor products and amalgamated free products. As an application, we obtain a sharp Ricci curvature lower bound for word-length semigroups on free group factors.
- 
When training deep neural networks, a model's generalization error is often observed to follow a power scaling law dependent both on the model size and the data size. Perhaps the best known example of such scaling laws is for transformer-based large language models (LLMs), where networks with billions of parameters are trained on trillions of tokens of text. Yet, despite sustained widespread interest, a rigorous understanding of why transformer scaling laws exist is still missing. To answer this question, we establish novel statistical estimation and mathematical approximation theories for transformers when the input data are concentrated on a low-dimensional manifold. Our theory predicts a power law between the generalization error and both the training data size and the network size for transformers, where the power depends on the intrinsic dimension d of the training data. Notably, the constructed model architecture is shallow, requiring only logarithmic depth in d. By leveraging low-dimensional data structures under a manifold hypothesis, we are able to explain transformer scaling laws in a way which respects the data geometry. Moreover, we test our theory with empirical observations by training LLMs on natural language datasets. We find the observed empirical scaling laws closely agree with our theoretical predictions. Taken together, these results rigorously show the intrinsic dimension of data to be a crucial quantity affecting transformer scaling laws in both theory and practice.
- 
We analyze networks of functional correlations between brain regions to identify changes in their structure caused by Attention Deficit Hyperactivity Disorder (ADHD). We express the task of finding changes as a network anomaly detection problem on temporal networks. We propose the use of a curvature measure based on the Forman–Ricci curvature, which expresses higher-order correlations among two connected nodes. Our theoretical result comparing this Forman–Ricci curvature with another well-known notion of network curvature, namely the Ollivier–Ricci curvature, lends further justification to the assertion that these two notions of network curvature are not well correlated and therefore one of these curvature measures cannot be used as a universal substitute for the other. Our experimental results indicate nine critical edges whose curvature differs dramatically in brains of ADHD patients compared to healthy brains. The importance of these edges is supported by existing neuroscience evidence. We demonstrate that comparative analysis of curvature identifies changes that more traditional approaches, for example analysis of edge weights, would not be able to identify.
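The last related abstract above contrasts Forman–Ricci and Ollivier–Ricci curvature on brain correlation networks. As a hedged illustration of the Forman side, the sketch below computes the commonly used triangle-augmented Forman–Ricci curvature on an unweighted graph, F#(u, v) = 4 - deg(u) - deg(v) + 3·t(u, v), where t(u, v) counts triangles containing the edge; the exact curvature variant, edge weighting, and anomaly-detection pipeline used in that paper may differ.

```python
# Illustrative sketch: triangle-augmented Forman-Ricci curvature on an
# unweighted graph, F#(u, v) = 4 - deg(u) - deg(v) + 3 * #triangles(u, v).
# This is a common variant, not necessarily the exact measure used in the paper.
import networkx as nx

def augmented_forman(G):
    curv = {}
    for u, v in G.edges():
        triangles = len(set(G.neighbors(u)) & set(G.neighbors(v)))
        curv[(u, v)] = 4 - G.degree(u) - G.degree(v) + 3 * triangles
    return curv

if __name__ == "__main__":
    # Toy stand-in for a thresholded correlation network between brain regions.
    G = nx.erdos_renyi_graph(20, 0.2, seed=1)
    curv = augmented_forman(G)
    worst = sorted(curv.items(), key=lambda kv: kv[1])[:5]
    print("five most negatively curved edges:", worst)
```

Counting shared neighbors per edge is what lets this variant capture the higher-order (triangle-level) correlations mentioned in the abstract, in contrast to the degree-only combinatorial form.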