In this work, we propose to utilize discrete graph Ricci flow to alter network entropy through feedback control. Because such feedback input can “reverse” entropic changes, we adopt the moniker of Maxwell’s Demon to motivate our approach. In particular, it has recently been shown that Ricci curvature from geometry is intrinsically connected to Boltzmann entropy as well as to the functional robustness of networks, i.e., their ability to maintain functionality in the presence of random fluctuations. From this, discrete Ricci flow provides a natural avenue to “rewire” a network’s underlying geometry to improve throughput and resilience. In real-world settings, one may wish to impose nonlinear constraints among particular agents in order to understand the network’s dynamic evolution, so controlling the discrete Ricci flow may be necessary (e.g., we may seek to understand the entropic dynamics and curvature “flow” between two networks as opposed to solely curvature shrinkage). This can be formulated as a natural control problem, for which we employ feedback control of the discrete Ricci-based flow and show that, under a particular discretization, namely Ollivier-Ricci curvature, stability can be established via Lyapunov analysis. We conclude with preliminary results and remarks on potential applications that will be the subject of future work.
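As a point of reference for the discretization mentioned above, the following is a minimal, illustrative sketch (not the authors' implementation) of Ollivier-Ricci curvature on a single edge of an unweighted graph: kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y), with lazy random-walk measures mu and the Wasserstein-1 distance solved as a small linear program. The helper names and the choice of networkx/scipy are assumptions made for this sketch; the feedback-controlled flow and Lyapunov analysis from the abstract are not included.

```python
# Illustrative sketch (not the paper's code): Ollivier-Ricci curvature of one
# edge of an unweighted graph, kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y).
import networkx as nx
from scipy.optimize import linprog

def node_measure(G, v, alpha=0.0):
    """Lazy random-walk measure: mass alpha stays at v, the rest spreads uniformly."""
    nbrs = list(G.neighbors(v))
    mu = {v: alpha}
    for u in nbrs:
        mu[u] = mu.get(u, 0.0) + (1.0 - alpha) / len(nbrs)
    return mu

def ollivier_ricci_edge(G, x, y, alpha=0.0):
    mu_x, mu_y = node_measure(G, x, alpha), node_measure(G, y, alpha)
    sx, sy = list(mu_x), list(mu_y)
    d = dict(nx.all_pairs_shortest_path_length(G))   # shortest-path metric

    # Wasserstein-1 distance as a linear program over the transport plan pi[i, j].
    cost = [d[a][b] for a in sx for b in sy]
    A_eq, b_eq = [], []
    for i, a in enumerate(sx):                        # row marginals equal mu_x
        row = [0.0] * (len(sx) * len(sy))
        for j in range(len(sy)):
            row[i * len(sy) + j] = 1.0
        A_eq.append(row); b_eq.append(mu_x[a])
    for j, b in enumerate(sy):                        # column marginals equal mu_y
        row = [0.0] * (len(sx) * len(sy))
        for i in range(len(sx)):
            row[i * len(sy) + j] = 1.0
        A_eq.append(row); b_eq.append(mu_y[b])
    w1 = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs").fun
    return 1.0 - w1 / d[x][y]

if __name__ == "__main__":
    G = nx.karate_club_graph()
    for e in list(G.edges())[:5]:
        print(e, round(ollivier_ricci_edge(G, *e), 3))
```

The explicit linear program keeps the transport computation transparent; in practice one would batch the distance computation and reuse it across edges rather than recomputing all shortest paths per edge.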
On the Ricci curvature of attention maps and transformers training and robustness
            Transformer models have revolutionized machine learning, yet the underpinnings behind their success are only beginning to be understood. In this work, we analyze transformers through the geometry of attention maps, treating them as weighted graphs and focusing on Ricci curvature, a metric linked to spectral properties and system robustness. We prove that lower Ricci curvature, indicating lower system robustness, leads to faster convergence of gradient descent during training. We also show that a higher frequency of positive curvature values enhances robustness, revealing a trade-off between performance and robustness. Building on this, we propose a regularization method to adjust the curvature distribution and provide experimental results supporting our theoretical predictions while offering insights into ways to improve transformer training and robustness. The geometric perspective provided in our paper offers a versatile framework for both understanding and improving the behavior of transformers. 
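To make the “attention maps as weighted graphs” viewpoint concrete, here is a hedged sketch of one way a single-head attention matrix could be converted into a weighted graph with a per-edge curvature value. The symmetrization step, the threshold tau, and the use of the simplest combinatorial Forman curvature 4 - deg(u) - deg(v) are illustrative assumptions; they are not the curvature measure or the regularization method proposed in the paper.

```python
# Illustrative sketch (not the paper's method): turn one attention map into a
# weighted graph and attach a coarse combinatorial curvature value to each edge.
import numpy as np
import networkx as nx

def attention_to_graph(attn, tau=0.05):
    """attn: (n, n) row-stochastic attention matrix for a single head."""
    sym = 0.5 * (attn + attn.T)          # symmetrize so edges are undirected
    n = attn.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if sym[i, j] > tau:          # keep only non-negligible attention
                G.add_edge(i, j, weight=float(sym[i, j]))
    return G

def combinatorial_forman(G):
    """Simplest Forman-style curvature per edge: 4 - deg(u) - deg(v)."""
    return {(u, v): 4 - G.degree(u) - G.degree(v) for u, v in G.edges()}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 8))
    attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax rows
    G = attention_to_graph(attn)
    curv = combinatorial_forman(G)
    print("fraction of positively curved edges:",
          sum(c > 0 for c in curv.values()) / max(len(curv), 1))
```

On a real model, attn would come from a trained transformer layer rather than random logits, and the fraction of positively curved edges is one simple statistic of the curvature distribution the abstract refers to.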
- Award ID(s): 2031849
- PAR ID: 10627697
- Publisher / Repository: NeurIPS 2024 Workshop on Symmetry and Geometry in Neural Representations
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
We first provide a stochastic formula for the Carathéodory distance in terms of general Markovian couplings and prove a comparison result between the Carathéodory distance and the complete Kähler metric with a negative lower curvature bound using the Kendall–Cranston coupling. This probabilistic approach gives a version of the Schwarz lemma on complete noncompact Kähler manifolds with a further decomposition of the Ricci curvature into the orthogonal Ricci curvature and the holomorphic sectional curvature, which cannot be obtained by using Yau–Royden's Schwarz lemma. We also prove coupling estimates on quaternionic Kähler manifolds. As a by-product, we obtain an improved gradient estimate of positive harmonic functions on Kähler manifolds and quaternionic Kähler manifolds under lower curvature bounds.
- 
We study the “geometric Ricci curvature lower bound”, introduced previously by Junge, Li and LaRacuente, for a variety of examples including group von Neumann algebras, free orthogonal quantum groups $O_N^+$, $q$-deformed Gaussian algebras and quantum tori. In particular, we show that the Laplace operator on $O_N^+$ admits a factorization through the Laplace–Beltrami operator on the classical orthogonal group, which establishes the first connection between these two operators. Based on a non-negative curvature condition, we obtain the completely bounded version of the modified log-Sobolev inequalities for the corresponding quantum Markov semigroups on the examples mentioned above. We also prove that the “geometric Ricci curvature lower bound” is stable under tensor products and amalgamated free products. As an application, we obtain a sharp Ricci curvature lower bound for word-length semigroups on free group factors.
- 
When training deep neural networks, a model's generalization error is often observed to follow a power scaling law dependent both on the model size and the data size. Perhaps the best known example of such scaling laws is for transformer-based large language models (LLMs), where networks with billions of parameters are trained on trillions of tokens of text. Yet, despite sustained widespread interest, a rigorous understanding of why transformer scaling laws exist is still missing. To answer this question, we establish novel statistical estimation and mathematical approximation theories for transformers when the input data are concentrated on a low-dimensional manifold. Our theory predicts a power law between the generalization error and both the training data size and the network size for transformers, where the power depends on the intrinsic dimension d of the training data. Notably, the constructed model architecture is shallow, requiring only logarithmic depth in d. By leveraging low-dimensional data structures under a manifold hypothesis, we are able to explain transformer scaling laws in a way which respects the data geometry. Moreover, we test our theory with empirical observations by training LLMs on natural language datasets. We find the observed empirical scaling laws closely agree with our theoretical predictions. Taken together, these results rigorously show the intrinsic dimension of data to be a crucial quantity affecting transformer scaling laws in both theory and practice.
- 
We analyze networks of functional correlations between brain regions to identify changes in their structure caused by Attention Deficit Hyperactivity Disorder (ADHD). We express the task of finding changes as a network anomaly detection problem on temporal networks. We propose the use of a curvature measure based on the Forman–Ricci curvature, which expresses higher-order correlations among two connected nodes. Our theoretical result comparing this Forman–Ricci curvature with another well-known notion of network curvature, namely the Ollivier–Ricci curvature, lends further justification to the assertion that these two notions of network curvature are not well correlated and therefore one of these curvature measures cannot be used as a universal substitute for the other. Our experimental results indicate nine critical edges whose curvature differs dramatically in brains of ADHD patients compared to healthy brains. The importance of these edges is supported by existing neuroscience evidence. We demonstrate that comparative analysis of curvature identifies changes that more traditional approaches, for example analysis of edge weights, would not be able to identify.
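The last related abstract above contrasts Forman–Ricci and Ollivier–Ricci curvature on brain correlation networks. As a hedged illustration of the Forman side, the sketch below computes the commonly used triangle-augmented Forman–Ricci curvature on an unweighted graph, F#(u, v) = 4 - deg(u) - deg(v) + 3·t(u, v), where t(u, v) counts triangles containing the edge; the exact curvature variant, edge weighting, and anomaly-detection pipeline used in that paper may differ.

```python
# Illustrative sketch: triangle-augmented Forman-Ricci curvature on an
# unweighted graph, F#(u, v) = 4 - deg(u) - deg(v) + 3 * #triangles(u, v).
# This is a common variant, not necessarily the exact measure used in the paper.
import networkx as nx

def augmented_forman(G):
    curv = {}
    for u, v in G.edges():
        triangles = len(set(G.neighbors(u)) & set(G.neighbors(v)))
        curv[(u, v)] = 4 - G.degree(u) - G.degree(v) + 3 * triangles
    return curv

if __name__ == "__main__":
    # Toy stand-in for a thresholded correlation network between brain regions.
    G = nx.erdos_renyi_graph(20, 0.2, seed=1)
    curv = augmented_forman(G)
    worst = sorted(curv.items(), key=lambda kv: kv[1])[:5]
    print("five most negatively curved edges:", worst)
```

Counting shared neighbors per edge is what lets this variant capture the higher-order (triangle-level) correlations mentioned in the abstract, in contrast to the degree-only combinatorial form.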