This content will become publicly available on September 18, 2026

Title: Geometric approaches to matrix normalization and graph balancing
Normal matrices, or matrices which commute with their adjoints, are of fundamental importance in pure and applied mathematics. In this paper, we study a natural functional on the space of square complex matrices whose global minimizers are normal matrices. We show that this functional, which we refer to as the non-normal energy, has remarkably well-behaved gradient descent dynamics: although it is nonconvex, its only critical points are the normal matrices, and its gradient descent trajectories fix matrix spectra and preserve the subset of real matrices. We also show that, even when restricted to the subset of unit Frobenius norm matrices, the gradient flow of the non-normal energy retains many of these useful properties. This is applied to prove that low-dimensional homotopy groups of spaces of unit norm normal matrices vanish; for example, we show that the space of $$d \times d$$ complex unit norm normal matrices is simply connected for all $$d \geq 2$$. Finally, we consider the related problem of balancing a weighted directed graph – that is, readjusting its edge weights so that the weighted in-degree and out-degree are the same at each node. We adapt the non-normal energy to define another natural functional whose global minima are balanced graphs and show that gradient descent of this functional always converges to a balanced graph, while preserving graph spectra and realness of the weights. Our results were inspired by concepts from symplectic geometry and Geometric Invariant Theory, but we mostly avoid invoking this machinery and our proofs are generally self-contained.
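The paper's exact functional is not reproduced in this record, but a standard candidate with precisely these minimizers is $$E(A) = \tfrac{1}{4}\|AA^* - A^*A\|_F^2$$, the squared Frobenius norm of the commutator of $$A$$ with its adjoint. The sketch below assumes that form (the functional, step size, and helper names are illustrative, not taken from the paper) and checks the two advertised behaviors numerically: gradient descent drives the energy toward zero, i.e. toward a normal matrix, while the eigenvalues barely move.

```python
import numpy as np

def non_normal_energy(A):
    """Assumed form of the non-normal energy: E(A) = (1/4) * ||A A^* - A^* A||_F^2."""
    C = A @ A.conj().T - A.conj().T @ A            # commutator [A, A^*]; zero iff A is normal
    return 0.25 * np.linalg.norm(C, "fro") ** 2

def grad(A):
    """Euclidean gradient of the assumed energy: grad E(A) = [[A, A^*], A]."""
    C = A @ A.conj().T - A.conj().T @ A
    return C @ A - A @ C

rng = np.random.default_rng(0)
d = 6
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A /= np.linalg.norm(A)                             # start at unit Frobenius norm for a tame step size
spectrum_before = np.sort_complex(np.linalg.eigvals(A))

eta = 0.05
for _ in range(20000):                             # plain gradient descent on the assumed energy
    A = A - eta * grad(A)

spectrum_after = np.sort_complex(np.linalg.eigvals(A))
print("non-normal energy after descent:", non_normal_energy(A))    # should be close to 0
print("max eigenvalue drift:", np.max(np.abs(spectrum_after - spectrum_before)))
# The exact spectrum preservation is a property of the continuous flow; discrete steps
# with a small eta should keep the drift small but not exactly zero.
```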
Award ID(s):
2107700 2107808 2324962
PAR ID:
10637323
Author(s) / Creator(s):
;
Publisher / Repository:
Cambridge University Press
Date Published:
Journal Name:
Forum of Mathematics, Sigma
Volume:
13
ISSN:
2050-5094
Page Range / eLocation ID:
e149
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The success of gradient descent in ML, and especially for learning neural networks, is remarkable and robust. In the context of how the brain learns, one aspect of gradient descent that appears biologically difficult to realize (if not implausible) is that its updates rely on feedback from later layers to earlier layers through the same connections. Such bidirected links are relatively few in brain networks, and even when reciprocal connections exist, they may not be equi-weighted. Random Feedback Alignment (Lillicrap et al., 2016), where the backward weights are random and fixed, has been proposed as a bio-plausible alternative and found to be effective empirically. We investigate how and when feedback alignment (FA) works, focusing on one of the most basic problems with layered structure, low-rank matrix factorization: given a matrix $$Y_{n \times m}$$, the goal is to find a low-rank factorization $$Z_{n \times r} W_{r \times m}$$ that minimizes the error $$\|ZW - Y\|_F$$. Gradient descent solves this problem optimally. We show that FA finds the optimal solution when $$r \geq \mathrm{rank}(Y)$$. We also shed light on how FA works. It is observed empirically that the forward weight matrices and (random) feedback matrices come closer during FA updates. Our analysis rigorously derives this phenomenon and shows how it facilitates convergence of FA*, a closely related variant of FA. We also show that FA can be far from optimal when $$r < \mathrm{rank}(Y)$$.
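As a concrete illustration of the setup described above, the sketch below runs the feedback-alignment update on the factorization objective $$\|ZW - Y\|_F$$: the outer factor $$Z$$ is updated with its true gradient, while the update of $$W$$ uses a fixed random feedback matrix $$B$$ in place of the backpropagated $$Z^\top$$. The dimensions, step size, and variable names are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r_true, r = 12, 10, 2, 4                    # overparameterized case: r >= rank(Y)
Y = rng.standard_normal((n, r_true)) @ rng.standard_normal((r_true, m))

Z = 0.1 * rng.standard_normal((n, r))             # forward weights (both factors are trained)
W = 0.1 * rng.standard_normal((r, m))
B = rng.standard_normal((r, n))                   # fixed random feedback matrix, used in place of Z^T

eta = 1e-3
for step in range(20001):
    E = Z @ W - Y                                 # residual of the current factorization
    Z -= eta * E @ W.T                            # exact gradient step for the outer factor
    W -= eta * B @ E                              # feedback alignment: random B replaces Z^T
    if step % 5000 == 0:
        align = np.sum(B * Z.T) / (np.linalg.norm(B) * np.linalg.norm(Z) + 1e-12)
        print(f"step {step:5d}   ||ZW - Y||_F = {np.linalg.norm(E):8.4f}   cos(B, Z^T) = {align:.3f}")
```

With these settings the residual should shrink steadily (this is the $$r \geq \mathrm{rank}(Y)$$ regime) and the cosine between $$B$$ and $$Z^\top$$ should turn positive, which is the alignment phenomenon the abstract refers to.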
  2. This paper provides a framework to evaluate the performance of single and double integrator networks over arbitrary directed graphs. Adopting vehicular network terminology, we consider quadratic performance metrics defined by the L2-norm of position- and velocity-based response functions given impulsive inputs to each vehicle. We exploit the spectral properties of weighted graph Laplacians and output performance matrices to derive a novel method of computing closed-form solutions for this general class of performance metrics, which includes H2-norm based quantities as special cases. We then explore the effect of the interplay between network properties (such as edge directionality and connectivity) and the control strategy on the overall network performance. More precisely, for systems whose interconnection is described by graphs with normal Laplacian L, we characterize the role of directionality by comparing their performance with that of their undirected counterparts, represented by the Hermitian part of L. We show that, for single-integrator networks, directed and undirected graphs perform identically. However, for double-integrator networks, graph directionality (expressed by the eigenvalues of L with nonzero imaginary part) can significantly degrade performance. Interestingly, in many cases, well-designed feedback can also exploit directionality to mitigate degradation or even improve performance beyond that of the undirected case. Finally, we focus on a system coherence metric (aggregate deviation from the state average) to investigate the relationship between performance and degree of connectivity, leading to somewhat surprising findings. For example, increasing the number of neighbors on a ω-nearest neighbor directed graph does not necessarily improve performance. Similarly, we demonstrate equivalence in performance between all-to-one and all-to-all communication graphs.
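The central comparison above is between a directed network with normal Laplacian L and its undirected counterpart, the Hermitian part of L, with directionality showing up as eigenvalues of L that have nonzero imaginary part. The snippet below builds that pair of objects for a directed ring, a standard example whose Laplacian is circulant and hence normal; the graph and its size are illustrative only.

```python
import numpy as np

d = 6
# Directed ring: each node i sends an edge of weight 1 to node (i + 1) mod d.
A = np.zeros((d, d))
for i in range(d):
    A[i, (i + 1) % d] = 1.0

Dout = np.diag(A.sum(axis=1))
L = Dout - A                                   # directed (out-degree) graph Laplacian
H = 0.5 * (L + L.T)                            # Hermitian part: Laplacian of the undirected counterpart

print("L normal?  ", np.allclose(L @ L.T, L.T @ L))       # circulant, hence normal
print("eigs of L: ", np.round(np.linalg.eigvals(L), 3))   # complex pairs reflect directionality
print("eigs of H: ", np.round(np.linalg.eigvals(H), 3))   # real, as for any undirected Laplacian
```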
  3. Spatially distributed networks of large size arise in a variety of science and engineering problems, such as wireless sensor networks and smart power grids. Most of their features can be described by properties of their state-space matrices, whose entries have indices in the vertex set of a graph. In this paper, we introduce novel algebras of Beurling type that contain matrices on a connected simple graph having polynomial off-diagonal decay, and we show that they are Banach subalgebras of the space of all bounded operators on the space of all p-summable sequences. The $$\ell^p$$-stability of state-space matrices is an essential hypothesis for the robustness of spatially distributed networks. In this paper, we establish the equivalence among the $$\ell^p$$-stabilities of matrices in Beurling algebras for different exponents $$p$$, with quantitative analysis for the lower stability bounds. Admission of norm-controlled inversion plays a crucial role in some engineering practice. In this paper, we prove that matrices in Beurling subalgebras of the space of bounded operators on $$\ell^p$$ have norm-controlled inversion, and we find a norm-controlled polynomial with close to optimal degree. A polynomial estimate for powers of matrices is important for numerical implementation of spatially distributed networks. In this paper, we apply our results on norm-controlled inversion to obtain a polynomial estimate for powers of matrices in Beurling algebras. The polynomial estimate is a noncommutative extension of the corresponding estimate for convolution powers of a complex function and is applicable to estimating the probability of hopping from one agent to another in a stationary Markov chain on a spatially distributed network.
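As a small numerical illustration of the decay condition above (with an assumed path graph, decay exponent, and diagonal shift, none of which come from the paper): a matrix with polynomial off-diagonal decay $$|A_{ij}| \lesssim (1 + \rho(i,j))^{-\alpha}$$ is inverted, and the inverse again shows rapid off-diagonal decay, which is the phenomenon that norm-controlled inversion quantifies.

```python
import numpy as np

n, alpha = 60, 3.0
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])          # graph distance on a path graph

A = (1.0 + dist) ** (-alpha)                        # polynomial off-diagonal decay
A += 4.0 * np.eye(n)                                # diagonal shift keeps A diagonally dominant, hence invertible

B = np.linalg.inv(A)

# Largest |entry| at each off-diagonal distance, for A and for its inverse.
for k in (1, 5, 10, 20, 40):
    band = dist == k
    print(f"distance {k:2d}:  max|A_ij| = {np.abs(A[band]).max():.2e}   "
          f"max|A^-1_ij| = {np.abs(B[band]).max():.2e}")
```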
  4. Low-rank matrix recovery is a fundamental problem in machine learning with numerous applications. In practice, the problem can be solved by convex optimization, namely nuclear norm minimization, or by non-convex optimization, as it is well known that for low-rank matrix problems like matrix sensing and matrix completion, all local optima of the natural non-convex objectives are also globally optimal under certain ideal assumptions. In this paper, we study new approaches for matrix sensing in a semi-random model where an adversary can add any number of arbitrary sensing matrices. More precisely, the problem is to recover a low-rank matrix $$X^\star$$ from linear measurements $$b_i = \langle A_i, X^\star \rangle$$, where an unknown subset of the sensing matrices satisfies the Restricted Isometry Property (RIP) and the rest of the $$A_i$$'s are chosen adversarially. It is known that in the semi-random model, existing non-convex objectives can have bad local optima. To fix this, we present a descent-style algorithm that provably recovers the ground-truth matrix $$X^\star$$. For the closely related problem of semi-random matrix completion, prior work [CG18] showed that all bad local optima can be eliminated by reweighting the input data. However, the analogous approach for matrix sensing requires reweighting a set of matrices to satisfy RIP, which is a condition that is NP-hard to check. Instead, we build on the framework proposed in [KLL$^+$23] for semi-random sparse linear regression, where the algorithm in each iteration reweights the input based on the current solution and then takes a weighted gradient step that is guaranteed to work well locally. Our analysis crucially exploits the connection between sparsity in vector problems and low-rankness in matrix problems, which may have other applications in obtaining robust algorithms for sparse and low-rank problems.
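For orientation, the sketch below sets up the basic (non-robust) matrix-sensing problem that the semi-random model perturbs: Gaussian sensing matrices $$A_i$$, which satisfy RIP with high probability, measurements $$b_i = \langle A_i, X^\star \rangle$$, and factored gradient descent on $$f(U) = \tfrac{1}{2m}\sum_i (\langle A_i, UU^\top \rangle - b_i)^2$$. This is the vanilla non-convex objective that the abstract says can acquire bad local optima once adversarial $$A_i$$'s are added; the paper's reweighting algorithm is not reproduced here, and all sizes and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, m = 10, 2, 200                              # m Gaussian measurements of a d x d rank-r matrix

U_star = rng.standard_normal((d, r))
X_star = U_star @ U_star.T                        # ground truth: low-rank and PSD
A = rng.standard_normal((m, d, d))                # Gaussian sensing matrices (RIP with high probability)
b = np.einsum("kij,ij->k", A, X_star)             # b_k = <A_k, X_star>

U = 0.1 * rng.standard_normal((d, r))             # factored variable: X = U U^T
eta = 5e-3
for step in range(3001):
    resid = np.einsum("kij,ij->k", A, U @ U.T) - b
    G = np.einsum("k,kij->ij", resid, A) / m      # (1/m) * sum_k resid_k * A_k
    U -= eta * (G + G.T) @ U                      # gradient of f(U) = (1/2m) sum_k (<A_k, UU^T> - b_k)^2
    if step % 1000 == 0:
        err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
        print(f"step {step:4d}   relative error ||UU^T - X*||_F / ||X*||_F = {err:.3e}")
```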