We investigate the uniform reshuffling model for money exchanges: two agents picked uniformly at random redistribute their dollars between them. This stochastic dynamics is of mean-field type and eventually leads to an exponential distribution of wealth. To better understand this dynamics, we investigate its limit as the number of agents goes to infinity. We rigorously prove the so-called propagation of chaos, which links the stochastic dynamics to a (limiting) nonlinear partial differential equation (PDE). This deterministic description, which is well-known in the literature, has the flavor of the classical Boltzmann equation arising from the statistical mechanics of dilute gases. We prove its convergence toward its exponential equilibrium distribution in the sense of relative entropy.
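The exchange rule described above is easy to simulate directly. The following sketch (a minimal illustration, not the paper's proof apparatus; all parameter values are arbitrary choices) runs the uniform reshuffling dynamics: each exchange conserves the total wealth by construction, while the empirical wealth distribution relaxes toward an exponential law.

```python
import numpy as np

def uniform_reshuffling(n_agents=1000, n_steps=100_000, seed=0):
    """Uniform reshuffling dynamics: at each step, two distinct agents are
    picked uniformly at random and their combined wealth is redistributed
    uniformly at random between them."""
    rng = np.random.default_rng(seed)
    wealth = np.full(n_agents, 10.0)          # everyone starts with $10
    for _ in range(n_steps):
        i, j = rng.choice(n_agents, size=2, replace=False)
        total = wealth[i] + wealth[j]
        share = rng.uniform(0.0, total)       # uniform split of the pooled money
        wealth[i], wealth[j] = share, total - share
    return wealth

wealth = uniform_reshuffling()
# Total money is conserved by every exchange; for an exponential law the
# standard deviation equals the mean, which the sample roughly exhibits.
print(wealth.sum(), wealth.mean(), wealth.std())
```

The simulation only illustrates the finite-agent stochastic process; the paper's contribution is the rigorous link between this process and its mean-field PDE limit.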
Global convergence of neuron birth-death dynamics
Neural networks with a large number of units admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of “overparameterized” models. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that converges to a globally optimal solution for networks with a single hidden layer under appropriate assumptions. In this work, we propose a nonlocal mass transport dynamics that leads to a modified PDE with the same minimizer. We implement this nonlocal dynamics as a stochastic neuronal birth-death process and we prove that it accelerates the rate of convergence in the mean-field limit. We subsequently realize this PDE with two classes of numerical schemes that converge to the mean-field equation, each of which can easily be implemented for neural networks with finite numbers of units. We illustrate our algorithms with two models to provide intuition for the mechanism through which convergence is accelerated.
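A toy version of the birth-death idea can be sketched as follows. This is a hypothetical illustration, not the paper's scheme: a small mean-field ReLU network is trained by gradient descent on a 1-D regression task, and every few steps the neurons whose first-variation potential is highest (i.e., those hurting the fit most) are killed and replaced by perturbed clones of the lowest-potential neurons, keeping the particle count fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy task: fit y = sin(2x) on [-1, 1] with a mean-field two-layer net
# f(x) = (1/n) * sum_i c_i * relu(a_i * x + b_i).
X = np.linspace(-1.0, 1.0, 256)
Y = np.sin(2.0 * X)

n = 200                                       # number of neurons (particles)
a, b, c = (rng.normal(size=n) for _ in range(3))

def forward():
    pre = np.outer(a, X) + b[:, None]         # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                # ReLU
    return act, pre > 0, (c @ act) / n        # activations, mask, prediction

def loss(pred):
    return 0.5 * np.mean((pred - Y) ** 2)

lr, birth_every, clones = 20.0, 50, 5
_, _, pred = forward()
loss0 = loss(pred)
for step in range(2000):
    act, mask, pred = forward()
    res = (pred - Y) / len(X)                 # dL/dpred
    cm = c[:, None] * mask * res
    a -= lr * (cm @ X) / n
    b -= lr * cm.sum(axis=1) / n
    c -= lr * (act @ res) / n
    if step % birth_every == birth_every - 1:
        # Birth-death step: V_i = E_x[(f(x) - y) * c_i * relu(a_i x + b_i)]
        # is the first variation of the loss at neuron i.  High-V neurons
        # die; they are replaced by slightly perturbed clones of low-V ones.
        act, _, pred = forward()
        V = (c[:, None] * act) @ ((pred - Y) / len(X))
        order = np.argsort(V)
        worst, best = order[-clones:], order[:clones]
        a[worst] = a[best] + 0.01 * rng.normal(size=clones)
        b[worst] = b[best] + 0.01 * rng.normal(size=clones)
        c[worst] = c[best]

_, _, pred = forward()
final_loss = loss(pred)
```

The small perturbation on the clones keeps duplicated particles from evolving identically; the paper's actual process and its convergence guarantees are of course more delicate than this sketch.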
Award ID(s): 1845360
NSF-PAR ID: 10159675
Date Published:
Journal Name: International Conference on Machine Learning
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this


Physics-informed neural networks (PINNs) have been widely utilized to numerically approximate PDE problems. While PINNs have achieved good results in producing solutions for many partial differential equations, studies have shown that they do not perform well on phase field models. In this paper, we partially address this issue by introducing a modified physics-informed neural network. In particular, it is used to numerically approximate the Allen-Cahn-Ohta-Kawasaki (ACOK) equation with a volume constraint. Technically, the inverse of the Laplacian in the ACOK model presents many challenges to the baseline PINNs. To take the zero-mean condition of the inverse of the Laplacian, as well as the volume constraint, into consideration, we also introduce a separate neural network, which takes the second set of sampling points in the approximation process. Numerical results are shown to demonstrate the effectiveness of the modified PINNs. An additional benefit of this research is that the modified PINNs can also be applied to learn more general nonlocal phase-field models, even with an unknown nonlocal kernel.
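The zero-mean condition on the inverse Laplacian mentioned above can be made concrete outside the neural-network setting. The following sketch (a spectral solver on a periodic 1-D domain, purely illustrative and unrelated to the paper's PINN construction) computes the inverse of the negative Laplacian with the k = 0 Fourier mode removed, which is exactly the zero-mean normalization.

```python
import numpy as np

def inv_neg_laplacian_periodic(f, L=2 * np.pi):
    """Solve -u'' = f - mean(f) on a periodic 1-D domain of length L,
    normalized so that mean(u) = 0.  In Fourier space -u'' = f becomes
    k^2 * u_hat = f_hat, so u_hat = f_hat / k^2 for k != 0; setting the
    k = 0 mode to zero enforces the zero-mean condition."""
    n = len(f)
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular wavenumbers
    f_hat = np.fft.fft(f)
    u_hat = np.zeros_like(f_hat)
    nonzero = k != 0
    u_hat[nonzero] = f_hat[nonzero] / (k[nonzero] ** 2)
    return np.real(np.fft.ifft(u_hat))

# Sanity check against an explicit solution: -u'' = cos(x)  =>  u = cos(x).
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
u = inv_neg_laplacian_periodic(np.cos(x))
```

The same zero-mode trick extends verbatim to 2-D and 3-D periodic grids via `np.fft.fftn`.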

In this paper we prove that Local (S)GD (or FedAvg) can optimize deep neural networks with the Rectified Linear Unit (ReLU) activation function in polynomial time. Despite the established convergence theory of Local SGD for optimizing general smooth functions in communication-efficient distributed optimization, its convergence on nonsmooth ReLU networks still eludes full theoretical understanding. The key property used in many Local SGD analyses of smooth functions is gradient Lipschitzness, which ensures that the gradients on local models do not drift far from those on the averaged model. However, this desirable property does not hold in networks with the nonsmooth ReLU activation function. We show that, even though ReLU networks do not admit the gradient Lipschitzness property, the difference between the gradients on local models and on the averaged model does not change too much under the dynamics of Local SGD. We validate our theoretical results via extensive experiments. This work is the first to show the convergence of Local SGD on nonsmooth functions, and sheds light on the optimization theory of federated training of deep neural networks.
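The Local SGD / FedAvg protocol itself is simple to state in code. The sketch below (a toy linear-regression instance rather than the ReLU networks analyzed in the paper; worker count, local-step count, and learning rate are arbitrary choices) shows the local-steps-then-average loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Local SGD (FedAvg): K workers each run H local SGD steps on their own
# shard of the data, then the server averages the local models.
d, n, K, H, rounds, lr = 5, 1000, 4, 10, 50, 0.05
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.01 * rng.normal(size=n)
shards = np.array_split(np.arange(n), K)   # one data shard per worker

def mse(w):
    return np.mean((X @ w - y) ** 2)

w = np.zeros(d)                            # global model
for _ in range(rounds):
    local_models = []
    for idx in shards:
        wk = w.copy()                      # worker starts from the global model
        for _ in range(H):
            batch = rng.choice(idx, size=32)
            g = 2 * X[batch].T @ (X[batch] @ wk - y[batch]) / len(batch)
            wk -= lr * g
        local_models.append(wk)
    w = np.mean(local_models, axis=0)      # server averages the local models
```

The quantity studied in the paper is precisely the drift between the `wk` iterates and their average `w` during the `H` local steps; on this smooth quadratic the drift is controlled by gradient Lipschitzness, which is the property that fails for ReLU networks.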

In this paper, we investigate how the self-synchronization property of a swarm of Kuramoto oscillators can be controlled and exploited to achieve target densities and target phase coherence. In the limit of an infinite number of oscillators, the collective dynamics of the agents’ density is described by a mean-field model in the form of a nonlocal PDE, where the nonlocality arises from the synchronization mechanism. In this mean-field setting, we introduce two space-time-dependent control inputs to affect the density of the oscillators: an angular velocity field that corresponds to a state feedback law for individual agents, and a control parameter that modulates the strength of agent interactions over space and time, i.e., a multiplicative control with respect to the integral nonlocal term. We frame the density tracking problem as a PDE-constrained optimization problem. The controlled synchronization and phase-locking are measured with classical polar order metrics. After establishing the mass conservation property of the mean-field model and bounds on its nonlocal term, a system of first-order necessary conditions for optimality is recovered using a Lagrangian method. The optimality system, comprising a nonlocal PDE for the state dynamics equation, the respective nonlocal adjoint dynamics, and the Euler equation, is solved iteratively following a standard Optimize-then-Discretize approach and an efficient numerical solver based on spectral methods. We demonstrate our approach for each of the two control inputs in simulation.
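In the finite-N setting, the synchronization mechanism and the polar order metric can be illustrated directly. The sketch below (an uncontrolled Kuramoto simulation with a constant coupling strength standing in for the multiplicative control input; all parameter values are illustrative) shows how the order parameter r responds to the coupling.

```python
import numpy as np

def simulate_kuramoto(kappa, N=500, T=2000, dt=0.01, seed=0):
    """Kuramoto swarm: N oscillators with phases theta_i and natural
    frequencies omega_i, coupled through the mean field.  The polar order
    parameter r = |(1/N) sum_j exp(i*theta_j)| measures phase coherence:
    r ~ 0 is incoherent, r ~ 1 is fully synchronized."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, N)
    omega = rng.normal(0.0, 0.1, N)        # narrow frequency spread
    for _ in range(T):
        z = np.exp(1j * theta).mean()      # complex order parameter r*e^{i psi}
        r, psi = np.abs(z), np.angle(z)
        # mean-field form of the Kuramoto coupling term
        theta += dt * (omega + kappa * r * np.sin(psi - theta))
    return np.abs(np.exp(1j * theta).mean())

r_weak = simulate_kuramoto(kappa=0.05)     # subcritical coupling: incoherent
r_strong = simulate_kuramoto(kappa=2.0)    # strong coupling: phase-locked
```

In the paper's setting, kappa becomes a space-time-dependent field chosen by the PDE-constrained optimization; here it is held constant merely to show the two regimes of the order metric.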

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest —nonlinear but regular networks— no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with O(d) sample complexity in a large ambient dimension d. Our main results characterize a hierarchical property —the merged-staircase property— that is both necessary and nearly sufficient for learning in this setting. We further show that nonlinear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new “dimension-free” dynamics approximation result that applies to functions defined on a latent space of low dimension, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost-orthogonal functions.