

Title: High‐dimensional limit theorems for SGD: Effective dynamics and critical scaling
Abstract

We study the scaling limits of stochastic gradient descent (SGD) with constant step‐size in the high‐dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite‐dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step‐size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on these choices. We show a critical scaling regime for the step‐size, below which the effective ballistic dynamics matches gradient flow for the population loss, but at which a new correction term appears that changes the phase diagram. Around the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples including estimation for spiked matrix and tensor models and classification via two‐layer networks for binary and XOR‐type Gaussian mixture models. These examples exhibit surprising phenomena including multimodal timescales to convergence as well as convergence to sub‐optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations. At the same time, we demonstrate the benefit of overparametrization by showing that the latter probability goes to zero as the second layer width grows.
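As an illustration only (this toy model is not one of the paper's examples, and the code is not the authors'), the effect of the critical step-size scaling can be seen in online SGD for noisy linear regression in dimension $N$: loss $\tfrac12(y - \langle x, a\rangle)^2$ with fresh samples $a \sim \mathcal N(0, I_N)$, $y = \langle w^*, a\rangle + \sigma\varepsilon$, and step size $\delta = c/N$. Tracking the single summary statistic $R = |x - w^*|^2$ and rescaling time so that one unit equals $N$ steps, a direct second-moment computation (using $\mathbb E[\langle e,a\rangle^2 |a|^2] = (N+2)|e|^2$) gives the effective ODE $dR/dt = -2cR + c^2(R + \sigma^2)$, whereas gradient flow on the population loss $\tfrac12(R + \sigma^2)$ gives only $dR/dt = -2cR$. In this toy case the extra $c^2(R + \sigma^2)$ term is the correction that survives at $\delta \asymp 1/N$ and vanishes for $\delta = o(1/N)$. The sketch below simulates SGD and compares $R$ at time $T$ with both predictions; the function names, the choice $w^* = e_1$, and the initialization $x_0 = 0$ (so $R(0) = 1$) are hypothetical.

```python
import numpy as np


def simulate_online_sgd(n, c, sigma, t_max, seed=0):
    """Run online SGD on the toy loss 0.5 * (y - <x, a>)^2 in dimension n.

    Step size is delta = c / n (the critical scaling for this toy model).
    Returns the summary statistic R = |x - w_star|^2 after t_max * n steps,
    i.e. at rescaled time t_max.
    """
    rng = np.random.default_rng(seed)
    w_star = np.zeros(n)
    w_star[0] = 1.0                       # hypothetical unit-norm ground truth
    x = np.zeros(n)                       # initialization, so R(0) = 1
    delta = c / n
    for _ in range(int(t_max * n)):
        a = rng.standard_normal(n)        # fresh Gaussian sample each step
        y = a @ w_star + sigma * rng.standard_normal()
        x -= delta * (a @ x - y) * a      # SGD step on 0.5 * (y - <x, a>)^2
    return float(np.sum((x - w_star) ** 2))


def effective_R(t, c, sigma, r0=1.0):
    """Solve dR/dt = -2cR + c^2 (R + sigma^2) in closed form (needs 0 < c < 2)."""
    if not 0 < c < 2:
        raise ValueError("closed form below assumes 0 < c < 2")
    rate = 2 * c - c ** 2                 # net relaxation rate
    r_fix = c * sigma ** 2 / (2 - c)      # fixed point R*
    return r_fix + (r0 - r_fix) * np.exp(-rate * t)


def gradient_flow_R(t, c, r0=1.0):
    """R(t) under time-rescaled gradient flow on the population loss (no correction)."""
    return r0 * np.exp(-2 * c * t)


if __name__ == "__main__":
    n, c, sigma, T = 2000, 1.0, 0.5, 3.0
    print("SGD simulation:      ", simulate_online_sgd(n, c, sigma, T))
    print("corrected ODE:       ", effective_R(T, c, sigma))
    print("population grad flow:", gradient_flow_R(T, c))
```

For $0 < c < 2$ the corrected ODE in this toy model relaxes to the fixed point $R^* = c\sigma^2/(2 - c)$ rather than to $0$; for $c \ge 2$ the coefficient $2c - c^2$ is no longer positive and $R$ does not settle at a finite fixed point, a simple instance of the correction term changing the phase diagram.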

 
NSF-PAR ID:
10486935
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Communications on Pure and Applied Mathematics
Volume:
77
Issue:
3
ISSN:
0010-3640
Format(s):
Medium: X Size: p. 2030-2080
Sponsoring Org:
National Science Foundation
More Like this
  1. Large-eddy simulation (LES) is used to model turbulent winds in a nominally neutral atmospheric boundary layer at varying mesh resolutions. The boundary layer is driven by wind shear with zero surface heat flux and is capped by a stable inversion. Because of entrainment the boundary layer is in a weakly stably stratified regime. The simulations use meshes varying from $128^2 \times 64$ to $1024^2 \times 512$ grid points in a fixed computational domain of size (2560, 2560, 896) m. The subgrid-scale (SGS) parameterizations used in the LES vary with the mesh spacing. Low-order statistics, spectra, and structure functions are compared on the different meshes and are used to assess grid convergence in the simulations. As expected, grid convergence is primarily achieved in the middle of the boundary layer where there is scale separation between the energy-containing and dissipative eddies. Near the surface second-order statistics do not converge on the meshes studied. The analysis also highlights differences between one-dimensional and two-dimensional velocity spectra; differences are attributed to sampling errors associated with aligning the horizontal coordinates with the vertically veering mean wind direction. Higher-order structure functions reveal non-Gaussian statistics on all scales, but are highly dependent on the mesh resolution. A generalized logarithmic law and a $k^{-1}$ spectral scaling regime are identified with mesh-dependent parameters in agreement with previously published results.

     
  2. Consider a system of homogeneous interacting diffusive particles labeled by the nodes of a unimodular Galton–Watson tree, where the state of each node evolves infinitesimally like a d-dimensional diffusion whose drift coefficient depends on (the histories of) its own state and the states of neighboring nodes, and whose diffusion coefficient depends only on (the history of) its own state. Under suitable regularity assumptions on the coefficients, an autonomous characterization is obtained for the marginal distribution of the dynamics of the neighborhood of a typical node in terms of a certain local equation, which is a new kind of stochastic differential equation that is nonlinear in the sense of McKean. This equation describes a finite-dimensional non-Markovian stochastic process whose infinitesimal evolution at any time depends not only on the structure and current state of the neighborhood, but also on the conditional law of the current state given the past of the states of neighboring nodes until that time. Such marginal distributions are of interest because they arise as weak limits of both marginal distributions and empirical measures of interacting diffusions on many sequences of sparse random graphs, including the configuration model and Erdős–Rényi graphs whose average degrees converge to a finite non-zero limit. The results obtained complement classical results in the mean-field regime, which characterize the limiting dynamics of homogeneous interacting diffusions on complete graphs, as the number of nodes goes to infinity, in terms of a corresponding nonlinear Markov process. However, in the sparse graph setting, the topology of the graph strongly influences the dynamics, and the analysis requires a completely different approach.
The proofs of existence and uniqueness of the local equation rely on delicate new conditional independence and symmetry properties of particle trajectories on unimodular Galton–Watson trees, as well as judicious use of changes of measure.
  3. We investigate the dynamic behavior of lattices with disorder introduced through non-local network connections. Inspired by the Watts–Strogatz small-world model, we employ a single parameter to determine the probability of local connections being re-wired, and to induce transitions between regular and disordered lattices. These connections are added as non-local springs to underlying periodic one-dimensional (1D) and two-dimensional (2D) square, triangular and hexagonal lattices. Eigenmode computations illustrate the emergence of spectral gaps in various representative lattices for increasing degrees of disorder. These gaps manifest themselves as frequency ranges where the modal density goes to zero, or that are populated only by localized modes. In both cases, we observe low transmission levels of vibrations across the lattice. Overall, we find that these gaps are more pronounced for lattice topologies with lower connectivity, such as the 1D lattice or the 2D hexagonal lattice. We then illustrate that the disordered lattices undergo transitions from ballistic to super-diffusive or diffusive transport for increasing levels of disorder. These properties, illustrated through numerical simulations, unveil the potential for disorder in the form of non-local connections to enable additional functionalities for metamaterials. These include the occurrence of disorder-induced spectral gaps, which is relevant to frequency filtering devices, as well as the possibility to induce diffusive-type transport which does not occur in regular periodic materials, and that may find applications in dynamic stress mitigation.

     
  4. An outstanding problem in statistical mechanics is the determination of whether prescribed functional forms of the pair correlation function $g_2(r)$ [or equivalently, structure factor $S(k)$] at some number density $\rho$ can be achieved by many-body systems in $d$-dimensional Euclidean space. The Zhang–Torquato conjecture states that any realizable set of pair statistics, whether from a nonequilibrium or equilibrium system, can be achieved by equilibrium systems involving up to two-body interactions. To further test this conjecture, we study the realizability problem of the nonequilibrium iso-$g_2$ process, i.e., the determination of density-dependent effective potentials that yield equilibrium states in which $g_2$ remains invariant for a positive range of densities. Using a precise inverse algorithm that determines effective potentials that match hypothesized functional forms of $g_2(r)$ for all $r$ and $S(k)$ for all $k$, we show that the unit-step function $g_2$, which is the zero-density limit of the hard-sphere potential, is remarkably realizable up to the packing fraction $\phi = 0.49$ for $d = 1$. For $d = 2$ and 3, it is realizable up to the maximum “terminal” packing fraction $\phi_c = 1/2^d$, at which the systems are hyperuniform, implying that the explicitly known necessary conditions for realizability are sufficient up through $\phi_c$. For $\phi$ near but below $\phi_c$, the large-$r$ behaviors of the effective potentials are given exactly by the functional forms $\exp[-\kappa(\phi) r]$ for $d = 1$, $r^{-1/2} \exp[-\kappa(\phi) r]$ for $d = 2$, and $r^{-1} \exp[-\kappa(\phi) r]$ (Yukawa form) for $d = 3$, where $\kappa^{-1}(\phi)$ is a screening length, and for $\phi = \phi_c$, the potentials at large $r$ are given by the pure Coulomb forms in the respective dimensions as predicted by Torquato and Stillinger [Phys. Rev. E 68, 041113 (2003)].
We also find that the effective potential for the pair statistics of the 3D “ghost” random sequential addition at the maximum packing fraction $\phi_c = 1/8$ is much shorter ranged than that for the 3D unit-step function $g_2$ at $\phi_c$; thus, it does not constrain the realizability of the unit-step function $g_2$. Our inverse methodology yields effective potentials for realizable targets, and, as expected, it does not reach convergence for a target that is known to be non-realizable, despite the fact that it satisfies all known explicit necessary conditions. Our findings demonstrate that exploring the iso-$g_2$ process via our inverse methodology is an effective and robust means to tackle the realizability problem and is expected to facilitate the design of novel nanoparticle systems with density-dependent effective potentials, including exotic hyperuniform states of matter.

     
  5. In this study, we conduct a parametric analysis to evaluate the sensitivities of wall-modeled large-eddy simulation (LES) with respect to subgrid-scale (SGS) models, mesh resolution, wall boundary conditions and mesh anisotropy. While such investigations have been conducted for attached/flat-plate flow configurations, systematic studies specifically targeting turbulent flows with separation are notably sparse. To bridge this gap, our study focuses on the flow over a two-dimensional Gaussian-shaped bump at a moderately high Reynolds number, which involves smooth-body separation of a turbulent boundary layer under pressure-gradient and surface-curvature effects. In the simulations, the no-slip condition at the wall is replaced by three different forms of boundary condition based on the thin boundary layer equations and the mean wall-shear stress from high-fidelity numerical simulation to avoid the additional complexity of modeling the wall-shear stress. Various statistics, including the mean separation bubble size, mean velocity profile, and dissipation from SGS model, are compared and analyzed. The results reveal that capturing the separation bubble strongly depends on the choice of SGS model. While simulations approach grid convergence with resolutions nearing those of wall-resolved LES meshes, above this limit, the LES predictions exhibit intricate sensitivities to mesh resolution. Furthermore, both wall boundary conditions and the anisotropy of mesh cells exert discernible impacts on the turbulent flow predictions, yet the magnitudes of these impacts vary based on the specific SGS model chosen for the simulation.