skip to main content


Search for: All records

Creators/Authors contains: "Gheissari, Reza"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    We study the scaling limits of stochastic gradient descent (SGD) with constant step‐size in the high‐dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite‐dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step‐size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on the former choices. We show a critical scaling regime for the step‐size, below which the effective ballistic dynamics matches gradient flow for the population loss, but at which, a new correction term appears which changes the phase diagram. About the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples including estimation for spiked matrix and tensor models and classification via two‐layer networks for binary and XOR‐type Gaussian mixture models. These examples exhibit surprising phenomena including multimodal timescales to convergence as well as convergence to sub‐optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations. At the same time, we demonstrate the benefit of overparametrization by showing that the latter probability goes to zero as the second layer width grows.

     
    more » « less
  2. Abstract Consider $$(X_{i}(t))$$ ( X i ( t ) ) solving a system of N stochastic differential equations interacting through a random matrix $${\mathbf {J}} = (J_{ij})$$ J = ( J ij ) with independent (not necessarily identically distributed) random coefficients. We show that the trajectories of averaged observables of $$(X_i(t))$$ ( X i ( t ) ) , initialized from some $$\mu $$ μ independent of  $${\mathbf {J}}$$ J , are universal, i.e., only depend on the choice of the distribution $$\mathbf {J}$$ J through its first and second moments (assuming e.g., sub-exponential tails). We take a general combinatorial approach to proving universality for dynamical systems with random coefficients, combining a stochastic Taylor expansion with a moment matching-type argument. Concrete settings for which our results imply universality include aging in the spherical SK spin glass, and Langevin dynamics and gradient flows for symmetric and asymmetric Hopfield networks. 
    more » « less
  3. Abstract

    We establish rapid mixing of the random-cluster Glauber dynamics on random$$\varDelta $$Δ-regular graphs for all$$q\ge 1$$q1and$$pp<pu(q,Δ), where the threshold$$p_u(q,\varDelta )$$pu(q,Δ)corresponds to a uniqueness/non-uniqueness phase transition for the random-cluster model on the (infinite)$$\varDelta $$Δ-regular tree. It is expected that this threshold is sharp, and for$$q>2$$q>2the Glauber dynamics on random$$\varDelta $$Δ-regular graphs undergoes an exponential slowdown at$$p_u(q,\varDelta )$$pu(q,Δ). More precisely, we show that for every$$q\ge 1$$q1,$$\varDelta \ge 3$$Δ3, and$$pp<pu(q,Δ), with probability$$1-o(1)$$1-o(1)over the choice of a random$$\varDelta $$Δ-regular graph onnvertices, the Glauber dynamics for the random-cluster model has$$\varTheta (n \log n)$$Θ(nlogn)mixing time. As a corollary, we deduce fast mixing of the Swendsen–Wang dynamics for the Potts model on random$$\varDelta $$Δ-regular graphs for every$$q\ge 2$$q2, in the tree uniqueness region. Our proof relies on a sharp bound on the “shattering time”, i.e., the number of steps required to break up any configuration into$$O(\log n)$$O(logn)sized clusters. This is established by analyzing a delicate and novel iterative scheme to simultaneously reveal the underlying random graph with clusters of the Glauber dynamics configuration on it, at a given time.

     
    more » « less
  4. We study the Glauber dynamics for the random cluster (FK) model on the toruswith parameters (p,q), forq ∈ (1,4] andpthe critical point pc. The dynamics is believed to undergo acritical slowdown, with its continuous‐time mixing time transitioning fromforp ≠ pcto a power‐law innatp = pc. This was verified atp ≠ pcby Blanca and Sinclair, whereas at the criticalp = pc, with the exception of the special integer pointsq = 2,3,4 (where the model corresponds to the Ising/Potts models) the best‐known upper bound on mixing was exponential inn. Here we prove an upper bound ofatp = pcfor allq ∈ (1,4], where a key ingredient is bounding the number of nested long‐range crossings at criticality.

     
    more » « less