

This content will become publicly available on October 21, 2025

Title: Dynamics of finite width Kernel and prediction fluctuations in mean field neural networks
Abstract

We analyze the dynamics of finite width effects in wide but finite feature learning neural networks. Starting from a dynamical mean field theory description of infinite width deep neural network kernel and prediction dynamics, we provide a characterization of the $O(1/\sqrt{\text{width}})$ fluctuations of the dynamical mean field theory order parameters over random initializations of the network weights. Our results, while perturbative in width, unlike prior analyses, are non-perturbative in the strength of feature learning. We find that once the mean field/µP parameterization is adopted, the leading finite size effect on the dynamics is to introduce initialization variance in the predictions and feature kernels of the networks. In the lazy limit of network training, all kernels are random but static in time and the prediction variance has a universal form. However, in the rich, feature learning regime, the fluctuations of the kernels and predictions are dynamically coupled with a variance that can be computed self-consistently. In two-layer networks, we show how feature learning can dynamically reduce the variance of the final tangent kernel and final network predictions. We also show how initialization variance can slow down online learning in wide but finite networks. In deeper networks, kernel variance can dramatically accumulate through subsequent layers at large feature learning strengths, but feature learning continues to improve the signal-to-noise ratio of the feature kernels. In discrete time, we demonstrate that large learning rate phenomena such as edge of stability effects can be well captured by infinite width dynamics and that initialization variance can decrease dynamically. For convolutional neural networks trained on CIFAR-10, we empirically find significant corrections to both the bias and variance of network dynamics due to finite width.

 
Award ID(s):
2134157 2239780
PAR ID:
10561215
Author(s) / Creator(s):
;
Publisher / Repository:
IOP Publishing Ltd
Date Published:
Journal Name:
Journal of Statistical Mechanics: Theory and Experiment
Volume:
2024
Issue:
10
ISSN:
1742-5468
Page Range / eLocation ID:
104021
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    A test of lepton flavor universality in $B^\pm \to K^\pm \mu^+\mu^-$ and $B^\pm \to K^\pm e^+e^-$ decays, as well as a measurement of differential and integrated branching fractions of a nonresonant $B^\pm \to K^\pm \mu^+\mu^-$ decay are presented. The analysis is made possible by a dedicated data set of proton-proton collisions at $\sqrt{s} = 13$ TeV recorded in 2018, by the CMS experiment at the LHC, using a special high-rate data stream designed for collecting about 10 billion unbiased b hadron decays. The ratio of the branching fractions $\mathcal{B}(B^\pm \to K^\pm \mu^+\mu^-)$ to $\mathcal{B}(B^\pm \to K^\pm e^+e^-)$ is determined from the measured double ratio $R(K)$ of these decays to the respective branching fractions of the $B^\pm \to J/\psi K^\pm$ with $J/\psi \to \mu^+\mu^-$ and $e^+e^-$ decays, which allow for significant cancellation of systematic uncertainties. The ratio $R(K)$ is measured in the range $1.1 < q^2 < 6.0$ GeV$^2$, where $q$ is the invariant mass of the lepton pair, and is found to be $R(K) = 0.78^{+0.47}_{-0.23}$, in agreement with the standard model expectation $R(K) \approx 1$. This measurement is limited by the statistical precision of the electron channel. The integrated branching fraction in the same $q^2$ range, $\mathcal{B}(B^\pm \to K^\pm \mu^+\mu^-) = (12.42 \pm 0.68) \times 10^{-8}$, is consistent with the present world-average value and has a comparable precision.

     
  2. Abstract

    We consider a process of noncolliding $q$-exchangeable random walks on $\mathbb{Z}$ making steps 0 (‘straight’) and −1 (‘down’). A single random walk is called $q$-exchangeable if under an elementary transposition of the neighboring steps $(\text{down}, \text{straight}) \to (\text{straight}, \text{down})$ the probability of the trajectory is multiplied by a parameter $q \in (0,1)$. Our process of $m$ noncolliding $q$-exchangeable random walks is obtained from the independent $q$-exchangeable walks via Doob’s $h$-transform for a nonnegative eigenfunction $h$ (expressed via the $q$-Vandermonde product) with the eigenvalue less than 1. The system of $m$ walks evolves in the presence of an absorbing wall at 0. The repulsion mechanism is the $q$-analogue of the Coulomb repulsion of random matrix eigenvalues undergoing Dyson Brownian motion. However, in our model, the particles are confined to the positive half-line and do not spread as Brownian motions or simple random walks. We show that the trajectory of the noncolliding $q$-exchangeable walks started from an arbitrary initial configuration forms a determinantal point process, and express its kernel in a double contour integral form. This kernel is obtained as a limit from the correlation kernel of $q$-distributed random lozenge tilings of sawtooth polygons. In the limit as $m \to \infty$, $q = e^{-\gamma/m}$ with $\gamma > 0$ fixed, and under a suitable scaling of the initial data, we obtain a limit shape of our noncolliding walks and also show that their local statistics are governed by the incomplete beta kernel. The latter is a distinguished translation invariant ergodic extension of the two-dimensional discrete sine kernel.

     
  3. The first measurement of the cross section for incoherent photonuclear production of $J/\psi$ vector mesons as a function of the Mandelstam $|t|$ variable is presented. The measurement was carried out with the ALICE detector at midrapidity, $|y| < 0.8$, using ultraperipheral collisions of Pb nuclei at a center-of-mass energy per nucleon pair of $\sqrt{s_{\mathrm{NN}}} = 5.02$ TeV. This rapidity interval corresponds to a Bjorken-$x$ range $(0.3\text{–}1.4) \times 10^{-3}$. Cross sections are given in five $|t|$ intervals in the range $0.04 < |t| < 1$ GeV$^2$ and compared to the predictions by different models. Models that ignore quantum fluctuations of the gluon density in the colliding hadron predict a $|t|$ dependence of the cross section much steeper than in data. The inclusion of such fluctuations in the same models provides a better description of the data.

    © 2024 CERN, for the ALICE Collaboration
  4. Abstract

    A generic search is presented for the associated production of a Z boson or a photon with an additional unspecified massive particle X, $\mathrm{pp} \to \mathrm{pp} + \mathrm{Z}/\gamma + \mathrm{X}$, in proton-tagged events from proton–proton collisions at $\sqrt{s} = 13$ TeV, recorded in 2017 with the CMS detector and the CMS-TOTEM precision proton spectrometer. The missing mass spectrum is analysed in the 600–1600 GeV range and a fit is performed to search for possible deviations from the background expectation. No significant excess in data with respect to the background predictions has been observed. Model-independent upper limits on the visible production cross section of $\mathrm{pp} \to \mathrm{pp} + \mathrm{Z}/\gamma + \mathrm{X}$ are set.

     
  5. Properly interpreting lidar (light detection and ranging) signal for characterizing particle distribution relies on a key parameter, $\chi_p(\pi)$, which relates the particulate volume scattering function (VSF) at 180° ($\beta_p(\pi)$) that a lidar measures to the particulate backscattering coefficient ($b_{bp}$). However, $\chi_p(\pi)$ has been seldom studied due to challenges in accurately measuring $\beta_p(\pi)$ and $b_{bp}$ concurrently in the field. In this study, $\chi_p(\pi)$, as well as its spectral dependence, was re-examined using the VSFs measured in situ at high angular resolution in a wide range of waters. $\beta_p(\pi)$, while not measured directly, was inferred using a physically sound, well-validated VSF-inversion method. The effects of particle shape and internal structure on the inversion were tested using three inversion kernels consisting of phase functions computed for particles that are assumed to be homogeneous spheres, homogeneous asymmetric hexahedra, or coated spheres. The reconstructed VSFs using any of the three kernels agreed well with the measured VSFs, with a mean percentage difference < 5% at scattering angles < 170°. At angles immediately near or equal to 180°, the reconstructed $\beta_p(\pi)$ depends strongly on the inversion kernel. $\chi_p(\pi)$ derived with the sphere kernels was smaller than those derived with the hexahedra kernel but consistent with $\chi_p(\pi)$ estimated directly from high-spectral-resolution lidar and in situ backscattering sensor. The possible explanation was that the sphere kernels are able to capture the backscattering enhancement feature near 180° that has been observed for marine particles. $\chi_p(\pi)$ derived using the coated sphere kernel was generally lower than those derived with the homogeneous sphere kernel.
    Our result suggests that $\chi_p(\pi)$ is sensitive to the shape and internal structure of particles and significant error could be induced if a fixed value of $\chi_p(\pi)$ is to be used to interpret lidar signal collected in different waters. On the other hand, $\chi_p(\pi)$ showed little spectral dependence.

     